Briefings in Bioinformatics. 2025 Nov 6;26(6):bbaf578. doi: 10.1093/bib/bbaf578

TMBclaw: tumor clone-aware graph learning improves immunotherapy response prediction across heterogeneous cohorts

Yixuan Wang 1,#, Tianyi Zhu 2,#, Xiaofeng Song 3, Peng Chen 4, Xiaoyan Zhu 5, Zhili Chang 6,7, Xiaonan Wang 8,9, Xin Lai 10, Jiayin Wang 11
PMCID: PMC12596255  PMID: 41206114

Abstract

Immune checkpoint inhibitors (ICIs) have emerged as a cornerstone of modern oncology, creating a need for robust biomarkers that optimize patient stratification and treatment selection. Although tumor mutation burden (TMB) has demonstrated prognostic value, conventional quantification based on mutation counts fails to reflect immunogenic neoantigen presentation because of intratumoral clonal heterogeneity. Recent efforts have focused on mutation subsets derived from tumor clonality, yet the complex interactions among clones remain a significant obstacle to accurate prognosis. This challenge is further exacerbated by the limited cohort sizes typical of clinical studies, which severely compromise model generalizability across heterogeneous cohorts. We therefore propose TMBclaw (Tumor Mutation Burden–based Clonal attention with Laplacian Adaptive Weighting), a graph-regularized multi-task learning framework for immunotherapy response prediction. TMBclaw integrates group-structured cohorts in a unified model, enabling cross-cohort knowledge transfer while modeling relationships among tumor clones. For clinical validation, we used four cohorts comprising 238 patients with non–small-cell lung cancer (NSCLC), melanoma, or nasopharyngeal carcinoma treated with ICIs, along with external multicenter validation cohorts (N = 1433) of melanoma and NSCLC patients from public datasets. Comparative analyses demonstrate that TMBclaw significantly outperforms conventional methods in prognostic accuracy and risk stratification. By systematically quantifying clonal dynamics and identifying discriminative driver clones, TMBclaw shows potential to improve understanding of tumor heterogeneity and provides interpretable insights into the immunotherapy process.

Keywords: tumor mutation burden, tumor clonal heterogeneity, clinical decision-making, multi-task learning, graph Laplacian regularization

Introduction

Immune checkpoint inhibitors (ICIs) have revolutionized the treatment of advanced cancer, achieving durable responses in 20%–40% of patients [4, 8]. However, a substantial proportion of patients derive limited clinical benefit despite considerable healthcare expenditure [21, 43]. Accurate prediction of ICI efficacy therefore remains an unmet clinical need; meeting it would advance precision oncology by enabling clinically actionable stratification of patients according to their likelihood of therapeutic benefit [51, 53]. Tumor mutation burden (TMB), quantified as the total number of non-synonymous mutations per megabase [1, 5, 46], has gained regulatory recognition as a predictive biomarker through the National Comprehensive Cancer Network guidelines and U.S. Food and Drug Administration approvals [23, 42]. However, the clinical utility of TMB quantification remains controversial.

Emerging insights from immuno-oncology reveal that mutations differ in their immunogenic potential [22, 26]. Mutations from different clones generate distinct neoantigens and trigger varying ICI responses [16, 24, 29]. This biological nuance exposes a critical limitation of conventional TMB quantification, which aggregates mutations regardless of their clonal status and can therefore obscure clinically relevant signals [3, 6, 28]. Clonal TMB (cTMB), defined by truncal mutations present in all tumor cells, has been shown to correlate with ICI response in multiple studies [2, 34, 52], yet current evidence is insufficient to establish conclusively its superiority over conventional TMB. Subclonal TMB shows promise in predicting long-term response [44], but because subclonal mutations are spatially restricted, it cannot fully represent the entire tumor. The neoantigenic heterogeneity arising from clonal architecture therefore requires further mechanistic investigation to improve predictive accuracy.

In addition, clinical implementation of immunotherapy faces multifaceted challenges: patient cohorts are frequently constrained by substantial costs and adverse effects, resulting in limited sample sizes. Such data scarcity predisposes neural network models to overfitting, in which they memorize training patterns rather than learn generalizable features [54]. Moreover, technical variability across cohorts, such as differences in panel size, next-generation sequencing (NGS) platforms, and tumor types, induces significant inconsistencies in TMB quantification [50]. This multidimensional heterogeneity compromises clinical decision-making for ICIs [20] and makes naïve pooling of datasets ineffective. Balancing the benefit of larger pooled training data against the need to accommodate cohort heterogeneity therefore remains a significant challenge for developing reliable predictive models.

To address the challenge of predicting immunotherapy responses in heterogeneous cohorts, we propose TMBclaw (Tumor Mutation Burden–based Clonal attention with Laplacian Adaptive Weighting), a hierarchical, graph-regularized multi-task learning (MTL) framework. TMBclaw captures cohort heterogeneity via an undirected graph in which nodes represent patient cohorts and weighted edges encode their similarities. We introduce an edge-sensitive graph Laplacian regularization that optimizes information flow across cohorts, enabling effective knowledge transfer while preserving distinct biological characteristics through adaptive penalty terms [45, 55]. At the level of tumor clonal heterogeneity, TMBclaw integrates clonal genomic features using an attention-based multiple-instance learning (MIL) framework [19, 41] that adaptively weights clone contributions. This dual-layer design captures both broad population patterns and critical clone-specific features. Clinical validation was performed using four institutional cohorts (N = 238) from the Sun Yat-sen University Cancer Center and the Second Affiliated Hospital of Xi’an Jiaotong University, covering non–small-cell lung cancer (NSCLC), melanoma, and nasopharyngeal carcinoma (NPC) [12, 13, 27]. We also collected 1433 samples from public studies to form group-structured data [17, 18, 25, 35–38, 40, 47, 48]. The results demonstrate the superior and robust performance of TMBclaw in ICI response prediction and risk stratification, together with interpretable analyses. The source code can be downloaded from https://github.com/AadSama0404/TMBclaw.

Methods

TMBclaw is a neural network model for prognostic prediction by jointly modeling tumor clonal and cohort heterogeneity within a hierarchical framework. The model simultaneously processes clonal mutation profiles and inter-patient variability, capturing their nonlinear interactions to generate more accurate and clinically generalizable predictions. TMBclaw provides a patient’s likelihood of responding to ICIs, outputting a binary classification (non-responder or responder) and a sample-specific response probability. The complete architectural implementation is visualized in Fig. 1. Panel A represents the Cohort Heterogeneity Layer of TMBclaw, where all cohorts are integrated, with each cohort treated as a distinct prediction task. MTL is employed to enable information sharing across cohorts while preserving their unique characteristics. Panel B depicts the Tumor Clonal Heterogeneity Layer of TMBclaw, which is task specific and focuses on a single cohort. This layer applies MIL to process individual patient data and adapt to clonal genomic features. The resulting clone-specific attention weights are subsequently fed into the final classifier to predict immunotherapy response. Together, Panels A and B constitute the Training Pipeline of TMBclaw. Panel C illustrates the Recall Pipeline, where new samples are predicted by their task-specific models if available or by the most similar task model if unseen.

Figure 1.


TMBclaw architecture overview. (A) Cohort heterogeneity layer: Patient cohorts are modeled as nodes in an undirected graph, with edge weights reflecting inter-cohort similarity. Multi-task learning with graph Laplacian regularization enables robust training through structured information sharing. (B) Tumor clonal heterogeneity layer: For a patient in a specific cohort, clonal mutations are integrated via multiple instance learning. Feature extraction is performed by two fully connected layers, followed by an attention mechanism that quantifies clone-specific contributions. The final classifier outputs a binary response prediction (0 = non-responder, 1 = responder). (C) Recall pipeline: For a new sample, cohort membership is first determined. If the sample belongs to a known cohort, prediction is made by its task-specific model. If from an unseen cohort, similarity to existing cohorts is computed and the most similar cohort model is used to generate the prediction.
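The recall step in panel C amounts to a routing rule. The sketch below is a minimal illustration, assuming that each cohort can be summarized by a numeric profile vector and that cohort similarity is measured by cosine similarity, as for the cohort graph. The profile values, the cohort names, and the function names are hypothetical and do not come from the TMBclaw code.

```python
import math
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def route_sample(cohort: str, profile: Sequence[float],
                 known: Dict[str, Sequence[float]]) -> str:
    """Choose which task-specific model scores a new sample.

    Known cohort -> its own model; unseen cohort -> the known cohort whose
    (hypothetical) profile vector is most cosine-similar to the new one.
    """
    if cohort in known:
        return cohort
    return max(known, key=lambda name: cosine(profile, known[name]))


# Hypothetical cohort profiles (e.g. summary statistics of clonal features).
profiles = {"NSCLC_A": [0.3, 0.7, 0.1], "Melanoma_A": [0.9, 0.1, 0.4]}
print(route_sample("Melanoma_A", [0.0, 0.0, 0.0], profiles))   # → Melanoma_A
print(route_sample("NSCLC_new", [0.25, 0.8, 0.05], profiles))  # → NSCLC_A
```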

Patient cohort characteristics and data preprocessing

In this study, we compiled 238 samples from four patient cohorts. These included 75 patients with NSCLC and 26 patients with melanoma from the Second Affiliated Hospital of Xi’an Jiaotong University, sequenced at the Geneplus-Beijing Institute. Additionally, 73 NSCLC patients and 64 patients with recurrent or metastatic nasopharyngeal carcinoma (NPC) were sourced from the Sun Yat-sen University Cancer Center. All patients were treated with anti-PD-(L)1, anti-CTLA-4 therapy, or monotherapy between December 2015 and January 2018. Eligible patients met the following criteria: (i) age between 18 and 70 years; (ii) an Eastern Cooperative Oncology Group performance status of 0–1; (iii) histologically or cytologically confirmed NSCLC, melanoma, or NPC with either metastatic disease or locoregional recurrence; (iv) progression after at least one prior systemic treatment; and (v) measurable disease on radiological assessment. Exclusion criteria included central nervous system metastases, prior malignancy, autoimmune disorders, previous immunotherapy, active tuberculosis infection, pregnancy, and current use of immunosuppressive agents. These patients were organized into four related yet distinct cohorts based on disease type or treatment regimen, collectively referred to as Experimental_Cohorts.

In addition, we incorporated external multicenter validation cohorts comprising 1433 patients from previously published retrospective immunotherapy studies. These included 713 patients with NSCLC and 720 patients with metastatic melanoma, referred to as External_NSCLC_Cohorts and External_Mel_Cohorts, respectively. All had received ICIs: anti-PD-(L)1 therapy, anti-CTLA-4 therapy, or combination anti-CTLA-4/anti-PD-(L)1 regimens, with a small subset receiving other agents.

Patient responses to ICIs were categorized according to the Response Evaluation Criteria in Solid Tumors version 1.1 [10]: complete response and partial response were encoded as 1 (responders), and stable disease and progressive disease as 0 (non-responders). The survival endpoint was progression-free survival (PFS) with event status, where an event was defined as disease progression, tumor recurrence, or death from any cause [9].

The clinical characteristics of these cohorts are summarized in Table 1, while basic demographics and patient-level datasets are comprehensively documented in Supplementary Table S1.

Table 1.

Summary of study cohort information

| Cohort type | Cohort size | Cancer type | Responder, n (%) | Non-responder, n (%) | Cohort source |
|---|---|---|---|---|---|
| Experimental | 75 | NSCLC | 25 (33.3) | 50 (66.7) | The Second Affiliated Hospital of Xi’an Jiaotong University |
| Experimental | 26 | Melanoma | 9 (34.6) | 17 (65.3) | The Second Affiliated Hospital of Xi’an Jiaotong University |
| Experimental | 73 | NSCLC | 13 (17.8) | 60 (82.1) | Fang et al. [12] |
| Experimental | 64 | NPC | 8 (12.5) | 56 (87.5) | Fang et al. [13]; Ma et al. [27] |
| External_NSCLC | 16 | NSCLC | 10 (62.5) | 6 (37.5) | Rizvi et al. [37] |
| External_NSCLC | 68 | NSCLC | 24 (35.3) | 44 (64.7) | Hellmann et al. [17] |
| External_NSCLC | 227 | NSCLC | 69 (30.3) | 158 (69.6) | Rizvi et al. [36] |
| External_NSCLC | 156 | NSCLC | 56 (35.8) | 100 (64.1) | Samstein et al. [40] |
| External_NSCLC | 246 | NSCLC | 61 (24.7) | 185 (75.2) | Vanguri et al. [48] |
| External_Mel | 37 | Melanoma | 20 (54.0) | 17 (45.9) | Hugo et al. [18] |
| External_Mel | 140 | Melanoma | 55 (39.2) | 85 (60.7) | Liu et al. [25] |
| External_Mel | 70 | Melanoma | 15 (21.4) | 55 (78.5) | Riaz et al. [35] |
| External_Mel | 48 | Melanoma | 9 (18.7) | 39 (81.2) | Roh et al. [38] |
| External_Mel | 105 | Melanoma | 17 (16.1) | 88 (83.8) | Van Allen et al. [47] |
| External_Mel | 320 | Melanoma | 195 (60.9) | 125 (39.0) | Samstein et al. [40] |

As shown in Table 1, the proportion of responders (label = 1) varied considerably among cohorts, ranging from 12.5% to 62.5%, with the majority below 50%, reflecting both the presence of class imbalance and the heterogeneity in cohort composition.

For each patient, clonal architecture was first inferred from bulk sequencing data with the PyClone algorithm [39] to delineate clonal mutation profiles. The mutation count, average cellular prevalence, and average variant allele frequency were then derived for each clone. The clone with the highest average cellular prevalence was designated the master clone, whose mutations define cTMB; mutations in the remaining clones were attributed to subclonal TMB. The mutation counts were then rescaled to a consistent range to aid gradient descent optimization, using scaling that limits the influence of outliers. Supplementary Table S2 details patient-level clonal weights derived from TMBclaw across different cohorts.
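The per-clone summaries and the clonal/subclonal split described above can be sketched as follows. This is a minimal sketch that assumes a PyClone-style per-mutation table with `cluster_id`, `cellular_prevalence`, and `variant_allele_frequency` fields. The text does not name the outlier-robust scaling it used, so the median/IQR scaling here is an assumed stand-in.

```python
from collections import defaultdict
from statistics import mean, median, quantiles
from typing import Dict, List, Tuple


def clone_features(mutations: List[dict]) -> Dict[int, Tuple[int, float, float]]:
    """Per clone: (mutation count, mean cellular prevalence, mean VAF)."""
    groups: Dict[int, List[dict]] = defaultdict(list)
    for m in mutations:
        groups[m["cluster_id"]].append(m)
    return {
        cid: (len(ms),
              mean(m["cellular_prevalence"] for m in ms),
              mean(m["variant_allele_frequency"] for m in ms))
        for cid, ms in groups.items()
    }


def split_tmb(features: Dict[int, Tuple[int, float, float]]) -> Tuple[int, int, int]:
    """Master clone = highest mean cellular prevalence (cTMB); other clones = subclonal TMB."""
    master = max(features, key=lambda cid: features[cid][1])
    ctmb = features[master][0]
    subclonal = sum(f[0] for cid, f in features.items() if cid != master)
    return master, ctmb, subclonal


def robust_scale(values: List[float]) -> List[float]:
    """Median/IQR scaling (an assumed choice): outliers barely move the centre or spread."""
    if len(values) < 2:
        return [0.0] * len(values)
    q1, _, q3 = quantiles(values, n=4)
    iqr = (q3 - q1) or 1.0  # avoid division by zero for constant inputs
    med = median(values)
    return [(v - med) / iqr for v in values]


# Hypothetical PyClone-style rows for one patient.
muts = [
    {"cluster_id": 0, "cellular_prevalence": 0.95, "variant_allele_frequency": 0.45},
    {"cluster_id": 0, "cellular_prevalence": 0.90, "variant_allele_frequency": 0.40},
    {"cluster_id": 1, "cellular_prevalence": 0.30, "variant_allele_frequency": 0.12},
]
feats = clone_features(muts)
print(split_tmb(feats))  # (master clone id, cTMB, subclonal TMB) → (0, 2, 1)
```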

Tumor clonal heterogeneity: MIL framework with attention pooling

For each patient in a specific cohort, TMBclaw adopts an MIL framework with attention-based pooling to model tumor clonal heterogeneity. Specifically, each patient is represented as a “bag” with the ICI response status as the bag-level label. Within each bag, individual tumor clones represent “instances” whose combined features determine the patient’s clinical outcome. This structure allows the model to handle the absence of instance-level annotations by leveraging the attention pooling mechanism to evaluate interactions among tumor clones dynamically, assign interpretable contribution weights, and identify key clones influencing the therapeutic response.

The cohort-specific MIL model is modular, consisting of a feature extraction layer, a self-attention layer, and a bag-level classifier layer. This design lets the model handle a variable number of tumor clones (instances) per patient, while the attention pooling mechanism keeps the model permutation invariant.

Let $\mathbf{x}_k$ represent the original mutation feature vector of tumor clone $k$, and let $y_k \in \{0, 1\}$ denote the unknown instance-level label of that clone, for $k = 1, \dots, n$. In the binary supervised classification problem, a bag consists of a set of instances $X = \{\mathbf{x}_1, \dots, \mathbf{x}_n\}$ that are independent and unordered, and the number of clones $n$ varies across bags. The bag is assigned a binary label $Y \in \{0, 1\}$ based on the aggregate response of its clones, under the assumption that a patient is a responder when at least one clone shows a positive immune response:

$$Y = \begin{cases} 0, & \text{if } \sum_{k=1}^{n} y_k = 0,\\ 1, & \text{otherwise.} \end{cases} \qquad (1)$$
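Eq. (1) is the standard MIL assumption: the bag label is the logical OR (equivalently, the maximum) of the clone labels. A minimal illustration follows. Clone labels are never observed in practice, so this rule only states how the patient label relates to them. The function name is hypothetical.

```python
from typing import Sequence


def bag_label(clone_labels: Sequence[int]) -> int:
    """Eq. (1): a patient (bag) is a responder iff at least one clone label is 1."""
    return int(any(y == 1 for y in clone_labels))


print(bag_label([0, 0, 1]))  # → 1
print(bag_label([0, 0, 0]))  # → 0
```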

For illustrative purposes, we first formulate the model for a single cohort ($T = 1$, where $T$ is the number of cohorts).

The feature extraction layer comprises two fully connected layers, each followed by a ReLU activation function. The first maps the input features in $\mathbb{R}^{d}$ to a higher-dimensional space to capture richer feature information, and the second reduces the high-dimensional features again to remove noise and redundant information [11]. ReLU mitigates the vanishing-gradient problem that traditional saturating activation functions (such as sigmoid and tanh) suffer from in deep networks [30].
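A minimal NumPy sketch of this two-layer extractor follows. The layer widths (3 → 64 → 16), the random initialization, and the three per-clone input features (scaled mutation count, mean cellular prevalence, mean VAF) are assumptions for illustration. The authors' implementation uses PyTorch.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def init_extractor(d_in: int, d_hidden: int, d_out: int, seed: int = 0) -> dict:
    """Hypothetical sizes: expand d_in -> d_hidden, then reduce d_hidden -> d_out."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (d_in, d_hidden)), "b1": np.zeros(d_hidden),
        "W2": rng.normal(0.0, 0.1, (d_hidden, d_out)), "b2": np.zeros(d_out),
    }


def extract(X: np.ndarray, p: dict) -> np.ndarray:
    """Two fully connected layers, each followed by ReLU. X: (n_clones, d_in)."""
    h1 = relu(X @ p["W1"] + p["b1"])     # higher-dimensional representation
    return relu(h1 @ p["W2"] + p["b2"])  # compressed clone embeddings


params = init_extractor(d_in=3, d_hidden=64, d_out=16)
X = np.array([[0.8, 0.92, 0.45],   # one row per clone (hypothetical values)
              [0.1, 0.30, 0.12]])
H = extract(X, params)
print(H.shape)  # → (2, 16)
```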

A permutation-invariant pooling function computes the bag-level representation in the MIL framework, so that the representation does not depend on the order of the instances and can accommodate any number of instances in the bag. A classifier with a sigmoid activation function then predicts the positive immune response score $\hat{Y} \in (0, 1)$. Given the class imbalance in clinical data, we use a cost-sensitive negative log-likelihood (NLL) loss function [33]:

$$\mathcal{L}(\mathbf{W}) = -\frac{1}{N}\sum_{i=1}^{N}\Bigl[w\,Y_i \log \hat{Y}_i + (1 - Y_i)\log\bigl(1 - \hat{Y}_i\bigr)\Bigr] \qquad (2)$$

where $N$ is the number of patients (bags), $w$ denotes the weight hyperparameter that emphasizes minority-class samples during optimization, and $\mathbf{W}$ denotes the parameter matrix. The parameters are then updated by gradient descent.
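A minimal sketch of a cost-sensitive binary NLL follows, in which a weight `w_pos` multiplies the responder (positive, usually minority) term. Setting `w_pos` to the ratio of non-responders to responders is a common heuristic and an assumption here, not a value reported in the text.

```python
import math
from typing import Sequence


def weighted_nll(y_true: Sequence[int], y_prob: Sequence[float],
                 w_pos: float, eps: float = 1e-7) -> float:
    """Mean cost-sensitive NLL; w_pos up-weights errors on responders (label 1)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() stays finite
        total += -(w_pos * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Heuristic weight (assumption) from Table 1 totals for Experimental_Cohorts:
# 183 non-responders / 55 responders.
w = 183 / 55
print(round(weighted_nll([1, 0, 0], [0.4, 0.2, 0.1], w), 4))
```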

Self-attention mechanism for inter-clone interaction

Building on the concept from Shao et al. [41], the self-attention mechanism is used to capture correlations between clones, which may provide more informative features and reduce uncertainty in the model [49]. Specifically, the inter-clone interactions are modeled as follows:

$$\begin{aligned}
\mathbf{H} &= f_{\mathrm{FE}}(X),\\
\mathbf{Q} &= \mathbf{H}\mathbf{W}_Q,\\
\mathbf{K} &= \mathbf{H}\mathbf{W}_K,\\
\mathbf{V} &= \mathbf{H}\mathbf{W}_V,\\
\mathbf{A} &= \operatorname{softmax}\bigl(\mathbf{Q}\mathbf{K}^{\top}\bigr)
\end{aligned} \qquad (3)$$

where $\mathbf{H}$ denotes the hidden feature matrix obtained from the two-layer feature extraction module $f_{\mathrm{FE}}$; $\mathbf{Q}$, $\mathbf{K}$, and $\mathbf{V}$ denote the query, key, and value matrices, respectively, computed with learnable projections $\mathbf{W}_Q$, $\mathbf{W}_K$, and $\mathbf{W}_V$; and $\mathbf{A}$ denotes the attention weight matrix, which captures the relative importance of clones. To facilitate multiplication with the value matrix $\mathbf{V}$ and to quantify the weight of each clone more clearly, the attention weights are reshaped into a vector rather than kept as a matrix:

$$\mathbf{a} = \operatorname{reshape}(\mathbf{A}) \qquad (4)$$

where $\mathbf{a}$ denotes the clone weight vector, which is then multiplied by the value matrix $\mathbf{V}$ to generate the bag-level representation.
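The attention pooling step can be sketched in NumPy to show why it is permutation invariant. This is a minimal sketch under stated assumptions. The exact reduction in Eq. (4) is not specified in this text, so here the n × n attention matrix is averaged over its rows (queries) to give one weight per clone; that choice, the 1/sqrt(d) scaling (the usual scaled dot-product convention), and all dimensions are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_pool(H, Wq, Wk, Wv):
    """Self-attention over clones -> clone weight vector a -> bag representation z.

    Hypothetical reduction: average the (n x n) attention matrix over its rows.
    """
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=1)  # each row sums to 1
    a = A.mean(axis=0)                                   # clone weights, sum to 1
    z = a @ V                                            # weighted sum of value rows
    return a, z


rng = np.random.default_rng(1)
H = rng.normal(size=(4, 8))                    # 4 clones, 8 hidden features
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
a, z = attention_pool(H, Wq, Wk, Wv)

# Reordering the clones permutes a but leaves z unchanged (permutation invariance).
perm = [2, 0, 3, 1]
a_perm, z_perm = attention_pool(H[perm], Wq, Wk, Wv)
print(np.allclose(z, z_perm), np.allclose(a_perm, a[perm]))  # → True True
```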

Cohort heterogeneity: MTL with graph Laplacian regularization

For all patient cohorts, TMBclaw utilizes an MTL approach integrated with graph Laplacian regularization to address cohort heterogeneity. This method preserves the topological relationships among distinct patient cohorts through edge-weighted similarity metrics, effectively addressing training instability caused by limited cohort size while enhancing generalizability.

For $T$ learning tasks (i.e. $T$ cohorts), each task $t$ has a training set $\mathcal{D}_t = \{(X_i^{(t)}, Y_i^{(t)})\}_{i=1}^{N_t}$, and the goal is to learn parameters $\mathbf{W}_t$ that map the input $X^{(t)}$ to the output $Y^{(t)}$ [45]. The joint optimization objective over all tasks is

$$\min_{\mathbf{W}} \; \sum_{t=1}^{T} \omega_t\, \mathcal{L}_t(\mathbf{W}_t), \qquad \mathbf{W} = \bigl[\mathbf{W}_1, \dots, \mathbf{W}_T\bigr] \qquad (5)$$

where $\mathbf{W}$ denotes the parameter vector of all $T$ tasks and $\mathcal{L}_t$ denotes the loss of the $t$-th task with the corresponding weight hyperparameter $\omega_t$. By default, $\omega_t$ can be set by cross-validation; a more interpretable alternative is to derive it from cosine similarity.

Optimizing Eq. (5) alone decouples the tasks, because each loss term depends only on that task's own parameters. To share information among tasks for joint learning, an additional regularization term is generally added to constrain the tasks' parameters:

$$\min_{\mathbf{W}} \; \sum_{t=1}^{T} \omega_t\, \mathcal{L}_t(\mathbf{W}_t) + \lambda\, \Omega(\mathbf{W}) \qquad (6)$$

where $\Omega(\mathbf{W})$ denotes the regularization term and $\lambda$ denotes the regularization hyperparameter that balances the loss function (first term) against the regularization (second term). A common strategy is to regularize the distance between parameters according to the relationships among tasks.

Our study introduces a graph Laplacian regularization term to account for the differences between datasets. We construct an undirected graph $G = (V, E)$, where the vertex set $V$ corresponds to all cohorts and the edge set $E$ reflects the similarity among cohorts [55]. To capture these relationships, we compute the following matrices:

$$\mathbf{L} = \mathbf{D} - \mathbf{S}, \qquad D_{ii} = \sum_{j=1}^{T} S_{ij}, \qquad S_{ij} = \operatorname{sim}_{\cos}(i, j) \qquad (7)$$

where $\mathbf{L}$ denotes the Laplacian matrix; $\mathbf{D}$ denotes the diagonal degree matrix, whose diagonal element $D_{ii}$ is the sum of the similarities between the $i$-th cohort and all other cohorts; and $\mathbf{S}$ denotes the similarity matrix, whose element $S_{ij}$ is the cosine similarity between cohorts $i$ and $j$. To preserve the local topological relations among cohorts, we use the graph Laplacian to define the regularization term

$$\Omega(\mathbf{W}) = \operatorname{tr}\bigl(\mathbf{F}^{\top}\mathbf{L}\mathbf{F}\bigr) = \frac{1}{2}\sum_{i=1}^{T}\sum_{j=1}^{T} S_{ij}\,\bigl\lVert f_i(\mathbf{x};\mathbf{W}_i) - f_j(\mathbf{x};\mathbf{W}_j)\bigr\rVert^{2} \qquad (8)$$

where $\mathbf{x}$ denotes the samples processed in one iteration and $\mathbf{F} = \bigl[f_1(\mathbf{x};\mathbf{W}_1), \dots, f_T(\mathbf{x};\mathbf{W}_T)\bigr]^{\top}$ denotes the output matrix of the $T$ task-specific models. As this form shows, graph Laplacian regularization encourages similar cohorts to produce similar predictions and, correspondingly, similar model parameters. This smoothness assumption can capture the latent structure among cohorts, and preserving the relationships among cohorts in this way acts as implicit data augmentation. The mathematical derivation and explanation are detailed in the Supplementary Materials. The total loss of TMBclaw is defined as

$$\mathcal{L}_{\text{total}} = \sum_{t=1}^{T} \omega_t\, \mathcal{L}_t(\mathbf{W}_t) + \lambda \operatorname{tr}\bigl(\mathbf{F}^{\top}\mathbf{L}\mathbf{F}\bigr) \qquad (9)$$
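The cohort graph, the Laplacian penalty, and the total loss can be checked numerically. This NumPy sketch assumes that each cohort is summarized by a hypothetical non-negative profile vector and that the penalty takes the standard Laplacian form tr(F^T L F), with one row of F per cohort model evaluated on the same mini-batch. Zero self-similarity and clipping negative similarities to zero are also assumptions. The check confirms that tr(F^T L F) equals 0.5 * sum_ij S_ij * ||F_i - F_j||^2, so the penalty grows only when models for similar cohorts disagree.

```python
import numpy as np


def similarity_matrix(profiles: np.ndarray) -> np.ndarray:
    """Cosine similarity between cohort profiles (rows); no self-loops, negatives clipped."""
    unit = profiles / np.linalg.norm(profiles, axis=1, keepdims=True)
    S = np.clip(unit @ unit.T, 0.0, None)  # assumption: keep edge weights non-negative
    np.fill_diagonal(S, 0.0)
    return S


def laplacian(S: np.ndarray) -> np.ndarray:
    """L = D - S with D_ii = sum_j S_ij, so every row of L sums to zero."""
    return np.diag(S.sum(axis=1)) - S


def laplacian_penalty(F: np.ndarray, L: np.ndarray) -> float:
    """tr(F^T L F); F has one row of predictions per cohort model on the same batch."""
    return float(np.trace(F.T @ L @ F))


def total_loss(task_losses, task_weights, F, L, lam) -> float:
    """Weighted sum of task losses plus lam times the Laplacian penalty."""
    return float(np.dot(task_weights, task_losses)) + lam * laplacian_penalty(F, L)


profiles = np.array([[1.0, 0.2], [0.9, 0.3], [0.1, 1.0]])  # 3 hypothetical cohorts
S = similarity_matrix(profiles)
L = laplacian(S)
F = np.array([[0.7, 0.2, 0.9],   # cohort-1 model: predictions on a batch of 3 samples
              [0.6, 0.3, 0.8],
              [0.1, 0.9, 0.4]])
pairwise = 0.5 * sum(S[i, j] * np.sum((F[i] - F[j]) ** 2)
                     for i in range(3) for j in range(3))
print(np.isclose(laplacian_penalty(F, L), pairwise))  # → True
print(total_loss([0.5, 0.6, 0.7], [1.0, 1.0, 1.0], F, L, lam=0.1))
```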

Model implementation and training

To elucidate the unique advantages of TMBclaw over existing MTL approaches, we conducted a systematic comparison with conventional training strategies. Unlike traditional pooled or task-specific methods that represent extremes in MTL, TMBclaw offers a balanced alternative by integrating the benefits of dependency modeling and task-specific learning. Table 2 compares the modeling assumptions of the three training strategies:

Table 2.

Modeling assumptions of different training strategies

| Strategy | Model structure | Parameter sharing | Regularization |
|---|---|---|---|
| Pooled | Single unified model | Hard | None |
| Task specific | Task-specific models | None | None |
| TMBclaw | Task-specific models | Soft | Graph Laplacian regularization |

To further elaborate:

  • (i) Pooled training strategy assumes a shared data distribution and applies a single model across all tasks, potentially overlooking task-specific heterogeneity;

  • (ii) Task-specific training strategy fully isolates each task, allowing task-specific modeling at the cost of ignoring shared signals across tasks; and

  • (iii) TMBclaw retains task-specific model structures while incorporating Graph Laplacian Regularization, which enables soft parameter sharing and explicitly models inter-task structure.
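The three strategies can be viewed as points on one continuum. In this toy sketch, which is not the paper's training procedure, each task has a single scalar parameter with a hypothetical per-task optimum `a_t`, and we minimize sum_t (w_t - a_t)^2 + lam * w^T L w. Setting the gradient to zero gives the closed form (I + lam*L) w = a. With lam = 0 each task keeps its own optimum (task-specific training). As lam grows on a connected graph, all parameters are pulled toward a common value (pooled-like training). Intermediate lam gives soft sharing.

```python
import numpy as np


def soft_shared_params(a: np.ndarray, S: np.ndarray, lam: float) -> np.ndarray:
    """Solve min_w sum_t (w_t - a_t)^2 + lam * w^T L w, i.e. (I + lam*L) w = a."""
    L = np.diag(S.sum(axis=1)) - S
    return np.linalg.solve(np.eye(len(a)) + lam * L, a)


a = np.array([1.0, 2.0, 6.0])      # hypothetical per-task optima
S = np.ones((3, 3)) - np.eye(3)    # fully connected cohort graph, unit weights
for lam in (0.0, 0.5, 1e6):
    print(lam, np.round(soft_shared_params(a, S, lam), 3))
```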

To provide a fair comparison with TMBclaw, we implemented two representative baseline models: a multilayer perceptron (MLP) and a support vector machine (SVM). The MLP consists of three fully connected hidden layers with 64, 32, and 16 units, respectively, each followed by a ReLU activation. The output layer uses a sigmoid activation for binary classification, and the MLP is trained with the Adam optimizer. Unlike TMBclaw, the MLP does not use the MIL framework: it operates on pre-aggregated bag-level features and replaces the attention pooling module with a single fully connected layer.

The SVM is implemented with the sklearn.svm.SVC class from Scikit-learn. We used the default radial basis function (RBF) kernel and enabled probability outputs for compatibility with the evaluation metrics.
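A scikit-learn sketch of the two baselines follows. The SVM call mirrors the description above (`SVC` with the default RBF kernel and `probability=True`). For the MLP, scikit-learn's `MLPClassifier` is used here only as a stand-in for the authors' PyTorch network: it has the same 64-32-16 ReLU layout, the Adam optimizer, and a logistic (sigmoid) output for binary labels. `max_iter`, `random_state`, and the toy data are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def make_baselines(seed: int = 0):
    """Return (mlp, svm) configured as described in the text; seed and max_iter are hypothetical."""
    mlp = MLPClassifier(hidden_layer_sizes=(64, 32, 16), activation="relu",
                        solver="adam", max_iter=500, random_state=seed)
    svm = SVC(kernel="rbf", probability=True, random_state=seed)
    return mlp, svm


# Toy bag-level features (e.g. zero-padded clone features) with 0/1 labels.
rng = np.random.default_rng(0)
y = np.array([0, 1] * 20)
X = rng.normal(size=(40, 6)) + y[:, None]
mlp, svm = make_baselines()
mlp.fit(X, y)
svm.fit(X, y)
p_mlp = mlp.predict_proba(X)[:, 1]   # responder probabilities
p_svm = svm.predict_proba(X)[:, 1]
```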

In our implementation, all neural network models were built with the PyTorch framework (version 1.8.1) and trained with the Adam optimizer, using hyperparameters recommended by the original authors or library defaults. The computational environment was Python 3.9 with NumPy (version 1.26.4) for numerical operations, Pandas (version 2.2.3) for data handling, and Scikit-learn (version 1.1.1) for machine learning tasks. We used stratified 3-fold cross-validation to reduce evaluation bias from any particular train-test split, so that the results reflect performance over the whole dataset, and we report metrics averaged across the folds. All random processes, including data shuffling and weight initialization, were controlled by fixing the random seed to ensure reproducibility.
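Stratified 3-fold cross-validation with a fixed seed can be sketched with scikit-learn's `StratifiedKFold`. The seed value is hypothetical, since the text states only that a seed was fixed. The label vector reuses the Experimental_Cohorts class totals from Table 1 (55 responders, 183 non-responders) to show that each test fold keeps roughly the same responder rate.

```python
import random

import numpy as np
from sklearn.model_selection import StratifiedKFold

SEED = 42  # hypothetical value; the text only says the seed was fixed


def set_seeds(seed: int = SEED) -> None:
    """Fix Python and NumPy RNGs (a PyTorch run would also call torch.manual_seed)."""
    random.seed(seed)
    np.random.seed(seed)


def stratified_folds(y: np.ndarray, n_splits: int = 3, seed: int = SEED):
    """Stratified k-fold splits that keep the responder rate similar in every fold."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return list(skf.split(np.zeros((len(y), 1)), y))


set_seeds()
y = np.array([1] * 55 + [0] * 183)   # Experimental_Cohorts class totals (Table 1)
folds = stratified_folds(y)
for k, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"fold {k}: n_test={len(test_idx)}, responder rate={y[test_idx].mean():.3f}")
```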

Results

TMBclaw enhances predictive performance and robustness

We benchmarked TMBclaw, a regularized multi-task learning model, against baseline approaches grouped into two training strategies: (i) the pooled training strategy (pooled analysis, pooled MLP, and pooled SVM) and (ii) the task-specific training strategy (Separate analysis, Separate MLP, and Separate SVM). Zero-padding was applied for the MLP and SVM baselines to obtain fixed-dimensional inputs. All models were tested on identical patient sets to ensure a fair comparison.
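Zero-padding turns variable-length clone lists into fixed-length vectors for the MLP and SVM baselines. A minimal sketch follows. Unlike attention pooling, the padded vector depends on clone order, so a fixed ordering (for example, by descending cellular prevalence) has to be chosen beforehand; that ordering and the function name are assumptions.

```python
from typing import List, Optional, Sequence

import numpy as np


def zero_pad_bags(bags: List[Sequence[Sequence[float]]],
                  max_clones: Optional[int] = None) -> np.ndarray:
    """Flatten each patient's (n_clones x n_features) matrix into one fixed-length row."""
    n_feat = len(bags[0][0])
    if max_clones is None:
        max_clones = max(len(b) for b in bags)
    out = np.zeros((len(bags), max_clones * n_feat))
    for i, bag in enumerate(bags):
        flat = np.asarray(bag[:max_clones], dtype=float).ravel()  # truncate extra clones
        out[i, :flat.size] = flat                                 # remaining slots stay 0
    return out


bags = [[[1.0, 2.0], [3.0, 4.0]],   # patient with 2 clones
        [[5.0, 6.0]]]               # patient with 1 clone
print(zero_pad_bags(bags))
```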

Given the class imbalance commonly observed in clinical cohorts, we evaluated model performance using multiple classification metrics rather than relying solely on accuracy. Specifically, we calculated class-specific F1 scores (F1 score (0) and F1 score (1)), true positives (TP), true negatives (TN), specificity (Sp), sensitivity (Se), positive predictive value (PPV), and negative predictive value (NPV). In addition, we used the distance to the optimal point (DOP) [15, 32], defined as

$$\mathrm{DOP} = \sqrt{(1 - \mathrm{Sp})^{2} + (1 - \mathrm{Se})^{2} + (1 - \mathrm{PPV})^{2} + (1 - \mathrm{NPV})^{2}} \qquad (10)$$

This metric combines Sp, Se, PPV, and NPV into a single measure: it is the Euclidean distance from the observed values to the ideal point at which all four equal 1, so lower values are better and a perfect classifier has DOP = 0. We selected the classification threshold that minimizes DOP and applied it to the model's predicted positive probabilities for label assignment. Table 3 presents the quantitative performance metrics of different models and training strategies.
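DOP-based threshold selection can be sketched in pure Python. The sketch assumes that each observed score is tried as a cut-off (score >= t is labeled responder) and that an undefined rate (for example, PPV when nothing is predicted positive) counts as 0. Both conventions are assumptions, not details given in the text.

```python
import math
from typing import Sequence, Tuple


def _ratio(num: int, den: int) -> float:
    return num / den if den else 0.0  # assumption: undefined rates count as 0


def dop(tp: int, tn: int, fp: int, fn: int) -> float:
    """Euclidean distance from (Se, Sp, PPV, NPV) to the ideal point (1, 1, 1, 1)."""
    se, sp = _ratio(tp, tp + fn), _ratio(tn, tn + fp)
    ppv, npv = _ratio(tp, tp + fp), _ratio(tn, tn + fn)
    return math.sqrt((1 - se) ** 2 + (1 - sp) ** 2 + (1 - ppv) ** 2 + (1 - npv) ** 2)


def best_threshold(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Try each observed score as a cut-off and return (threshold, DOP) with minimal DOP."""
    best_t, best_d = 0.5, float("inf")
    for t in sorted(set(scores)):
        pred = [1 if s >= t else 0 for s in scores]
        tp = sum(p == 1 and y == 1 for p, y in zip(pred, y_true))
        tn = sum(p == 0 and y == 0 for p, y in zip(pred, y_true))
        fp = sum(p == 1 and y == 0 for p, y in zip(pred, y_true))
        fn = sum(p == 0 and y == 1 for p, y in zip(pred, y_true))
        d = dop(tp, tn, fp, fn)
        if d < best_d:
            best_t, best_d = t, d
    return best_t, best_d


print(best_threshold([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # → (0.8, 0.0)
```

As a sanity check, the pooled TMBclaw counts for Experimental_Cohorts in Table 3 (TP = 36, TN = 133, hence FP = 50 and FN = 19 given 55 responders and 183 non-responders) give `dop(36, 133, 50, 19)` ≈ 0.740. This is close to the reported 0.7347, which is averaged across folds.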

Table 3.

Quantitative performance metrics of different models and training strategies

Experimental_Cohorts

| Metric | TMBclaw | Pooled analysis | Pooled MLP | Pooled SVM | Separate analysis | Separate MLP | Separate SVM |
|---|---|---|---|---|---|---|---|
| ACC | 71.31% | 68.35% | 62.87% | 66.24% | 66.24% | 62.87% | 59.07% |
| F1 score (0) | 0.7963 | 0.7768 | 0.7256 | 0.7509 | 0.7501 | 0.7295 | 0.6706 |
| F1 score (1) | 0.5144 | 0.4135 | 0.3514 | 0.3696 | 0.3361 | 0.3185 | 0.4522 |
| TP | 36 | 26 | 25 | 26 | 27 | 22 | 40 |
| TN | 133 | 136 | 124 | 131 | 130 | 127 | 100 |
| DOP | 0.7347 | 0.8746 | 0.9654 | 0.9012 | 0.8963 | 0.9887 | 0.8675 |

External_NSCLC_Cohorts

| Metric | TMBclaw | Pooled analysis | Pooled MLP | Pooled SVM | Separate analysis | Separate MLP | Separate SVM |
|---|---|---|---|---|---|---|---|
| ACC | 66.76% | 62.34% | 65.73% | 64.40% | 66.48% | 63.22% | 62.78% |
| F1 score (0) | 0.7496 | 0.7032 | 0.7460 | 0.7283 | 0.7450 | 0.7173 | 0.7120 |
| F1 score (1) | 0.5024 | 0.4678 | 0.4628 | 0.4791 | 0.4310 | 0.4693 | 0.4607 |
| TP | 114 | 113 | 101 | 112 | 98 | 110 | 110 |
| TN | 338 | 309 | 344 | 324 | 352 | 318 | 315 |
| DOP | 0.7870 | 0.8590 | 0.8390 | 0.8294 | 0.8430 | 0.8482 | 0.8617 |

External_Mel_Cohorts

| Metric | TMBclaw | Pooled analysis | Pooled MLP | Pooled SVM | Separate analysis | Separate MLP | Separate SVM |
|---|---|---|---|---|---|---|---|
| ACC | 67.94% | 60.11% | 62.04% | 61.74% | 67.50% | 67.21% | 66.46% |
| F1 score (0) | 0.6963 | 0.6444 | 0.6634 | 0.6440 | 0.7022 | 0.6938 | 0.6906 |
| F1 score (1) | 0.6569 | 0.5456 | 0.5621 | 0.5847 | 0.6374 | 0.6464 | 0.6333 |
| TP | 209 | 162 | 166 | 183 | 197 | 203 | 196 |
| TN | 251 | 245 | 254 | 235 | 260 | 252 | 254 |
| DOP | 0.6506 | 0.8161 | 0.7810 | 0.7735 | 0.6657 | 0.6619 | 0.6786 |

Values in bold represent the optimal performance.

In the Experimental_Cohorts, TMBclaw performs best on most evaluation metrics. It achieves the highest accuracy (ACC = 71.31%) with balanced detection of both classes (TP = 36, TN = 133), indicating effective classification of both responders and non-responders. This balance is further reflected in the highest F1 scores for both classes (F1 score (0) = 0.7963; F1 score (1) = 0.5144), indicating strong precision–recall characteristics. Most notably, TMBclaw attains the lowest DOP (0.7347), indicating the best overall balance among sensitivity, specificity, and predictive values of all compared models.

TMBclaw likewise achieves the best ACC, F1 score (1), TP, and DOP in both the External_NSCLC_Cohorts and the External_Mel_Cohorts, although some baselines record a higher TN in both external cohorts and Separate analysis attains a slightly higher F1 score (0) in the External_Mel_Cohorts (0.7022 versus 0.6963). Its lowest DOP in both settings again indicates the best overall balance among sensitivity, specificity, and predictive values.

We further extended our analysis to the area under the receiver operating characteristic curve (AUC), the area under the precision–recall curve (AUPRC), and detailed comparisons of sample-score distributions.

As shown in Fig. 2A, TMBclaw achieves an AUC of 0.68 across the Experimental_Cohorts, the highest among all compared models. Relative to the pooled training strategy, TMBclaw improves the AUC by 0.09–0.20 over pooled analysis (AUC = 0.59), pooled MLP (AUC = 0.48), and pooled SVM (AUC = 0.59). This improvement likely stems from its ability to preserve cohort-specific characteristics while accounting for cohort heterogeneity. Relative to the task-specific training strategy, TMBclaw, which additionally draws on information from related tasks, achieves an AUC 0.08–0.18 higher than Separate analysis (AUC = 0.57), Separate MLP (AUC = 0.50), and Separate SVM (AUC = 0.60).

Figure 2.


Predictive performances of different models and training strategies across the Experimental_Cohorts. (A) Receiver operating characteristic (ROC) curves with corresponding area under the curve (AUC) values for TMBclaw and baseline models. TMBclaw achieves the highest AUC (0.68), outperforming all pooled and separately trained methods. (B) Area under the precision–recall curve (AUPRC) for each model. TMBclaw yields the highest AUPRC, reflecting improved predictive reliability in imbalanced clinical data. (C) Violin plots of model-predicted scores stratified by responder status. Each plot shows the score distribution for responders (dark) and non-responders (light), with associated P-values from Mann–Whitney U tests. TMBclaw demonstrates the most statistically significant separation (P = 9.12e−05), supporting its superior discriminative ability.

In addition, TMBclaw achieves the highest AUPRC among all models (Fig. 2B), reflecting its effectiveness in identifying responders. Furthermore, the sample-scores predicted by TMBclaw differ significantly between responders and non-responders (Mann–Whitney U test, P = 9.12e−05; Fig. 2C).
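The AUC and the Mann–Whitney U test used in Fig. 2 are closely linked: the AUC equals U / (n1 * n0), the probability that a randomly chosen responder receives a higher score than a randomly chosen non-responder, with ties counted as one half. A minimal pure-Python sketch follows (quadratic in cohort size, which is adequate for cohorts of this scale). The P-value itself would come from a test routine such as `scipy.stats.mannwhitneyu`. The function name is hypothetical.

```python
from typing import Sequence


def auc_mann_whitney(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the normalized Mann-Whitney U statistic: P(score_pos > score_neg), ties = 0.5."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    u = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return u / (len(pos) * len(neg))


print(auc_mann_whitney([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```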

Across both External_NSCLC_Cohorts (Fig. 3A) and External_Mel_Cohorts (Fig. 3B), TMBclaw demonstrates 4%–6% and 2%–12% higher AUCs compared to conventional models, respectively. TMBclaw also achieves consistently superior AUPRCs and clear stratification between responders and non-responders, evidenced by significantly distinct sample-score distribution (P < .02), confirming its robust predictive performance.

Figure 3.

Predictive performances of TMBclaw and six baseline models under different training strategies across External_NSCLC_Cohorts (A) and External_Mel_Cohorts (B).

TMBclaw refines clinical risk stratification and survival prediction

In addition to objective response, progression-free survival (PFS) is a critical endpoint because it reflects both the time to disease progression and the duration of treatment benefit, offering a comprehensive evaluation of therapeutic effect. We therefore evaluated how well TMBclaw stratifies patients by PFS following ICI treatment.

We first confirmed the prognostic value of TMBclaw's prediction score (sample-score) through Cox regression analyses adjusting for TMB and cohort effects. Two models were compared: (i) the Sample_score− model, a Cox regression including only the known clinical covariates (TMB and cohort); and (ii) the Sample_score+ model, a Cox regression including the known covariates plus TMBclaw's sample-score.
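The Sample_score− and Sample_score+ models are compared by their C-index in Table 4. The sketch below implements Harrell's concordance index in pure Python to show what that number measures. It assumes a risk score in which higher values mean shorter expected PFS, for example a Cox linear predictor. Because sample-score is protective (its Cox coefficient is negative), it would enter such a risk score with a negative sign. The function name and toy data are hypothetical, and the exact C-index implementation used by the authors is not specified in this section.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when times[i] < times[j] and subject i had
    an event (progression). The pair is concordant when subject i also has
    the higher risk score; tied risk scores count 0.5. Pairs with tied
    times are skipped (a simplification).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: PFS in months, 1 = progression, 0 = censored
times = [5, 8, 12, 20, 25]
events = [1, 1, 0, 1, 0]
risk = [2.0, 1.5, 1.7, 0.05, 0.1]  # e.g. a Cox linear predictor; higher = riskier
print(harrell_c_index(times, events, risk))  # 0.75
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering. The Table 4 values of roughly 0.60 to 0.66 therefore indicate modest but non-trivial discrimination.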

In the Experimental_Cohorts, including sample-score improved model performance (Table 4). The sample-score coefficient was negative in every fold (βsample-score = −1.01 ± 0.18, range −1.21 to −0.84). This indicates that higher scores are associated with a lower hazard of progression, and thus better PFS, consistent with the score's design as a response probability metric. Its absolute coefficient also exceeded that of TMB (e.g. Fold 1: |βsample-score| = .99 versus |βTMB| = .37), highlighting its stronger prognostic relevance. Moreover, sample-score was statistically significant (P < .05) in all three folds, whereas cohort and TMB did not reach significance in every fold. Incorporating sample-score increased the mean C-index across folds from 0.618 to 0.628, with gains in Folds 1 and 2 and a slight decrease in Fold 3.
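The summary figures in the paragraph above can be checked directly against Table 4. The short script below is only an arithmetic check on the table's own values. It computes the mean ± sample standard deviation of the sample-score coefficient and the per-fold hazard ratio exp(β), which reproduces the exp(coef) column. It also computes the fold-averaged C-index with and without sample-score.

```python
import math
import statistics

# Values copied from Table 4 (Experimental_Cohorts, three folds)
beta_sample_score = [-0.98875, -0.84189, -1.20903]
c_index_without = [0.640729, 0.597603, 0.61431]   # Sample_score- models
c_index_with = [0.65545, 0.621575, 0.606133]      # Sample_score+ models

mean_beta = statistics.mean(beta_sample_score)
sd_beta = statistics.stdev(beta_sample_score)      # sample SD (n - 1 denominator)
print(f"beta = {mean_beta:.2f} ± {sd_beta:.2f}")

# Hazard ratio per unit increase in sample-score is exp(beta)
for fold, b in enumerate(beta_sample_score, start=1):
    print(f"fold {fold}: HR = {math.exp(b):.3f}")

# Per-fold and mean change in discrimination
for fold, (c0, c1) in enumerate(zip(c_index_without, c_index_with), start=1):
    print(f"fold {fold}: C-index {c0:.3f} -> {c1:.3f}")
print(f"mean: {statistics.mean(c_index_without):.3f} -> {statistics.mean(c_index_with):.3f}")
```

An HR of about 0.37 in Fold 1 means that each unit increase in sample-score is associated with roughly a 63% lower hazard of progression, holding TMB and cohort fixed.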

Table 4.

Cox coefficients of models with and without sample-score in the experimental cohorts

| Fold | Model | C-index | Covariate | Coef | exp(coef) | P |
|---|---|---|---|---|---|---|
| 1 | Sample_score− | 0.640729 | Cohort | 0.215803 | 1.240858 | .006906 |
| | | | TMB | −0.41015 | 0.663551 | .005483 |
| | Sample_score+ | 0.65545 | Cohort | 0.133078 | 1.142339 | .112351 |
| | | | TMB | −0.37308 | 0.688608 | .011806 |
| | | | sample_score | −0.98875 | 0.372043 | .005794 |
| 2 | Sample_score− | 0.597603 | Cohort | 0.283083 | 1.327215 | .000317 |
| | | | TMB | −0.45393 | 0.635126 | .001975 |
| | Sample_score+ | 0.621575 | Cohort | 0.199464 | 1.220749 | .021546 |
| | | | TMB | −0.43055 | 0.650154 | .003413 |
| | | | sample_score | −0.84189 | 0.430895 | .034286 |
| 3 | Sample_score− | 0.61431 | Cohort | 0.321125 | 1.378678 | 2.79e−05 |
| | | | TMB | −0.33774 | 0.71338 | .017479 |
| | Sample_score+ | 0.606133 | Cohort | 0.256033 | 1.291795 | .001337 |
| | | | TMB | −0.28467 | 0.75226 | .051169 |
| | | | sample_score | −1.20903 | 0.298486 | .001562 |

Values in bold represent the optimal performance.

Similar findings were observed in the External_NSCLC_Cohorts and External_Mel_Cohorts (Supplementary Table S4), further confirming the robust prognostic value of TMBclaw. Based on these results, we next stratified patients by sample-score and performed Kaplan–Meier survival analyses to directly assess differences in survival outcomes between high- and low-sample-score groups.

The Kaplan–Meier survival curves in Fig. 4A and Supplementary Fig. S4 show that, in the Experimental_Cohorts, patients with low sample-scores predicted by TMBclaw have significantly poorer PFS than those with high sample-scores (HR = 2.4, P < .0001). In contrast, the other models stratify risk less effectively (Fig. 4B and C).
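Each Kaplan–Meier curve in Fig. 4 is the product-limit estimator S(t) = Π over event times t_i ≤ t of (1 − d_i / n_i), where d_i is the number of progressions at t_i and n_i is the number of patients still at risk. The estimator is applied separately to the Score-H (sample-score ≥ 0.5) and Score-L (sample-score < 0.5) groups defined in the Figure 4 caption. The pure-Python sketch below assumes per-patient PFS times, progression indicators (1 = progression, 0 = censored) and model scores. The function names and toy data are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate; returns [(t, S(t))] at each distinct event time."""
    event_counts = Counter(t for t, e in zip(times, events) if e)
    time_counts = Counter(times)        # patients leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(time_counts):
        d = event_counts.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= time_counts[t]       # both events and censorings leave the risk set
    return curve


def split_by_score(scores, threshold=0.5):
    """Indices of Score-H (score >= threshold) and Score-L (score < threshold) patients."""
    high = [i for i, s in enumerate(scores) if s >= threshold]
    low = [i for i, s in enumerate(scores) if s < threshold]
    return high, low


# Hypothetical toy cohort
scores = [0.8, 0.7, 0.3, 0.2, 0.6, 0.4]
times = [14, 10, 3, 5, 8, 6]           # PFS in months
events = [0, 1, 1, 1, 1, 1]            # 1 = progression, 0 = censored

high, low = split_by_score(scores)
for name, idx in (("Score-H", high), ("Score-L", low)):
    curve = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
    print(name, [(t, round(s, 3)) for t, s in curve])
```

Libraries such as lifelines (`KaplanMeierFitter`) provide the same estimator together with confidence intervals. The explicit loop is shown only to make the censoring handling visible.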

Figure 4.

Figure 4 presents survival analyses for the compared prediction models. Panels A–C show Kaplan–Meier curves for TMBclaw, pooled analysis, and separate analysis, respectively. Patients are divided into Score-High (≥ 0.5) and Score-Low (< 0.5) groups. The Score-High group shows better PFS, with TMBclaw achieving the clearest hazard ratio separation. Panels D–F compare hazard ratios across the four cohorts and the whole cohort for each model, showing that TMBclaw provides more consistent and distinguishable survival stratification across cohorts.

Survival analysis of different models and training strategies across the Experimental_Cohorts. (A–C) Kaplan–Meier survival curves of TMBclaw (A), pooled analysis (B), and separate analysis (C). Patients are stratified into score-high (Score-H: sample-score ≥ 0.5) and score-low (Score-L: sample-score < 0.5) groups. P-values are calculated using the log-rank test. (D–F) Forest plots of hazard ratios (HRs) for progression-free survival (PFS), stratified by cohort and for the overall population. TMBclaw performs more consistently and with greater statistical robustness across cohorts than the other models, reinforcing its utility for clinically interpretable patient stratification.
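The P-values in Fig. 4A–C come from the log-rank test. To show what that test computes, the pure-Python sketch below implements the two-group statistic. At each distinct progression time, it compares the observed number of events in group A with the number expected if both groups shared the same hazard, and it sums the hypergeometric variances. The squared standardized difference is then referred to a χ² distribution with one degree of freedom. This is an illustrative sketch with hypothetical toy data, not the authors' analysis code.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic with 1 df, P-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
           [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for x, _, _ in data if x >= t)                 # at risk, both groups
        n_a = sum(1 for x, _, g in data if x >= t and g == 0)    # at risk, group A
        d = sum(1 for x, e, _ in data if x == t and e)           # events at t
        d_a = sum(1 for x, e, g in data if x == t and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("zero variance: groups cannot be compared")
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Hypothetical Score-H (A) and Score-L (B) groups: PFS months, 1 = progression
chi2, p = logrank_test([14, 10, 8], [0, 1, 1], [3, 5, 6], [1, 1, 1])
print(round(chi2, 3), round(p, 4))
```

`lifelines.statistics.logrank_test` offers a tested implementation. The erfc identity used above holds because a χ² variable with one degree of freedom is the square of a standard normal variable.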

The forest plots in Fig. 4D–F further show that TMBclaw stratifies patients consistently across all cohorts, with better PFS in the predicted Score-H groups (HR < 1), indicating strong and stable stratification performance.

Survival analyses in the External_NSCLC_Cohorts (Fig. 5A, Supplementary Fig. S5) and the External_Mel_Cohorts (Fig. 5B, Supplementary Fig. S6) show the same advantage, with significant patient stratification and consistent risk assessment across cohorts.

Figure 5.

Compares survival outcomes of different models and training strategies in the External_NSCLC_Cohorts and External_Mel_Cohorts.

Survival analysis of different models and training strategies across the External_NSCLC_Cohorts (A) and External_Mel_Cohorts (B). Rows without hazard ratio (HR) plots correspond to studies with only one stratified group, where HR estimation was not feasible.

To further validate TMBclaw's risk stratification performance, we conducted time-dependent ROC analyses (Supplementary Figs. S7–S9). TMBclaw maintained robust predictive performance for both short-term and long-term survival across all cohorts, supporting its temporal stability and clinical utility.
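The estimator behind Supplementary Figs. S7–S9 is not specified in this section. As a hedged illustration, the sketch below shows the simplest cumulative/dynamic definition of a time-dependent AUC at a horizon t. Cases are patients who progressed by t, and controls are patients still progression-free after t. Unlike estimators such as scikit-survival's `cumulative_dynamic_auc`, it drops patients censored before t and applies no inverse-probability-of-censoring weights. The risk score is assumed to rise with the risk of early progression (for TMBclaw, something like 1 − sample-score). The function name and data are hypothetical.

```python
def naive_time_dependent_auc(times, events, risk_scores, horizon):
    """Unweighted cumulative/dynamic AUC at a single time horizon.

    Cases: progression at or before `horizon`.
    Controls: still progression-free after `horizon` (censored or not).
    Patients censored at or before `horizon` are excluded (no IPCW).
    """
    cases = [r for t, e, r in zip(times, events, risk_scores) if t <= horizon and e]
    controls = [r for t, r in zip(times, risk_scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    score = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                score += 1.0
            elif c == k:
                score += 0.5
    return score / (len(cases) * len(controls))


# Hypothetical cohort: PFS months, 1 = progression, risk = e.g. 1 - sample-score
times = [3, 5, 6, 8, 10, 14]
events = [1, 1, 1, 1, 1, 0]
risk = [0.8, 0.35, 0.9, 0.4, 0.5, 0.2]
for horizon in (6, 9):
    print(horizon, naive_time_dependent_auc(times, events, risk, horizon))
```

Evaluating the AUC at several horizons, for example short-term and long-term PFS, is what gives a time-dependent ROC analysis its "temporal stability" reading.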

Ablation analysis of TMBclaw: multi-clonal structure integration and regularized design

To assess whether incorporating all detected clones and their heterogeneity improves prognostic performance, we compared TMBclaw against alternative predictive features—including total TMB, clonal TMB (cTMB, mutations present in the master clone), and subclonal TMB (mutations exclusive to minor subclones)—in ablation experiments. Performance was evaluated using DOP and AUC, as summarized in Table 5.
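The three baseline features differ only in which mutations they count. The sketch below assumes that each mutation already carries a clone label from a clonal-deconvolution step (for example, PyClone [39]). It also assumes that clones are numbered in descending order of cellular prevalence, so that clone 1 is the master clone, matching the indexing described for the clone-weight analysis. The function name and the optional per-megabase normalization are illustrative assumptions, not necessarily how the study computed these features.

```python
def tmb_features(mutation_clones, master_clone=1, panel_size_mb=None):
    """Total TMB, clonal TMB (cTMB) and subclonal TMB from per-mutation clone labels.

    mutation_clones: list with the clone ID assigned to each somatic mutation.
    panel_size_mb: optional sequenced territory in megabases (hypothetical
    parameter); if given, counts are returned as mutations per megabase.
    """
    total = len(mutation_clones)
    clonal = mutation_clones.count(master_clone)   # cTMB: mutations in the master clone
    subclonal = total - clonal                     # mutations assigned only to subclones
    if panel_size_mb:
        return total / panel_size_mb, clonal / panel_size_mb, subclonal / panel_size_mb
    return total, clonal, subclonal


# Hypothetical patient: 8 mutations spread over 3 clones
clones = [1, 1, 1, 2, 2, 3, 1, 1]
print(tmb_features(clones))                     # (8, 5, 3)
print(tmb_features(clones, panel_size_mb=2.0))  # (4.0, 2.5, 1.5)
```

Each scalar discards the per-clone breakdown that TMBclaw keeps. This is the information the ablation in Table 5 is designed to probe.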

Table 5.

Quantitative performance metrics of different predictive features

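| Predictive feature | Experimental_Cohorts DOP | Experimental_Cohorts AUC | External_NSCLC_Cohorts DOP | External_NSCLC_Cohorts AUC | External_Mel_Cohorts DOP | External_Mel_Cohorts AUC |
|---|---|---|---|---|---|---|
| TMBclaw | 0.7347 | 67.87% | 0.7870 | 65.56% | 0.6506 | 72.60% |
| Total TMB | 0.8772 | 50.46% | 0.8355 | 61.39% | 0.6624 | 70.66% |
| cTMB | 0.8880 | 57.28% | 0.9637 | 53.70% | 0.6522 | 70.97% |
| Subclonal TMB | 0.8312 | 63.84% | 0.8384 | 62.46% | 0.6638 | 69.62% |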

Values in bold represent the optimal performance.

In the Experimental_Cohorts, TMBclaw achieves both the lowest DOP (0.7347) and the highest AUC (67.87%), outperforming total TMB (DOP = 0.8772, AUC = 50.46%), cTMB (DOP = 0.8880, AUC = 57.28%), and subclonal TMB (DOP = 0.8312, AUC = 63.84%). Total TMB may underperform because it aggregates all mutations indiscriminately and ignores clonal architecture. cTMB, in turn, omits subclonal information entirely. Both choices limit the model's ability to capture tumor heterogeneity and thereby weaken its prognostic capability.

Performance comparisons in the External_NSCLC_Cohorts and External_Mel_Cohorts further confirm that TMBclaw outperforms the alternative predictive features (Table 5), highlighting its robustness and generalization capability across heterogeneous cohorts.

To systematically evaluate the contributions of each module in TMBclaw and analyze the effectiveness of its innovative design, we designed the following ablation experiments: (i) Base-MIL: a MIL framework with attention pooling and task-specific training, excluding all regularization terms. This setting is designed to assess the model’s sensitivity to cohort heterogeneity without structural constraints; (ii) Reg-MLP: a task-specific MLP that replaces the clonal-structured MIL model while incorporating the graph Laplacian regularization term. This setting is intended to examine how clonal selection bias affects prediction robustness.
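Reg-MLP retains the graph Laplacian regularization term, which ties together the cohort-specific (task-specific) parameters, whereas Base-MIL drops it. The exact form of the term used in TMBclaw is not restated in this section. The NumPy sketch below shows the standard form of such a penalty, assuming one parameter vector per cohort stacked as the rows of Θ and a symmetric, non-negative cohort-similarity matrix W. With the graph Laplacian L = D − W, where D is the diagonal matrix of W's row sums, the penalty tr(ΘᵀLΘ) equals the sum over i < j of w_ij ‖θ_i − θ_j‖². Strongly connected cohorts are therefore pulled toward similar parameters. All values are hypothetical.

```python
import numpy as np


def laplacian_penalty(theta, w):
    """tr(theta^T L theta) with graph Laplacian L = D - W.

    theta: (n_tasks, n_params) array, one row of parameters per cohort/task.
    w: (n_tasks, n_tasks) symmetric, non-negative similarity matrix, zero diagonal.
    """
    laplacian = np.diag(w.sum(axis=1)) - w
    return float(np.trace(theta.T @ laplacian @ theta))


def pairwise_penalty(theta, w):
    """Equivalent edge-wise form: sum over i < j of w_ij * ||theta_i - theta_j||^2."""
    n = theta.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += w[i, j] * np.sum((theta[i] - theta[j]) ** 2)
    return float(total)


rng = np.random.default_rng(0)
theta = rng.normal(size=(4, 3))  # 4 hypothetical cohorts, 3 parameters each
w = np.array([[0.0, 1.0, 0.5, 0.0],
              [1.0, 0.0, 0.2, 0.0],
              [0.5, 0.2, 0.0, 0.8],
              [0.0, 0.0, 0.8, 0.0]])
print(laplacian_penalty(theta, w), pairwise_penalty(theta, w))  # the two values agree
```

In a multi-task loss, a term λ·tr(ΘᵀLΘ) would be added to the sum of per-cohort prediction losses. Setting λ = 0 recovers independent, separately trained cohort models, which is the unregularized setting that Base-MIL is meant to probe.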

Table 6 presents the performance differences between TMBclaw and its two core variants (Base-MIL and Reg-MLP), highlighting the marginal contributions of each module to the overall model design. In the Experimental_Cohorts, TMBclaw achieves the lowest DOP and highest AUC in treatment response prediction, demonstrating the benefit of combining clonal-structured modeling with graph-based regularization.

Table 6.

Quantitative performance metrics of different predictive models

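| Model | Experimental_Cohorts DOP | Experimental_Cohorts AUC | External_NSCLC_Cohorts DOP | External_NSCLC_Cohorts AUC | External_Mel_Cohorts DOP | External_Mel_Cohorts AUC |
|---|---|---|---|---|---|---|
| TMBclaw | 0.7347 | 67.87% | 0.7870 | 65.56% | 0.6506 | 72.60% |
| Base-MIL | 0.8963 | 56.58% | 0.8430 | 62.50% | 0.6657 | 71.42% |
| Reg-MLP | 0.9533 | 52.43% | 0.8462 | 61.37% | 0.6700 | 71.09% |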

Values in bold represent the optimal performance.

Ablation experiments conducted on the External_NSCLC_Cohorts and External_Mel_Cohorts further corroborate this advantage, with TMBclaw consistently delivering the lowest DOP and highest AUC.

TMBclaw reveals interpretable clonal contributions via attention visualization

To interpret how TMBclaw uses clonal structure, we examined the learned importance (attention weight) of each clone. Clones were indexed in descending order of cellular prevalence, reflecting their evolutionary hierarchy. The corresponding attention weights are visualized in Fig. 6, and patient-level clonal weights are summarized in Supplementary Table S2.
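The clone weights in Fig. 6 are attention weights normalized across each patient's clones. As an illustration only, the NumPy sketch below implements attention-based MIL pooling in the form of Ilse et al. [19]: a_k = softmax_k(wᵀ tanh(V h_k)), where h_k is the embedding of clone k. This is a swapped-in, simpler technique. The Discussion describes TMBclaw's module as self-attention that also models interactions between clones, so the actual architecture may differ. The dimensions, random parameters and clone features are hypothetical.

```python
import numpy as np


def attention_mil_pool(h, v, w):
    """Attention-based MIL pooling (Ilse et al., 2018), non-gated variant.

    h: (n_clones, d) clone embeddings for one patient.
    v: (d_att, d) and w: (d_att,) attention parameters (learned in practice).
    Returns the pooled patient embedding (d,) and the clone weights (n_clones,).
    """
    scores = np.tanh(h @ v.T) @ w            # one unnormalized score per clone
    scores = scores - scores.max()           # shift for a numerically stable softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    pooled = weights @ h                     # attention-weighted average of clone embeddings
    return pooled, weights


rng = np.random.default_rng(1)
h = rng.normal(size=(5, 8))      # 5 hypothetical clones, 8 features each
v = rng.normal(size=(4, 8))
w = rng.normal(size=4)
pooled, weights = attention_mil_pool(h, v, w)
print(weights.round(3), weights.sum())   # weights are non-negative and sum to 1
```

Because the weights sum to 1 within each patient, a mean master-clone weight of about 0.5 means that, on average, clone 1 accounts for roughly half of the pooled representation. Pooling is also invariant to the order in which clones are listed.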

Figure 6.

Clone weight visualization: Panel A shows violin plots of weight distributions across cohorts (mean values labeled). Panel B shows the average clone weights for each cohort as a heat map, with darker colors indicating greater weight.

Visualization of clone weights. (A) Distribution of clone weights across different cohorts, shown as violin plots annotated with mean values. (B) Average weight of each clone within individual cohorts.

In the Experimental_Cohorts (Fig. 6A), the master clone (clone 1) receives a markedly greater weight than the other clones, with a mean of ~0.506 and the highest median, highlighting its central role in prognostic prediction. By contrast, subclones contribute substantially less on average. Similar trends are observed in the External_NSCLC_Cohorts and External_Mel_Cohorts, where the master clone again has the highest median weight, with mean values of 0.464 and 0.446, respectively, further indicating its predictive importance.

To gain deeper insight into clonal contributions across patient cohorts, we visualized the average weight of each clone within individual cohorts in a heatmap (Fig. 6B). In most melanoma and NPC cohorts, the master clone consistently dominates with the greatest average weight. It is worth noting that in certain External_NSCLC_Cohorts (e.g. External_NSCLC 1 and External_NSCLC 2), some subclones exhibit non-negligible weights.

This observation aligns with previous studies implicating subclones in prognostic outcomes, particularly in NSCLC. Biologically, the master clone (clone 1) is often the dominant clone in terms of cellular prevalence, reflecting its critical role in tumor progression and therapy response. The higher attention weight assigned to clone 1 suggests that it is more influential in predicting outcomes, as it likely drives the primary tumor characteristics and response to treatment. Subclones, while contributing less on average, can still play a role, particularly in more heterogeneous cancers such as melanoma, where subclonal evolution is often linked to therapeutic resistance and disease recurrence.

These findings highlight that accounting for all mutant clones and their differential contributions could improve prognostic modeling, a capability supported by TMBclaw.

Discussion

In this study, we developed an interpretable prognostic prediction model based on clonal mutations that is applicable to heterogeneous patient cohorts. TMBclaw integrates tumor clones within an MIL framework, which has significant potential to elucidate complex tumor heterogeneity and provides interpretable insights for clinical decision-making. By adopting multi-task learning (MTL), our model unites multiple cohort-specific models within a single computational framework.

TMBclaw captures the relationships among cohorts through a graph Laplacian regularization term and shares information across them. This addresses both the data insufficiency of separate analysis and the neglect of cohort heterogeneity in pooled analysis. In clinical patient cohorts covering NSCLC, melanoma, and NPC, TMBclaw outperforms classical machine learning models under different analytic strategies, delivering more precise prognostic predictions and better risk stratification. Through its self-attention mechanism, the model integrates all clones and learns the interrelationships between them to quantify their contributions. This strategy enables the screening of key clones that mainly influence the prognosis of immunotherapy, which is crucial for tailoring targeted therapies and optimizing treatment strategies.

TMBclaw could benefit from further exploration of deep learning networks to enhance prediction efficacy. Additionally, incorporating a broader array of genetic mutation signatures may provide a more comprehensive understanding of tumor heterogeneity, ultimately improving the model’s ability to support targeted therapeutic strategies.

Moreover, we recognize that the tumor immune microenvironment (TIME) is a pivotal determinant of ICI efficacy [7, 14, 31]. While our current study was limited to genomic sequencing data without matched transcriptomic profiles, which prevented direct quantification of immune cell infiltration, future integration of RNA sequencing or multi-omics datasets will allow us to explicitly link predicted risk groups with TIME characteristics. Such extensions would not only strengthen the biological interpretability of our framework but also expand its translational potential in guiding immunotherapy.

Conclusion

TMBclaw is an advanced prognostic prediction framework. It provides more precise quantification of tumor immunogenicity, with interpretable outputs offering transparent and understandable treatment decision-making support for clinicians, which is a key consideration in personalized treatment strategies. Moreover, TMBclaw effectively alleviates training difficulties caused by data scarcity, significantly improving prediction accuracy and patient stratification, further enhancing its practical value in clinical decision-making and advancing the application of precision oncology.

Key Points

  • We propose TMBclaw, a hierarchical graph-regularized multi-task learning framework that explicitly models tumor clonal architecture and cohort-level heterogeneity, addressing limitations of conventional TMB-based biomarkers.

  • Through adaptive graph Laplacian regularization and attention-based multiple-instance learning, TMBclaw enables effective cross-cohort knowledge transfer and interpretable identification of immunotherapy-relevant clones.

  • Extensive validation on 1671 patients across institutional and public cohorts demonstrates that TMBclaw significantly improves prognostic accuracy and risk stratification in immune checkpoint inhibitor response prediction.

Supplementary Material

Supplymentary_Material_bbaf578
Supplementary_Table_S1_bbaf578
Supplementary_Table_S2_bbaf578
Supplementary_Table_S3_bbaf578
Supplementary_Table_S4_bbaf578

Contributor Information

Yixuan Wang, Department of Biomedical Engineering, College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, 29 Jiangjun Avenue, Jiangning, Nanjing 211106, Jiangsu, China.

Tianyi Zhu, School of Computer Science and Technology, Faculty of Electronics and Information Engineering, Xi’an Jiaotong University, 28 Xianning West Road, Beilin, Xi’an 710049, Shaanxi, China.

Xiaofeng Song, Department of Biomedical Engineering, College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, 29 Jiangjun Avenue, Jiangning, Nanjing 211106, Jiangsu, China.

Peng Chen, College of Computer Science and Technology, Zhejiang University, 866 Yuhangtang Rd, Hangzhou 310058, Zhejiang, China.

Xiaoyan Zhu, School of Computer Science and Technology, Faculty of Electronics and Information Engineering, Xi’an Jiaotong University, 28 Xianning West Road, Beilin, Xi’an 710049, Shaanxi, China.

Zhili Chang, School of Computer Science and Technology, Faculty of Electronics and Information Engineering, Xi’an Jiaotong University, 28 Xianning West Road, Beilin, Xi’an 710049, Shaanxi, China; Geneseeq Research Institute, Nanjing Geneseeq Technology Inc., 128 Huakang Road, Pukou, Nanjing 210032, Jiangsu, China.

Xiaonan Wang, School of Computer Science and Technology, Faculty of Electronics and Information Engineering, Xi’an Jiaotong University, 28 Xianning West Road, Beilin, Xi’an 710049, Shaanxi, China; Geneseeq Research Institute, Nanjing Geneseeq Technology Inc., 128 Huakang Road, Pukou, Nanjing 210032, Jiangsu, China.

Xin Lai, School of Computer Science and Technology, Faculty of Electronics and Information Engineering, Xi’an Jiaotong University, 28 Xianning West Road, Beilin, Xi’an 710049, Shaanxi, China.

Jiayin Wang, School of Computer Science and Technology, Faculty of Electronics and Information Engineering, Xi’an Jiaotong University, 28 Xianning West Road, Beilin, Xi’an 710049, Shaanxi, China.

Author contributions

W.Y., L.X., and W.J. conceived this research. W.Y., Z.T., and C.P. designed the model. Z.T. implemented the program and performed the experiments. W.Y., S.X., W.X., and C.Z. collected and analyzed the data. W.Y., Z.T., L.X., and W.J. wrote the manuscript. S.X., C.P., Z.X., W.X., and C.Z. revised the manuscript. All authors have read and agreed to the latest version of the manuscript.

Conflict of interest: W.X. and C.Z. are employed by Nanjing Geneseeq Technology Inc. The remaining authors declare that the research was conducted without commercial or financial relationships that could be construed as a potential conflict of interest.

Funding

This work was supported by the National Natural Science Foundation of China [grant nos. 62302215, 72293581, 72293580, and 72274152].

References

  • 1. Anagnostou  V, Bardelli  A, Chan  TA. et al.  The status of tumor mutational burden and immunotherapy. Nat Cancer  2022;3:652–6. 10.1038/s43018-022-00382-1
  • 2. Boll  LM, Perera-Bel  J, Rodriguez-Vida  A. et al.  The impact of mutational clonality in predicting the response to immune checkpoint inhibitors in advanced urothelial cancer. Sci Rep  2023;13:15287. 10.1038/s41598-023-42495-2
  • 3. Budczies  J, Kazdal  D, Menzel  M. et al.  Tumour mutational burden: clinical utility, challenges and emerging improvements. Nat Rev Clin Oncol  2024;21:725–42. 10.1038/s41571-024-00932-9
  • 4. Carlino  MS, Larkin  J, Long  GV. Immune checkpoint inhibitors in melanoma. Lancet  2021;398:1002–14. 10.1016/S0140-6736(21)01206-X
  • 5. Chalmers  ZR, Connelly  CF, Fabrizio  D. et al.  Analysis of 100,000 human cancer genomes reveals the landscape of tumor mutational burden. Genome Med  2017;9:1–14. 10.1186/s13073-017-0424-2
  • 6. Chan  TA, Yarchoan  M, Jaffee  E. et al.  Development of tumor mutation burden as an immunotherapy biomarker: utility for the oncology clinic. Ann Oncol  2019;30:44–56. 10.1093/annonc/mdy495
  • 7. Chen  B, Khodadoust  MS, Liu  CL. et al.  Profiling Tumor Infiltrating Immune Cells with CIBERSORT. In: von Stechow, L. (eds) Cancer Systems Biology. Methods in Molecular Biology, vol 1711. Humana Press, New York, NY. 10.1007/978-1-4939-7493-1_12
  • 8. Dall'Olio  FG, Marabelle  A, Caramella  C. et al.  Tumour burden and efficacy of immune-checkpoint inhibitors. Nat Rev Clin Oncol  2022;19:75–90. 10.1038/s41571-021-00564-3
  • 9. Dancey  JE, Dodd  LE, Ford  R. et al.  Recommendations for the assessment of progression in randomised cancer treatment trials. Eur J Cancer  2009;45:281–9. 10.1016/j.ejca.2008.10.042
  • 10. Eisenhauer  EA, Therasse  P, Bogaerts  J. et al.  New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur J Cancer  2009;45:228–47. 10.1016/j.ejca.2008.10.026
  • 11. Elmoznino  E, Bonner  MF. High-performing neural network models of visual cortex benefit from high latent dimensionality. PLoS Comput Biol  2024;20:e1011792. 10.1371/journal.pcbi.1011792
  • 12. Fang  W, Ma  Y, Yin  JC. et al.  Comprehensive genomic profiling identifies novel genetic predictors of response to anti–PD-(L) 1 therapies in non–small cell lung cancer. Clin Cancer Res  2019;25:5015–26. 10.1158/1078-0432.CCR-19-0585
  • 13. Fang  W, Yang  Y, Ma  Y. et al.  Camrelizumab (shr-1210) alone or in combination with gemcitabine plus cisplatin for nasopharyngeal carcinoma: results from two single-arm, phase 1 trials. Lancet Oncol  2018;19:1338–50. 10.1016/S1470-2045(18)30495-9
  • 14. Fernández  EA, Mahmoud  YD, Veigas  F. et al.  Unveiling the immune infiltrate modulation in cancer and response to immunotherapy by MIXTURE-an enhanced deconvolution method. Brief Bioinform  2021;22:bbaa317. 10.1093/bib/bbaa317
  • 15. Fernández  EA, Valtuille  R, Presedo  JM. et al.  Comparison of different methods for hemodialysis evaluation by means of ROC curves: from artificial intelligence to current methods. Clin Nephrol  2005;64:205–13. 10.5414/cnp64205
  • 16. Frankell  AM, Dietzen  M, Al Bakir  M. et al.  The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature  2023;616:525–33. 10.1038/s41586-023-05783-5
  • 17. Hellmann  MD, Nathanson  T, Rizvi  H. et al.  Genomic features of response to combination immunotherapy in patients with advanced non-small-cell lung cancer. Cancer Cell  2018;33:843–852.e4. 10.1016/j.ccell.2018.03.018
  • 18. Hugo  W, Zaretsky  JM, Sun  L. et al.  Genomic and transcriptomic features of response to anti-PD-1 therapy in metastatic melanoma. Cell  2016;165:35–44. 10.1016/j.cell.2016.02.065
  • 19. Ilse  M, Tomczak  JM, Welling  M. Attention-based deep multiple instance learning. ICML  2018;80:2127–36. 10.48550/arXiv.1802.04712
  • 20. Jardim  DL, Goodman  A, de  Melo  GD. et al.  The challenges of tumor mutational burden as an immunotherapy biomarker. Cancer Cell  2021;39:154–73. 10.1016/j.ccell.2020.10.001
  • 21. Johnson  PC, Gainor  JF, Sullivan  RJ. et al.  Immune checkpoint inhibitors - the need for innovation. N Engl J Med  2023;388:1529–32. 10.1056/NEJMsb2300232
  • 22. Koh  G, Degasperi  A, Zou  X. et al.  Mutational signatures: emerging concepts, caveats and clinical applications. Nat Rev Cancer  2021;21:619–37. 10.1038/s41568-021-00377-7
  • 23. Lemery  S, Keegan  P, Pazdur  R. First FDA approval agnostic of cancer site - when a biomarker defines the indication. N Engl J Med  2017;377:1409–12. 10.1056/NEJMp1709968
  • 24. Litchfield  K, Reading  JL, Puttick  C. et al.  Meta-analysis of tumor- and T cell-intrinsic mechanisms of sensitization to checkpoint inhibition. Cell  2021;184:596–614.e14. 10.1016/j.cell.2021.01.002
  • 25. Liu  D, Schilling  B, Liu  D. et al.  Integrative molecular and clinical modeling of clinical outcomes to PD1 blockade in patients with metastatic melanoma. Nat Med  2019;25:1916–27. 10.1038/s41591-019-0654-5
  • 26. Łuksza  M, Sethna  ZM, Rojas  LA. et al.  Neoantigen quality predicts immunoediting in survivors of pancreatic cancer. Nature  2022;606:389–95. 10.1038/s41586-022-04735-9
  • 27. Ma  Y, Fang  W, Zhang  Y. et al.  A phase I/II open-label study of nivolumab in previously treated advanced or recurrent nasopharyngeal carcinoma and other solid tumors. Oncol  2019;24:891–e431. 10.1634/theoncologist.2019-0284
  • 28. McGrail  DJ, Pilié  PG, Rashid  NU. et al.  High tumor mutation burden fails to predict immune checkpoint blockade response across all cancer types. Ann Oncol  2021;32:661–72. 10.1016/j.annonc.2021.02.006
  • 29. McGranahan  N, Furness  AJ, Rosenthal  R. et al.  Clonal neoantigens elicit T cell immunoreactivity and sensitivity to immune checkpoint blockade. Science  2016;351:1463–9. 10.1126/science.aaf1490
  • 30. Nair  V, Hinton  GE. Rectified linear units improve restricted Boltzmann machines. ICML  2010;27:807–14. 10.5555/3104322.3104425
  • 31. Nava  A, Alves da Quinta  D, Prato  L. et al.  Novel evaluation approach for molecular signature-based deconvolution methods. J Biomed Inform  2023;142:104387. 10.1016/j.jbi.2023.104387
  • 32. Nibeyro  G, Baronetto  V, Folco  JI. et al.  Unraveling tumor specific neoantigen immunogenicity prediction: a comprehensive analysis. Front Immunol  2023;14:1094236. 10.3389/fimmu.2023.1094236
  • 33. Phan  TH, Yamamoto  K. Resolving class imbalance in object detection with weighted cross entropy losses. ArXiv 2020;abs/2006.01413. https://arxiv.org/abs/2006.01413
  • 34. Ravi  A, Hellmann  MD, Arniella  MB. et al.  Genomic and transcriptomic analysis of checkpoint blockade response in advanced non-small cell lung cancer. Nat Genet  2023;55:807–19. 10.1038/s41588-023-01355-5
  • 35. Riaz  N, Havel  JJ, Makarov  V. et al.  Tumor and microenvironment evolution during immunotherapy with nivolumab. Cell  2017;171:934–949.e16. 10.1016/j.cell.2017.09.028
  • 36. Rizvi  H, Sanchez-Vega  F, La  K. et al.  Molecular determinants of response to anti–programmed cell death (PD)-1 and anti-programmed death-ligand 1 (PD-L1) blockade in patients with non-small-cell lung cancer profiled with targeted next-generation sequencing. J Clin Oncol  2018;36:633–41. 10.1200/JCO.2017.75.3384
  • 37. Rizvi  NA, Hellmann  MD, Snyder  A. et al.  Mutational landscape determines sensitivity to PD-1 blockade in non–small cell lung cancer. Science  2015;348:124–8. 10.1126/science.aaa1348
  • 38. Roh  W, Chen  P-L, Reuben  A. et al.  Integrated molecular analysis of tumor biopsies on sequential CTLA-4 and PD-1 blockade reveals markers of response and resistance. Sci Transl Med  2017;9:eaah3560. 10.1126/scitranslmed.aah3560
  • 39. Roth  A, Khattra  J, Yap  D. et al.  PyClone: statistical inference of clonal population structure in cancer. Nat Methods  2014;11:396–8. 10.1038/nmeth.2883
  • 40. Samstein  RM, Lee  C-H, Shoushtari  AN. et al.  Tumor mutational load predicts survival after immunotherapy across multiple cancer types. Nat Genet  2019;51:202–6. 10.1038/s41588-018-0312-8
  • 41. Shao  Z, Bian  H, Chen  Y. et al.  TransMIL: transformer based correlated multiple instance learning for whole slide image classification. NeurIPS  2021;34:2136–46. 10.48550/arXiv.2106.00908
  • 42. Subbiah  V, Solit  DB, Chan  TA. et al.  The FDA approval of pembrolizumab for adult and pediatric patients with tumor mutational burden (TMB) ≥10: a decision centered on empowering patients and their physicians. Ann Oncol  2020;31:1115–8. 10.1016/j.annonc.2020.07.002
  • 43. Tang  S, Qin  C, Hu  H. et al.  Immune checkpoint inhibitors in non-small cell lung cancer: progress, challenges, and prospects. Cells  2022;11:320. 10.3390/cells11030320
  • 44. Thummalapalli  R, Ricciuti  B, Bandlamudi  C. et al.  Clinical and molecular features of long-term response to immune checkpoint inhibitors in patients with advanced non-small cell lung cancer. Clin Cancer Res  2023;29:4408–18. 10.1158/1078-0432.CCR-23-1207
  • 45. Thung  KH, Wee  CY. A brief review on multi-task learning. Multimed Tools Appl  2018;77:29705–25. 10.1007/s11042-018-6463-x
  • 46. Valero  C, Lee  M, Hoen  D. et al.  The association between tumor mutational burden and prognosis is dependent on treatment context. Nat Genet  2021;53:11–5. 10.1038/s41588-020-00752-4
  • 47. Van Allen  EM, Miao  D, Schilling  B. et al.  Genomic correlates of response to CTLA-4 blockade in metastatic melanoma. Science  2015;350:207–11. 10.1126/science.aad0095
  • 48. Vanguri  RS, Luo  J, Aukerman  AT. et al.  Multimodal integration of radiology, pathology and genomics for prediction of response to PD-(L)1 blockade in patients with non-small cell lung cancer. Nat Cancer  2022;3:1151–64. 10.1038/s43018-022-00416-8
  • 49. Vaswani  A, Shazeer  N, Parmar  N. et al.  Attention is all you need. NeurIPS  2017;30:5998–6008. https://arxiv.org/abs/1706.03762
  • 50. Vega  DM, Yee  LM, McShane  LM. et al.  Aligning tumor mutational burden (TMB) quantification across diagnostic platforms: phase II of the Friends of Cancer Research TMB Harmonization Project. Ann Oncol  2021;32:1626–36. 10.1016/j.annonc.2021.09.016
  • 51. Wang  X, Lamberti  G, Di Federico  A. et al.  Tumor mutational burden for the prediction of PD-(L)1 blockade efficacy in cancer: challenges and opportunities. Ann Oncol  2024;35:508–22. 10.1016/j.annonc.2024.03.007
  • 52. Westcott  PMK, Muyas  F, Hauck  H. et al.  Mismatch repair deficiency is not sufficient to elicit tumor immunogenicity. Nat Genet  2023;55:1686–95. 10.1038/s41588-023-01499-4
  • 53. Yoo  SK, Fitzgerald  CW, Cho  BA. et al.  Prediction of checkpoint inhibitor immunotherapy efficacy for cancer using routine blood tests and clinical data. Nat Med  2025;31:869–80. 10.1038/s41591-024-03398-5
  • 54. Zantvoort  K, Nacke  B, Görlich  D. et al.  Estimation of minimal data sets sizes for machine learning predictions in digital mental health interventions. NPJ Digit Med  2024;7:361. 10.1038/s41746-024-01360-w
  • 55. Zhu  X, Suk  HI, Lee  SW. et al.  Subspace regularized sparse multitask learning for multiclass neurodegenerative disease identification. IEEE T BIO-MED ENG  2016;63:607–18. 10.1109/TBME.2015.2466616


Articles from Briefings in Bioinformatics are provided here courtesy of Oxford University Press
