Abstract
Immune checkpoint inhibitors (ICIs) have emerged as a cornerstone of modern oncology, necessitating the development of robust biomarkers for optimizing patient stratification and treatment selection. While tumor mutation burden (TMB) has demonstrated prognostic value, conventional quantification methods based on mutation counts fail to reflect immunogenic neoantigen presentation due to intratumoral clonal heterogeneity. Recent efforts have focused on mutation subsets derived from tumor clonality, yet the complex interactions among clones remain a significant obstacle to accurate prognosis. This challenge is further exacerbated by the inherent constraints of limited cohort sizes in clinical studies, which severely compromise model generalizability across heterogeneous cohorts. Therefore, we propose TMBclaw (Tumor Mutation Burden–based Clonal attention with Laplacian Adaptive Weighting), a graph-regularized multi-task learning framework for immunotherapy response prediction. TMBclaw establishes unified integration of group-structured cohorts while enabling cross-cohort knowledge transfer and clonal relationship exploration. For clinical validation, we utilized four cohorts of 238 patients with non–small-cell lung cancer (NSCLC), melanoma, or nasopharyngeal carcinoma treated with ICIs, along with external multicenter validation cohorts (N = 1433) of melanoma and NSCLC patients from public datasets. Comparative analyses demonstrate that TMBclaw significantly outperforms conventional methods in prognostic accuracy and risk stratification. Through systematic quantification of clonal dynamics and discriminative identification of driver clones, TMBclaw shows potential to improve understanding of tumor heterogeneity and provides interpretable insights into the immunotherapy process.
Keywords: tumor mutation burden, tumor clonal heterogeneity, clinical decision-making, multi-task learning, graph Laplacian regularization
Introduction
Immune checkpoint inhibitors (ICIs) have revolutionized advanced cancer treatment, achieving durable responses in 20%–40% of patients [4, 8]. However, a substantial proportion derives limited clinical benefits despite the considerable healthcare expenditures [21, 43]. Accurate prediction of ICI efficacy remains an unmet clinical need that could enhance precision oncology by enabling clinically actionable stratification of patients based on their likelihood of therapeutic benefit [51, 53]. Tumor mutation burden (TMB), quantified as the total number of non-synonymous mutations per megabase [1, 5, 46], has gained regulatory recognition through the National Comprehensive Cancer Network guidelines and the U.S. Food and Drug Administration approvals as a predictive biomarker [23, 42]. However, the clinical utility of TMB quantification remains controversial.
Emerging insights from immuno-oncology reveal that not all mutations are created equal in their immunogenic potential [22, 26]. Mutations from various clones generate distinct neoantigens and trigger varying ICI responses [16, 24, 29]. This biological nuance exposes the critical limitation of conventional TMB quantification, which indiscriminately aggregates mutations regardless of their clonal status, potentially obscuring clinically relevant signals [3, 6, 28]. Clonal TMB (cTMB), defined by truncal mutations present in all tumor cells, has been shown to correlate with ICI response in multiple studies [2, 34, 52], yet current evidence remains inadequate to establish its superiority over conventional TMB conclusively. While subclonal TMB shows promise in predicting long-term response [44], its spatial limitation prevents it from fully representing the entire tumor. The complexity of neoantigenic heterogeneity due to clonal architecture requires further mechanistic investigation to enhance predictive accuracy.
In addition, clinical implementation of immunotherapy faces multifaceted challenges as patient cohorts are frequently constrained by substantial costs and adverse effects, resulting in limited sample sizes. Such data scarcity predisposes neural network models to overfitting through memorization of training patterns rather than learning generalizable features [54]. On top of that, technical variability, such as variations in panel size, Next-Generation Sequencing (NGS) platforms, and tumor types across cohorts, induces significant inconsistencies in TMB quantification [50]. This multidimensional heterogeneity compromises clinical decision-making for ICIs [20] and makes naïve pooling of datasets ineffective. Balancing data augmentation with accommodating cohort heterogeneity has posed a significant challenge for developing reliable predictive models.
To address the issues of predicting immunotherapy responses in heterogeneous cohorts, we propose TMBclaw (Tumor Mutation Burden–based Clonal attention with Laplacian Adaptive Weighting), a hierarchical, graph-regularized multi-task learning (MTL) framework. TMBclaw captures cohort heterogeneity via an undirected graph in which nodes represent patient cohorts and weighted edges encode their similarities. We introduce an edge-sensitive graph Laplacian regularization that optimizes information flow across cohorts, enabling effective knowledge transfer while preserving distinct biological characteristics through adaptive penalty terms [45, 55]. At the level of tumor clonal heterogeneity, TMBclaw integrates clonal genomic features using an attention-based multiple-instance learning (MIL) framework [19, 41], adaptively weighting clone contributions. This dual-layer design captures both broad population patterns and critical clone-specific features. Clinical validation was performed using four institutional cohorts (N = 238) from the Sun Yat-sen University Cancer Center and the Second Affiliated Hospital of Xi’an Jiaotong University, involving non–small-cell lung cancer (NSCLC), melanoma, and nasopharyngeal carcinoma (NPC) [12, 13, 27]. We also collected 1433 samples from public studies to form group-structured data [17, 18, 25, 35–38, 40, 47, 48]. The results demonstrate the superior and robust performance of TMBclaw in ICI response prediction and risk stratification, with interpretability in its analyses. The source code can be downloaded from https://github.com/AadSama0404/TMBclaw.
Methods
TMBclaw is a neural network model that performs prognostic prediction by jointly modeling tumor clonal and cohort heterogeneity within a hierarchical framework. The model simultaneously processes clonal mutation profiles and inter-patient variability, capturing their nonlinear interactions to generate more accurate and clinically generalizable predictions. TMBclaw estimates a patient’s likelihood of responding to ICIs and outputs both a binary classification (non-responder or responder) and a sample-specific response probability. The complete architecture is shown in Fig. 1. Panel A represents the Cohort Heterogeneity Layer of TMBclaw, where all cohorts are integrated, with each cohort treated as a distinct prediction task. MTL is employed to enable information sharing across cohorts while preserving their unique characteristics. Panel B depicts the Tumor Clonal Heterogeneity Layer of TMBclaw, which is task specific and focuses on a single cohort. This layer applies MIL to process individual patient data and adapt to clonal genomic features. The resulting clone-specific attention weights are subsequently fed into the final classifier to predict immunotherapy response. Together, Panels A and B constitute the Training Pipeline of TMBclaw. Panel C illustrates the Recall Pipeline, where a new sample is predicted by its task-specific model if its cohort is known, or by the model of the most similar cohort if its cohort is unseen.
Figure 1.
TMBclaw architecture overview. (A) Cohort heterogeneity layer: Patient cohorts are modeled as nodes in an undirected graph, with edge weights reflecting inter-cohort similarity. Multi-task learning with graph Laplacian regularization enables robust training through structured information sharing. (B) Tumor clonal heterogeneity layer: For a patient in a specific cohort, clonal mutations are integrated via multiple instance learning. Feature extraction is performed by two fully connected layers, followed by an attention mechanism that quantifies clone-specific contributions. The final classifier outputs a binary response prediction (0 = non-responder, 1 = responder). (C) Recall pipeline: For a new sample, cohort membership is first determined. If the sample belongs to a known cohort, prediction is made by its task-specific model. If from an unseen cohort, similarity to existing cohorts is computed and the most similar cohort model is used to generate the prediction.
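The routing rule of the recall pipeline can be illustrated with a minimal sketch. It assumes each cohort is summarized by a numeric profile vector and that similarity is measured by cosine similarity; the function names, the profile vectors, and the cohort names are hypothetical and are not taken from the TMBclaw source code.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


def select_task_model(cohort_name, cohort_profile, known_profiles):
    """Return the name of the cohort whose task-specific model should be used.

    A known cohort is routed to its own model; an unseen cohort is routed
    to the known cohort with the highest cosine similarity.
    """
    if cohort_name in known_profiles:
        return cohort_name
    return max(
        known_profiles,
        key=lambda name: cosine_similarity(cohort_profile, known_profiles[name]),
    )


# Hypothetical cohort profiles (illustrative values only).
known_profiles = {
    "NSCLC_A": [0.9, 0.2, 0.1],
    "Melanoma_A": [0.1, 0.8, 0.3],
}

seen = select_task_model("NSCLC_A", [0.0, 0.0, 1.0], known_profiles)
unseen = select_task_model("NSCLC_new", [0.8, 0.3, 0.1], known_profiles)
print(seen, unseen)
```

How the cohort profile is computed in practice is not specified here, so the vectors above should be read as placeholders.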
Patient cohort characteristics and data preprocessing
In this study, we compiled 238 samples from four patient cohorts. These included 75 patients diagnosed with NSCLC and 26 patients with melanoma from the Second Affiliated Hospital of Xi’an Jiaotong University, sequenced at the Geneplus-Beijing Institute. Additionally, 73 NSCLC patients and 64 patients with pharyngeal or metastatic nasopharyngeal carcinoma (NPC) were sourced from the Sun Yat-sen University Cancer Center. All patients were treated with anti-PD-(L)1, anti-CTLA-4 therapy, or monotherapy between December 2015 and January 2018. Eligible patients met the following criteria: (i) age between 18 and 70 years; (ii) an Eastern Cooperative Oncology Group performance status of 0–1; (iii) histologically or cytologically confirmed diagnosis of NSCLC, melanoma, or NPC with either metastatic disease or locoregional recurrence; (iv) progression after at least one prior systemic treatment; (v) measurable disease by radiological assessment. Exclusion criteria included central nervous system metastases, prior malignancy, autoimmune disorders, history of immunotherapy, active tuberculosis infection, pregnancy, or current use of immunosuppressive agents. These patients were organized into four related yet distinct cohorts based on disease type or treatment regimen, collectively referred to as Experimental_Cohorts.
In addition, we incorporated external multicenter validation cohorts comprising 1433 patients from previously published retrospective immunotherapy studies. These external cohorts included 713 patients with NSCLC and 720 patients with metastatic melanoma, referred to as External_NSCLC_Cohorts and External_Mel_Cohorts, respectively. All of them had received ICIs. Treatments consisted of anti-PD-(L)1 therapy, anti-CTLA-4 therapy, or combination anti-CTLA-4/anti-PD-(L)1 regimens, with a small subset receiving other agents.
Patient responses to ICIs were categorized according to the Response Evaluation Criteria in Solid Tumors version 1.1 [10], with complete response and partial response encoded as 1 (responders), and stable disease and progressive disease encoded as 0 (non-responders). The survival endpoint was defined as progression-free survival (PFS) with event status, where event includes disease progression, tumor recurrence, or all-cause mortality [9].
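The label encoding described above can be written as a small helper. This is a sketch only: the abbreviations CR, PR, SD, and PD are the standard RECIST category codes, while the function name and the error handling are illustrative choices.

```python
RESPONDER_CATEGORIES = {"CR", "PR"}      # complete or partial response -> 1
NON_RESPONDER_CATEGORIES = {"SD", "PD"}  # stable or progressive disease -> 0


def encode_response(recist_category):
    """Map a RECIST v1.1 best-response category to a binary label."""
    category = recist_category.strip().upper()
    if category in RESPONDER_CATEGORIES:
        return 1
    if category in NON_RESPONDER_CATEGORIES:
        return 0
    raise ValueError(f"Unrecognized RECIST category: {recist_category!r}")


labels = [encode_response(c) for c in ["CR", "pr", "SD", "PD"]]
print(labels)
```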
The clinical characteristics of these cohorts are summarized in Table 1, while basic demographics and patient-level datasets are comprehensively documented in Supplementary Table S1.
Table 1.
Summary of study cohort information
| Cohort type | Cohort size | Cancer type | Responder, n (%) | Non-responder, n (%) | Cohort source |
|---|---|---|---|---|---|
| Experimental | 75 | NSCLC | 25 (33.3) | 50 (66.7) | The Second Affiliated Hospital of Xi’an Jiaotong University |
| | 26 | Melanoma | 9 (34.6) | 17 (65.3) | The Second Affiliated Hospital of Xi’an Jiaotong University |
| | 73 | NSCLC | 13 (17.8) | 60 (82.1) | Fang et al. [12] |
| | 64 | NPC | 8 (12.5) | 56 (87.5) | Fang et al. [13]; Ma et al. [27] |
| External_NSCLC | 16 | NSCLC | 10 (62.5) | 6 (37.5) | Rizvi et al. [37] |
| | 68 | NSCLC | 24 (35.3) | 44 (64.7) | Hellmann et al. [17] |
| | 227 | NSCLC | 69 (30.3) | 158 (69.6) | Rizvi et al. [36] |
| | 156 | NSCLC | 56 (35.8) | 100 (64.1) | Samstein et al. [40] |
| | 246 | NSCLC | 61 (24.7) | 185 (75.2) | Vanguri et al. [48] |
| External_Mel | 37 | Melanoma | 20 (54.0) | 17 (45.9) | Hugo et al. [18] |
| | 140 | Melanoma | 55 (39.2) | 85 (60.7) | Liu et al. [25] |
| | 70 | Melanoma | 15 (21.4) | 55 (78.5) | Riaz et al. [35] |
| | 48 | Melanoma | 9 (18.7) | 39 (81.2) | Roh et al. [38] |
| | 105 | Melanoma | 17 (16.1) | 88 (83.8) | Van Allen et al. [47] |
| | 320 | Melanoma | 195 (60.9) | 125 (39.0) | Samstein et al. [40] |
As shown in Table 1, the proportion of responders (label = 1) varied considerably among cohorts, ranging from 12.5% to 62.5%, with the majority below 50%, reflecting both the presence of class imbalance and the heterogeneity in cohort composition.
For each patient, somatic mutations were first subjected to clonal architecture inference using the PyClone algorithm [39] to delineate clonal mutation profiles from bulk sequencing data. The mutation count, average cellular prevalence, and average variant allele frequency were subsequently derived for each clone. We identified the clone with the highest average cellular prevalence as the master clone, which is associated with the cTMB, and the remaining clones were attributed to subclonal TMB. Subsequently, the mutation counts underwent feature scaling to ensure consistency in range for gradient descent optimization, while accounting for the influence of outliers. Supplementary Table S2 details patient-level clonal weights derived from TMBclaw across different cohorts.
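The per-clone feature derivation can be sketched as follows. The example assumes the clustering step has already assigned each mutation a cluster identifier, a cellular prevalence, and a variant allele frequency; the tuple layout and function names are hypothetical and do not reproduce the PyClone output format.

```python
from collections import defaultdict
from statistics import mean


def summarize_clones(mutations):
    """Aggregate per-mutation records into per-clone features.

    Each record is (cluster_id, cellular_prevalence, vaf).
    """
    grouped = defaultdict(list)
    for cluster_id, prevalence, vaf in mutations:
        grouped[cluster_id].append((prevalence, vaf))
    return {
        cluster_id: {
            "mutation_count": len(values),
            "mean_prevalence": mean(p for p, _ in values),
            "mean_vaf": mean(v for _, v in values),
        }
        for cluster_id, values in grouped.items()
    }


def split_clonal_subclonal(clones):
    """Pick the master clone (highest mean prevalence) and count mutations.

    Returns (master_clone_id, clonal_count, subclonal_count).
    """
    master = max(clones, key=lambda cid: clones[cid]["mean_prevalence"])
    clonal = clones[master]["mutation_count"]
    subclonal = sum(
        info["mutation_count"] for cid, info in clones.items() if cid != master
    )
    return master, clonal, subclonal


# Toy input: three clusters with made-up prevalence and VAF values.
mutations = [
    (0, 0.95, 0.48), (0, 0.90, 0.45), (0, 0.92, 0.46),
    (1, 0.40, 0.20), (1, 0.35, 0.18),
    (2, 0.10, 0.05),
]
clones = summarize_clones(mutations)
master, clonal_count, subclonal_count = split_clonal_subclonal(clones)
print(master, clonal_count, subclonal_count)
```

The counts returned here are raw mutation counts; converting them to per-megabase burdens and applying the outlier-aware scaling mentioned above are left out of the sketch.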
Tumor clonal heterogeneity: MIL framework with attention pooling
For each patient in a specific cohort, TMBclaw adopts an MIL framework with attention-based pooling to model tumor clonal heterogeneity. Specifically, each patient is represented as a “bag” with the ICI response status as the bag-level label. Within each bag, individual tumor clones represent “instances” whose combined features determine the patient’s clinical outcome. This structure allows the model to handle the absence of instance-level annotations by leveraging the attention pooling mechanism to evaluate interactions among tumor clones dynamically, assign interpretable contribution weights, and identify key clones influencing the therapeutic response.
The cohort-specific MIL model is a modular architecture that consists of a feature extraction layer, a self-attention layer, and a bag-level classifier layer. This design allows the model to handle a variable number of tumor clones (instances) per patient, while the attention pooling mechanism ensures that the model remains permutation invariant.
Let $x_i \in \mathbb{R}^{d}$ represent the original mutation feature vector of the $i$-th tumor clone, and let $y_i \in \{0, 1\}$ denote the unknown instance-level label of that clone, for $i = 1, \ldots, n$. In the binary supervised classification problem, the bag consists of a set of instances $X = \{x_1, \ldots, x_n\}$; the instances are unordered and assumed independent, and the number of clones $n$ varies across bags. The bag is assigned a binary label $Y \in \{0, 1\}$ based on the aggregate response of the clones within it, under the assumption that a patient is a responder when at least one clone shows a positive immune response:

$$Y = \begin{cases} 0, & \text{if } \sum_{i=1}^{n} y_i = 0, \\ 1, & \text{otherwise.} \end{cases} \tag{1}$$

For illustrative purposes, we first consider the model formulation under a single cohort, $T = 1$.
The feature extraction layer comprises two fully connected layers, each followed by a ReLU activation function: the first maps the input feature space $\mathbb{R}^{d}$ to a higher-dimensional feature space to capture richer feature information, while the second projects the high-dimensional features back to a lower dimension to remove noise and redundant information [11]. The ReLU activation function mitigates the vanishing gradient problem that saturating activation functions cause in deep networks [30].
A permutation-invariant pooling function is used to compute the bag-level representation in the MIL framework, ensuring that the representation does not depend on the order of the instances and can be computed for bags of any size. A classifier with a sigmoid activation function then predicts the positive immune response score $\hat{Y} \in [0, 1]$. Given the class imbalance in clinical data, we construct a cost-sensitive negative log-likelihood (NLL) loss function [33]:

$$\mathcal{L}(\theta) = -\left[\, w\, Y \log \hat{Y} + (1 - Y) \log\left(1 - \hat{Y}\right) \right] \tag{2}$$

where $w$ denotes the weight hyperparameter that emphasizes minority-class samples during optimization and $\theta$ denotes the parameter matrix. Finally, the parameters are updated by a gradient descent algorithm.
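A cost-sensitive NLL of this kind can be sketched in a few lines. The sketch assumes that the weight is applied to the positive (responder) class, which is the minority class in most cohorts here; the function name, the `pos_weight` parameter, and the clipping constant are illustrative assumptions and do not come from the TMBclaw code.

```python
import math


def weighted_nll(y_true, y_prob, pos_weight=1.0, eps=1e-7):
    """Cost-sensitive negative log-likelihood for one bag.

    y_true: bag label (0 or 1); y_prob: predicted probability of label 1.
    pos_weight scales the penalty for misclassified positive bags
    (assumption: positives are the minority class).
    """
    p = min(max(y_prob, eps), 1.0 - eps)  # clip to keep log() finite
    return -(pos_weight * y_true * math.log(p)
             + (1 - y_true) * math.log(1.0 - p))


def mean_weighted_nll(labels, probs, pos_weight=1.0):
    """Average the per-bag loss over a batch."""
    losses = [weighted_nll(y, p, pos_weight) for y, p in zip(labels, probs)]
    return sum(losses) / len(losses)


# With pos_weight = 2, a missed responder costs twice as much as a
# missed non-responder at the same predicted probability.
loss_pos = weighted_nll(1, 0.5, pos_weight=2.0)
loss_neg = weighted_nll(0, 0.5, pos_weight=2.0)
print(loss_pos, loss_neg)
```

A common heuristic is to set the weight near the ratio of non-responders to responders in the training split, although the value used in the study is not stated here.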
Self-attention mechanism for inter-clone interaction
Building on the concept from Shao et al. [41], the self-attention mechanism is used to capture correlations between clones, which may provide more informative features and reduce uncertainty in the model [49]. Specifically, the inter-clone interactions are modeled as follows:
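$$Q = H W_Q, \qquad K = H W_K, \qquad V = H W_V,$$

$$A = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) \tag{3}$$

where $H \in \mathbb{R}^{n \times d_h}$ denotes the hidden feature matrix obtained after the two-layer feature extraction module; $Q$, $K$, and $V$ denote the query, key, and value matrices, respectively, with learnable projection matrices $W_Q$, $W_K$, and $W_V$; $d_k$ denotes the key dimension; and $A \in \mathbb{R}^{n \times n}$ denotes the attention weight matrix, which captures the relative importance of clones. To facilitate multiplication with the value matrix $V$ and quantify the weight of each clone more clearly, the attention weight is reshaped into a vector rather than a matrix: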
![]() |
(4) |
where $a \in \mathbb{R}^{n}$ denotes the clone weight vector, which is then multiplied by the value matrix $V$ to generate the bag-level representation.
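The attention pooling step can be illustrated with a dependency-free sketch. It assumes standard scaled dot-product self-attention and, because the exact reshaping rule is not reproduced here, it assumes the clone weight vector is obtained by averaging the attention matrix over its rows. Both choices, along with all function names, are illustrative and may differ from the TMBclaw implementation.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(h, w_q, w_k, w_v):
    """Return (clone_weights, bag_representation) for one patient.

    h holds one row of hidden features per clone.
    """
    q, k, v = matmul(h, w_q), matmul(h, w_k), matmul(h, w_v)
    scale = math.sqrt(len(k[0]))
    k_t = [list(col) for col in zip(*k)]
    attn = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    n = len(attn)
    # Assumption: collapse the n x n matrix to one weight per clone by
    # averaging over rows; each row sums to 1, so the weights sum to 1.
    clone_weights = [sum(attn[i][j] for i in range(n)) / n for j in range(n)]
    bag = [sum(clone_weights[j] * v[j][c] for j in range(n))
           for c in range(len(v[0]))]
    return clone_weights, bag


# Toy example: three clones with two hidden features each, and identity
# projections so the numbers stay easy to follow.
identity = [[1.0, 0.0], [0.0, 1.0]]
h = [[1.0, 0.5], [0.2, 0.1], [0.4, 0.9]]
weights, bag = attention_pool(h, identity, identity, identity)
weights_rev, bag_rev = attention_pool(h[::-1], identity, identity, identity)
print(weights, bag)
```

Reordering the clones permutes the weights but leaves the bag-level representation unchanged, which is the permutation invariance that the MIL framework relies on.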
Cohort heterogeneity: MTL with graph Laplacian regularization
For all patient cohorts, TMBclaw utilizes an MTL approach integrated with graph Laplacian regularization to address cohort heterogeneity. This method preserves the topological relationships among distinct patient cohorts through edge-weighted similarity metrics, effectively addressing training instability caused by limited cohort size while enhancing generalizability.
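For $T$ learning tasks (i.e. $T$ cohorts), each task $t \in \{1, \ldots, T\}$ has a training set $\mathcal{D}_t = \{(X_i^{t}, Y_i^{t})\}_{i=1}^{N_t}$, where the goal is to learn the parameters $\theta_t$ that map the input $X^{t}$ to the output $Y^{t}$ [45]. The optimization objective is given by

$$\Theta = (\theta_1, \ldots, \theta_T),$$

$$\min_{\Theta} \sum_{t=1}^{T} \lambda_t\, \mathcal{L}_t(\theta_t) \tag{5}$$

where $\Theta$ denotes the parameter vector of the $T$ tasks and $\mathcal{L}_t$ denotes the loss of the $t$-th task with the corresponding weight hyperparameter $\lambda_t$. By default, $\lambda_t$ can be obtained by cross-validation; a more interpretable alternative is cosine similarity.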
Simple optimization of Eq. (5) leads to decoupling between tasks. To share information among tasks for joint learning, an additional regularization is generally added to constrain the tasks’ parameters:
$$\min_{\Theta} \sum_{t=1}^{T} \lambda_t\, \mathcal{L}_t(\theta_t) + \gamma\, \Omega(\Theta) \tag{6}$$

where $\Omega(\Theta)$ denotes the regularization term and $\gamma$ denotes the regularization hyperparameter that controls the balance between the loss function (first term) and the regularization (second term). A common strategy is to regularize the distance between parameters using the relationship among tasks.
Our study introduces a graph Laplacian regularization term to account for the differences between datasets. We construct an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where the set of vertices $\mathcal{V}$ corresponds to all cohorts and the set of edges $\mathcal{E}$ reflects the similarity among cohorts [55]. To capture these correlations, we calculate the following matrices:

$$L = D - S, \qquad D_{ii} = \sum_{j \neq i} S_{ij} \tag{7}$$

where $L$ denotes the Laplacian matrix; $D$ denotes the diagonal degree matrix, with its diagonal element $D_{ii}$ representing the sum of the similarities between the $i$-th cohort and all other cohorts; and $S$ denotes the similarity matrix, with its element $S_{ij}$ defined by the cosine similarity between the $i$-th and $j$-th cohorts. To preserve the local topological relation among cohorts, we introduce a graph Laplacian and define a regularization term as follows:
$$\Omega(\Theta) = \frac{1}{B}\, \mathrm{tr}\!\left(F^{\top} L F\right) \tag{8}$$

where $B$ denotes the number of samples processed in one iteration and $F \in \mathbb{R}^{T \times B}$ denotes the output matrix. From the formula above, graph Laplacian regularization encourages similar cohorts to produce similar predictions and, correspondingly, similar model parameters. This smoothness assumption may effectively capture the latent structure between cohorts. By introducing graph Laplacian regularization, the relationships among cohorts are preserved, which acts as implicit data augmentation. The mathematical derivation and explanation are detailed in the Supplementary Materials. The total loss of TMBclaw is defined as
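$$\mathcal{L}_{\mathrm{total}} = \sum_{t=1}^{T} \lambda_t\, \mathcal{L}_t(\theta_t) + \gamma\, \Omega(\Theta) \tag{9}$$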
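To make the smoothness effect concrete, the sketch below builds a Laplacian from a cohort similarity matrix and evaluates the quadratic form tr(FᵀLF) that is commonly used for graph Laplacian regularization. Whether this matches the paper's regularizer term for term (for example, in its normalization by batch size) is an assumption; the similarity values and the per-cohort outputs are made up for illustration.

```python
def laplacian(similarity):
    """Graph Laplacian L = D - S, with degrees summed over other nodes."""
    n = len(similarity)
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        degree = sum(similarity[i][j] for j in range(n) if j != i)
        for j in range(n):
            lap[i][j] = degree if i == j else -similarity[i][j]
    return lap


def laplacian_penalty(outputs, lap):
    """Quadratic form tr(F^T L F), with one row of outputs per cohort."""
    n = len(outputs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(outputs[i], outputs[j]))
            total += lap[i][j] * dot
    return total


def pairwise_penalty(outputs, similarity):
    """Equivalent form: 0.5 * sum_{i != j} S_ij * ||f_i - f_j||^2."""
    n = len(outputs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                sq = sum((a - b) ** 2 for a, b in zip(outputs[i], outputs[j]))
                total += 0.5 * similarity[i][j] * sq
    return total


# Cohorts 0 and 1 are highly similar; cohort 2 is weakly related to both.
S = [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.2],
     [0.1, 0.2, 1.0]]
L = laplacian(S)

agree = [[0.2, 0.7], [0.2, 0.7], [0.9, 0.1]]     # similar cohorts agree
disagree = [[0.2, 0.7], [0.9, 0.1], [0.2, 0.7]]  # similar cohorts disagree
penalty_agree = laplacian_penalty(agree, L)
penalty_disagree = laplacian_penalty(disagree, L)
print(penalty_agree, penalty_disagree)
```

The penalty is larger when the two highly similar cohorts disagree, which is the behavior that pushes related task models toward each other during training.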
Model implementation and training
To elucidate the unique advantages of TMBclaw over existing MTL approaches, we conducted a systematic comparison with conventional training strategies. Unlike traditional pooled or task-specific methods that represent extremes in MTL, TMBclaw offers a balanced alternative by integrating the benefits of dependency modeling and task-specific learning. Table 2 compares the modeling assumptions of the three training strategies:
Table 2.
Modeling assumptions of different training strategies
| Strategy | Model structure | Parameter sharing | Regularization |
|---|---|---|---|
| Pooled | Single unified model | Hard | None |
| Task specific | Task-specific models | None | None |
| TMBclaw | Task-specific models | Soft | Graph Laplacian Regularization |
To further elaborate:
(i) Pooled training strategy assumes a shared data distribution and applies a single model across all tasks, potentially overlooking task-specific heterogeneity;
(ii) Task-specific training strategy fully isolates each task, allowing task-specific modeling at the cost of ignoring shared signals across tasks; and
(iii) TMBclaw retains task-specific model structures while incorporating Graph Laplacian Regularization, which enables soft parameter sharing and explicitly models inter-task structure.
To provide a fair comparison with TMBclaw, we implemented two representative baseline models: a multilayer perceptron (MLP) and a support vector machine (SVM). The MLP consists of three fully connected layers with 64, 32, and 16 hidden units, respectively. Each hidden layer is followed by a ReLU activation. The final output layer employs a sigmoid activation for binary classification, and the MLP is trained using the Adam optimizer. Notably, the MLP omits the MIL framework of TMBclaw by operating on pre-aggregated bag-level features and replaces the attention pooling module with a single fully connected layer.
The SVM was implemented using the sklearn.svm.SVC class from scikit-learn. We used the default radial basis function kernel and enabled probability output for compatibility with the evaluation metrics.
In our implementation, all neural network models were implemented in the PyTorch framework (version 1.8.1) and trained with the Adam optimizer, adopting either the hyperparameters recommended by the original authors or the default values. The computational environment was configured with Python 3.9, supported by key libraries including NumPy (version 1.26.4) for numerical operations, Pandas (version 2.2.3) for data handling, and scikit-learn (version 1.1.1) for machine learning tasks. We employed a stratified 3-fold cross-validation procedure to reduce evaluation bias caused by specific train-test splits and to ensure that the results represent performance on the overall dataset. We report the average metrics across the folds. All random processes, including data shuffling and weight initialization, were controlled by fixing the random seed to ensure reproducibility.
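The idea behind stratified, seeded fold assignment can be shown without external dependencies. This is an illustrative sketch, not the procedure used in the study: the text does not state which implementation was used, and scikit-learn's `StratifiedKFold` is the usual choice in practice. The function name and seed value are assumptions.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=3, seed=0):
    """Assign sample indices to folds so each fold keeps the class ratio.

    Indices of each class are shuffled with a fixed seed and then dealt
    round-robin across the folds.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


# Toy cohort: 6 responders and 12 non-responders.
labels = [1] * 6 + [0] * 12
folds = stratified_folds(labels, n_folds=3, seed=0)
for fold_id, test_indices in enumerate(folds):
    held_out = set(test_indices)
    train_indices = [i for i in range(len(labels)) if i not in held_out]
    positives = sum(labels[i] for i in test_indices)
    print(fold_id, len(train_indices), len(test_indices), positives)
```

Fixing the seed makes the split reproducible, so every model can be evaluated on identical held-out patients.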
Results
TMBclaw enhances predictive performance and robustness
We benchmarked TMBclaw, a regularized multi-task learning model, against baseline approaches grouped into two training strategies: (i) pooled training strategy (including pooled analysis, pooled MLP, and pooled SVM) and (ii) task-specific training strategy (including Separate Analysis, Separate MLP, and Separate SVM). Zero-padding was applied for the MLP and SVM baselines to ensure fixed-dimensional inputs. All evaluated models were strictly tested on identical patient sets to ensure a fair comparison.
Given the class imbalance commonly observed in clinical cohorts, we evaluated model performance using multiple classification metrics rather than relying solely on accuracy. Specifically, we calculated class-specific F1 scores (F1 score (0) and F1 score (1)), True Positives (TP), True Negatives (TN), Specificity (Sp), Sensitivity (Se), Positive Predictive Value (PPV), and Negative Predictive Value (NPV). In addition, we introduced the Distance to the Optimal Point (DOP) [15, 32] which is defined as
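$$\mathrm{DOP} = \sqrt{(1 - \mathrm{Sp})^{2} + (1 - \mathrm{Se})^{2} + (1 - \mathrm{PPV})^{2} + (1 - \mathrm{NPV})^{2}} \tag{10}$$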
This metric integrates Sp, Se, PPV, and NPV to comprehensively assess model performance. By minimizing DOP, we determined the optimal classification threshold, which was subsequently applied to the model’s predicted positive probabilities for label assignment. Table 3 presents the quantitative performance metrics of different models and training strategies.
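Threshold selection by DOP can be sketched as follows, assuming DOP is the Euclidean distance from the point (Sp, Se, PPV, NPV) to the ideal point (1, 1, 1, 1). The candidate thresholds (the observed scores), the `>=` decision rule, and the convention that an undefined ratio counts as 0 are illustrative assumptions.

```python
import math


def confusion(y_true, y_prob, threshold):
    """Return (tp, fp, tn, fn) when predicting 1 for scores >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    """Safe division: an undefined ratio is scored as 0 (worst case)."""
    return numerator / denominator if denominator else 0.0


def dop(tp, fp, tn, fn):
    """Distance to the optimal point (Sp, Se, PPV, NPV) = (1, 1, 1, 1)."""
    se = ratio(tp, tp + fn)
    sp = ratio(tn, tn + fp)
    ppv = ratio(tp, tp + fp)
    npv = ratio(tn, tn + fn)
    return math.sqrt((1 - sp) ** 2 + (1 - se) ** 2
                     + (1 - ppv) ** 2 + (1 - npv) ** 2)


def best_threshold(y_true, y_prob):
    """Pick the observed score that minimizes DOP when used as the cutoff."""
    candidates = sorted(set(y_prob))
    return min(candidates, key=lambda t: dop(*confusion(y_true, y_prob, t)))


y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.6, 0.9]
threshold = best_threshold(y_true, y_prob)
print(threshold, dop(*confusion(y_true, y_prob, threshold)))
```

Lower values are better: a perfect classifier has DOP = 0, and the maximum possible value is 2.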
Table 3.
Quantitative performance metrics of different models and training strategies
| Analytic models and strategies | TMBclaw | Pooled analysis | Pooled MLP | Pooled SVM | Separate analysis | Separate MLP | Separate SVM |
|---|---|---|---|---|---|---|---|
| Experimental_Cohorts | | | | | | | |
| ACC | **71.31%** | 68.35% | 62.87% | 66.24% | 66.24% | 62.87% | 59.07% |
| F1 score (0) | **0.7963** | 0.7768 | 0.7256 | 0.7509 | 0.7501 | 0.7295 | 0.6706 |
| F1 score (1) | **0.5144** | 0.4135 | 0.3514 | 0.3696 | 0.3361 | 0.3185 | 0.4522 |
| TP | 36 | 26 | 25 | 26 | 27 | 22 | **40** |
| TN | 133 | **136** | 124 | 131 | 130 | 127 | 100 |
| DOP | **0.7347** | 0.8746 | 0.9654 | 0.9012 | 0.8963 | 0.9887 | 0.8675 |
| External_NSCLC_Cohorts | | | | | | | |
| ACC | **66.76%** | 62.34% | 65.73% | 64.40% | 66.48% | 63.22% | 62.78% |
| F1 score (0) | **0.7496** | 0.7032 | 0.7460 | 0.7283 | 0.7450 | 0.7173 | 0.7120 |
| F1 score (1) | **0.5024** | 0.4678 | 0.4628 | 0.4791 | 0.4310 | 0.4693 | 0.4607 |
| TP | **114** | 113 | 101 | 112 | 98 | 110 | 110 |
| TN | 338 | 309 | 344 | 324 | **352** | 318 | 315 |
| DOP | **0.7870** | 0.8590 | 0.8390 | 0.8294 | 0.8430 | 0.8482 | 0.8617 |
| External_Mel_Cohorts | | | | | | | |
| ACC | **67.94%** | 60.11% | 62.04% | 61.74% | 67.50% | 67.21% | 66.46% |
| F1 score (0) | 0.6963 | 0.6444 | 0.6634 | 0.6440 | **0.7022** | 0.6938 | 0.6906 |
| F1 score (1) | **0.6569** | 0.5456 | 0.5621 | 0.5847 | 0.6374 | 0.6464 | 0.6333 |
| TP | **209** | 162 | 166 | 183 | 197 | 203 | 196 |
| TN | 251 | 245 | 254 | 235 | **260** | 252 | 254 |
| DOP | **0.6506** | 0.8161 | 0.7810 | 0.7735 | 0.6657 | 0.6619 | 0.6786 |
Values in bold represent the optimal performance.
In the Experimental_Cohorts, TMBclaw demonstrates consistently superior performance across multiple evaluation metrics. TMBclaw achieves the highest accuracy (ACC = 71.31%) with balanced detection rates (TP = 36, TN = 133), indicating effective classification of both responder and non-responder groups. This balanced predictive ability is further confirmed by the best F1 scores for both classes (F1 score (0) = 0.7963 and F1 score (1) = 0.5144), highlighting strong precision–recall characteristics. Most notably, TMBclaw attains the lowest DOP, comprehensively demonstrating its optimal balance between sensitivity, specificity, and predictive values.
Similarly, in both the External_NSCLC_Cohorts and the External_Mel_Cohorts, TMBclaw achieves the highest ACC, F1 score (1), and TP count and the lowest DOP among all compared methods, indicating a better overall balance between sensitivity, specificity, and predictive values. It is not the best on every metric: Separate analysis attains the highest TN count in both external cohort groups and a marginally higher F1 score (0) in the External_Mel_Cohorts (0.7022 versus 0.6963).
We further extended our analysis to include the area under the curve (AUC), the area under the precision–recall curve (AUPRC), and detailed comparisons of sample-score distribution.
According to Fig. 2A, TMBclaw achieves an AUC of 0.68 across the Experimental_Cohorts, the highest among all evaluated models. Relative to models trained under the pooled strategy, TMBclaw improves the AUC by 0.09–0.20 over pooled analysis (AUC = 0.59), pooled MLP (AUC = 0.48), and pooled SVM (AUC = 0.59). This improvement likely stems from its ability to preserve cohort-specific characteristics while accounting for cohort heterogeneity. Relative to models trained under the task-specific strategy, TMBclaw improves the AUC by 0.08–0.18 over Separate Analysis (AUC = 0.57), Separate MLP (AUC = 0.50), and Separate SVM (AUC = 0.60), which is consistent with its use of information from related tasks.
Figure 2.
Predictive performances of different models and training strategies across the Experimental_Cohorts. (A) Receiver operating characteristic (ROC) curves with corresponding area under the curve (AUC) values for TMBclaw and baseline models. TMBclaw achieves the highest AUC (0.68), outperforming all pooled and separately trained methods. (B) Area under the precision–recall curve (AUPRC) for each model. TMBclaw yields the highest AUPRC, reflecting improved predictive reliability in imbalanced clinical data. (C) Violin plots of model-predicted scores stratified by responder status. Each plot shows the score distribution for responders (dark) and non-responders (light), with associated P-values from Mann–Whitney U tests. TMBclaw demonstrates the most statistically significant separation (P = 9.12e−05), supporting its superior discriminative ability.
In addition, TMBclaw shows the highest AUPRC among all models (Fig. 2B), reflecting its effectiveness in capturing positive cases. Furthermore, the sample-scores predicted by TMBclaw differ significantly between responders and non-responders (Mann–Whitney U test, P = 9.12e−05; Fig. 2C).
Across both External_NSCLC_Cohorts (Fig. 3A) and External_Mel_Cohorts (Fig. 3B), TMBclaw demonstrates 4%–6% and 2%–12% higher AUCs compared to conventional models, respectively. TMBclaw also achieves consistently superior AUPRCs and clear stratification between responders and non-responders, evidenced by significantly distinct sample-score distribution (P < .02), confirming its robust predictive performance.
Figure 3.
Predictive performances of different models and training strategies across External_NSCLC_Cohorts (A) and External_Mel_Cohorts (B).
TMBclaw refines clinical risk stratification and survival prediction
In addition to objective response, PFS is a critical measure, as it reflects both the time to disease progression and the duration of treatment effectiveness, offering a comprehensive evaluation of therapeutic benefit. Therefore, we studied the classification ability of TMBclaw for patients’ progression-free survival outcomes following ICIs.
We first confirmed the prognostic value of TMBclaw’s prediction score (sample-score) through Cox regression analyses adjusting for TMB and cohort effects. We performed the following comparative analysis: (i) sample-score − model: Cox regression incorporating only known clinical covariates (TMB + cohort); (ii) sample-score + model: Cox regression incorporating both known covariates and TMBclaw’s prediction score.
In the Experimental_Cohorts, inclusion of sample-score improved model performance on average (Table 4). Specifically, the sample-score consistently exhibited negative coefficients (βsample-score = −1.01 ± 0.18, min–max: −1.21 to −0.84), indicating that higher values predict better survival, consistent with its design as a response probability metric. Notably, the absolute coefficient magnitude of sample-score exceeded that of TMB (e.g. Fold 1 |βsample-score| = .99 versus |βTMB| = .37), highlighting its stronger prognostic relevance. Moreover, sample-score achieved statistical significance (P < .05) across all three folds, whereas other covariates showed more variable patterns. Incorporating sample-score increased the C-index in two of the three folds (Folds 1 and 2) and slightly decreased it in Fold 3 (0.614 to 0.606), raising the average C-index from 0.618 to 0.628.
Table 4.
Cox coefficients of models with and without sample-score in the experimental cohorts
| Fold | Model | C-index | Covariate | Coef | exp(coef) | P |
|---|---|---|---|---|---|---|
| 1 | Sample_score− | 0.640729 | Cohort | 0.215803 | 1.240858 | .006906 |
| | | | TMB | −0.41015 | 0.663551 | .005483 |
| | Sample_score+ | 0.65545 | Cohort | 0.133078 | 1.142339 | .112351 |
| | | | TMB | −0.37308 | 0.688608 | .011806 |
| | | | Sample_score | −0.98875 | 0.372043 | .005794 |
| 2 | Sample_score− | 0.597603 | Cohort | 0.283083 | 1.327215 | .000317 |
| | | | TMB | −0.45393 | 0.635126 | .001975 |
| | Sample_score+ | 0.621575 | Cohort | 0.199464 | 1.220749 | .021546 |
| | | | TMB | −0.43055 | 0.650154 | .003413 |
| | | | Sample_score | −0.84189 | 0.430895 | .034286 |
| 3 | Sample_score− | 0.61431 | Cohort | 0.321125 | 1.378678 | 2.79e−05 |
| | | | TMB | −0.33774 | 0.71338 | .017479 |
| | Sample_score+ | 0.606133 | Cohort | 0.256033 | 1.291795 | .001337 |
| | | | TMB | −0.28467 | 0.75226 | .051169 |
| | | | Sample_score | −1.20903 | 0.298486 | .001562 |
Values in bold represent the optimal performance.
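The summary statistics quoted for Table 4 can be re-derived from the tabulated values. The short sketch below only re-computes numbers already present in the table (mean and sample standard deviation of the sample-score coefficient, the hazard ratio exp(coef), and the per-fold change in C-index); the variable names are illustrative and are not taken from the authors' code.

```python
from math import exp
from statistics import mean, stdev

# Values transcribed from Table 4 (Experimental_Cohorts, three folds).
coef_sample_score = [-0.98875, -0.84189, -1.20903]
c_index_without = [0.640729, 0.597603, 0.61431]   # Sample_score- model
c_index_with = [0.65545, 0.621575, 0.606133]      # Sample_score+ model

beta_mean = mean(coef_sample_score)
beta_sd = stdev(coef_sample_score)  # sample SD (n - 1 denominator)

# In a Cox model, exp(coef) is the hazard ratio per unit increase of the covariate.
hazard_ratios = [exp(b) for b in coef_sample_score]

# Positive values mean the C-index improved after adding sample-score.
delta_c = [w - wo for w, wo in zip(c_index_with, c_index_without)]

print(f"beta = {beta_mean:.2f} +/- {beta_sd:.2f}")
print("HR per unit sample-score:", [round(h, 3) for h in hazard_ratios])
print("C-index change per fold:", [round(d, 4) for d in delta_c])
print(f"mean C-index: {mean(c_index_without):.3f} -> {mean(c_index_with):.3f}")
```

A hazard ratio below 1 for every fold means that a higher sample-score is associated with a lower hazard of progression, which is the direction expected for a response probability.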
Similar findings were observed in the External_NSCLC_Cohorts and External_Mel_Cohorts (Supplementary Table S4), further confirming the robust prognostic value of TMBclaw. Based on these results, we next stratified patients by sample-score and performed Kaplan–Meier survival analyses to directly assess differences in survival outcomes between high- and low-sample-score groups.
The Kaplan–Meier survival curves in Fig. 4A and Supplementary Fig. S4 show that, in the Experimental_Cohorts, patients with low sample-scores predicted by TMBclaw have significantly poorer PFS than those with high sample-scores (HR = 2.4, P < .0001). In contrast, the pooled and separate analyses achieve only limited risk stratification (Fig. 4B and C).
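The stratification step can be illustrated with a minimal, dependency-free sketch: patients are split at a sample-score of 0.5 (the threshold stated in the Fig. 4 legend), a Kaplan–Meier curve is estimated for each group, and the groups are compared with a two-sample log-rank test. The data below are hypothetical toy values, the function names are our own, and a survival analysis library would normally be used in practice; this is not the authors' implementation.

```python
from math import erfc, sqrt


def kaplan_meier(times, events):
    """Product-limit estimate: list of (t, S(t)) at each distinct event time."""
    surv, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        died = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - died / at_risk
        curve.append((t, surv))
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi2, p) with 1 degree of freedom."""
    pooled = list(zip(times_a, events_a)) + list(zip(times_b, events_b))
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({t for t, e in pooled if e}):
        n_a = sum(1 for u in times_a if u >= t)
        n_b = sum(1 for u in times_b if u >= t)
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e)
        d_b = sum(1 for u, e in zip(times_b, events_b) if u == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n          # observed minus expected in group A
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    return chi2, erfc(sqrt(chi2 / 2.0))             # chi-square (1 df) tail probability


# Hypothetical toy data: predicted sample-score, PFS in months, progression flag.
scores = [0.81, 0.74, 0.66, 0.58, 0.52, 0.44, 0.37, 0.29, 0.21, 0.12]
pfs_months = [24, 20, 18, 15, 12, 9, 6, 5, 3, 2]
progressed = [0, 1, 0, 1, 1, 1, 1, 1, 1, 1]

high = [i for i, s in enumerate(scores) if s >= 0.5]   # Score-H
low = [i for i, s in enumerate(scores) if s < 0.5]     # Score-L
t_high, e_high = [pfs_months[i] for i in high], [progressed[i] for i in high]
t_low, e_low = [pfs_months[i] for i in low], [progressed[i] for i in low]

km_high = kaplan_meier(t_high, e_high)
km_low = kaplan_meier(t_low, e_low)
chi2, p_value = logrank_test(t_high, e_high, t_low, e_low)
print("Score-H curve:", km_high)
print("Score-L curve:", km_low)
print(f"log-rank chi2 = {chi2:.2f}, p = {p_value:.4f}")
```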
Figure 4.
Survival analysis of different models and training strategies across the Experimental_Cohorts. (A–C) Kaplan–Meier survival curves of TMBclaw (A), pooled analysis (B), and separate analysis (C). Patients are stratified into score-high (Score-H: sample-score ≥ 0.5) and score-low (Score-L: sample-score < 0.5) groups. P-values are calculated using the log-rank test. (D–F) Forest plots show hazard ratios (HRs) for progression-free survival (PFS) stratified by cohort and overall population. TMBclaw exhibits more consistent and statistically robust cohort performance than other models, reinforcing its utility for clinically interpretable patient stratification.
The forest plots in Fig. 4D–F further illustrate that TMBclaw stratifies patients consistently across all cohorts, with better PFS observed in the predicted Score-H groups (HR < 1), indicating strong and stable stratification performance.
This superior performance is also demonstrated by survival analysis in the External_NSCLC_Cohorts (Fig. 5A, Supplementary Fig. S5) and External_Mel_Cohorts (Fig. 5B, Supplementary Fig. S6), with significant patient stratification and consistent risk assessment across cohorts.
Figure 5.
Survival analysis of different models and training strategies across the External_NSCLC_Cohorts (A) and External_Mel_Cohorts (B). Rows without hazard ratio (HR) plots correspond to studies with only one stratified group, where HR estimation was not feasible.
To further validate TMBclaw’s risk stratification performance, we conducted time-dependent ROC analyses (Supplementary Figs. S7–S9). TMBclaw maintained robust predictive performance for both short-term and long-term survival across all cohorts, confirming its temporal stability and clinical utility.
Ablation analysis of TMBclaw: multi-clonal structure integration and regularized design
To assess whether incorporating all detected clones and their heterogeneity improves prognostic performance, we compared TMBclaw against alternative predictive features—including total TMB, clonal TMB (cTMB, mutations present in the master clone), and subclonal TMB (mutations exclusive to minor subclones)—in ablation experiments. Performance was evaluated using DOP and AUC, as summarized in Table 5.
Table 5.
Quantitative performance metrics of different predictive features
| Predictive features | Experimental_Cohorts (DOP) | Experimental_Cohorts (AUC) | External_NSCLC_Cohorts (DOP) | External_NSCLC_Cohorts (AUC) | External_Mel_Cohorts (DOP) | External_Mel_Cohorts (AUC) |
|---|---|---|---|---|---|---|
| TMBclaw | **0.7347** | **67.87%** | **0.7870** | **65.56%** | **0.6506** | **72.60%** |
| Total TMB | 0.8772 | 50.46% | 0.8355 | 61.39% | 0.6624 | 70.66% |
| cTMB | 0.8880 | 57.28% | 0.9637 | 53.70% | 0.6522 | 70.97% |
| Subclonal TMB | 0.8312 | 63.84% | 0.8384 | 62.46% | 0.6638 | 69.62% |
Values in bold represent the optimal performance.
In the Experimental_Cohorts, TMBclaw achieves both the lowest DOP of 0.7347 and highest AUC of 67.87%, outperforming total TMB (DOP = 0.8772, AUC = 50.46%), cTMB (DOP = 0.8880, AUC = 57.28%), and subclonal TMB (DOP = 0.8312, AUC = 63.84%). The performance limitations of total TMB may stem from its failure to account for clonal architecture by indiscriminately aggregating all mutations, while cTMB completely omits subclonal information, which limits the model’s ability to capture tumor heterogeneity and thereby weakens its prognostic capability.
Performance comparisons in the External_NSCLC_Cohorts and External_Mel_Cohorts further confirm that TMBclaw outperforms the alternative predictive features, highlighting its robustness and generalization capability across heterogeneous cohorts.
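To make the three baseline features concrete, the following sketch derives total TMB, cTMB, and subclonal TMB for one patient from a mutation-to-clone assignment, using the definitions given above (cTMB counts mutations in the master clone; subclonal TMB counts mutations found only in minor subclones). The input layout, the function name `tmb_features`, and the `panel_mb` normalization parameter are assumptions made for illustration; the master clone is taken to be the clone with the highest cellular prevalence.

```python
def tmb_features(mutation_clone, clone_prevalence, panel_mb=1.0):
    """Compute baseline TMB features for a single patient.

    mutation_clone:   {mutation_id: clone_id} assignment (e.g. from a clonal inference tool)
    clone_prevalence: {clone_id: cellular prevalence}
    panel_mb:         sequenced region size in megabases (hypothetical parameter)
    """
    # Master clone = clone with the highest cellular prevalence.
    master = max(clone_prevalence, key=clone_prevalence.get)
    total = len(mutation_clone)
    clonal = sum(1 for clone in mutation_clone.values() if clone == master)
    subclonal = total - clonal  # mutations assigned only to minor subclones
    return {
        "master_clone": master,
        "total_tmb": total / panel_mb,
        "ctmb": clonal / panel_mb,
        "subclonal_tmb": subclonal / panel_mb,
    }


# Hypothetical patient with three clones and six non-synonymous mutations.
mutation_clone = {"m1": "A", "m2": "A", "m3": "A", "m4": "B", "m5": "B", "m6": "C"}
clone_prevalence = {"A": 0.92, "B": 0.35, "C": 0.08}

features = tmb_features(mutation_clone, clone_prevalence, panel_mb=2.0)
print(features)
```

Each baseline collapses the clonal structure into a single number, whereas the clone-level model receives every clone as a separate instance.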
To systematically evaluate the contributions of each module in TMBclaw and analyze the effectiveness of its innovative design, we designed the following ablation experiments: (i) Base-MIL: a MIL framework with attention pooling and task-specific training, excluding all regularization terms. This setting is designed to assess the model’s sensitivity to cohort heterogeneity without structural constraints; (ii) Reg-MLP: a task-specific MLP that replaces the clonal-structured MIL model while incorporating the graph Laplacian regularization term. This setting is intended to examine how clonal selection bias affects prediction robustness.
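The two components isolated by these ablations can be sketched generically. The first function below performs attention pooling over the clones of one patient in the style of attention-based MIL (Ilse et al. [19]); the second evaluates a graph Laplacian penalty that pulls the parameters of related cohorts toward each other. This is an illustrative sketch under stated assumptions, not the TMBclaw implementation: the parameters `V` and `w`, the fixed symmetric cohort adjacency matrix, and all function names are hypothetical, and in a trained model the parameters would be learned.

```python
from math import exp, tanh


def softmax(xs):
    m = max(xs)                       # subtract the max for numerical stability
    es = [exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def attention_pool(clone_feats, V, w):
    """Attention pooling: a_k proportional to exp(w . tanh(V h_k)); returns (bag, weights)."""
    scores = []
    for h in clone_feats:
        hidden = [tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * zi for wi, zi in zip(w, hidden)))
    attn = softmax(scores)
    dim = len(clone_feats[0])
    bag = [sum(a * h[j] for a, h in zip(attn, clone_feats)) for j in range(dim)]
    return bag, attn


def laplacian_penalty(task_weights, adjacency):
    """0.5 * sum_ij A_ij * ||w_i - w_j||^2, which equals tr(W^T L W) with L = D - A
    for a symmetric adjacency matrix A (rows of W are per-cohort parameters)."""
    total = 0.0
    for i, w_i in enumerate(task_weights):
        for j, w_j in enumerate(task_weights):
            total += adjacency[i][j] * sum((a - b) ** 2 for a, b in zip(w_i, w_j))
    return 0.5 * total


# Hypothetical patient: three clones, two features per clone.
clone_feats = [[3.0, 0.8], [1.0, 0.3], [0.5, 0.1]]
V = [[0.5, 0.0], [0.0, 1.0]]   # hypothetical projection (learned in practice)
w = [1.0, 1.0]                 # hypothetical attention vector (learned in practice)
bag, attn = attention_pool(clone_feats, V, w)

# Hypothetical cohort graph: cohort 1 is linked to cohorts 0 and 2.
cohort_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
penalty = laplacian_penalty(cohort_weights, adjacency)
print("attention weights:", [round(a, 3) for a in attn])
print("Laplacian penalty:", penalty)
```

In this framing, Base-MIL corresponds to keeping the pooling step and dropping the penalty, while Reg-MLP keeps the penalty and replaces the clone-level pooling with a plain per-cohort network.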
Table 6 presents the performance differences between TMBclaw and its two core variants (Base-MIL and Reg-MLP), highlighting the marginal contributions of each module to the overall model design. In the Experimental_Cohorts, TMBclaw achieves the lowest DOP and highest AUC in treatment response prediction, demonstrating the benefit of combining clonal-structured modeling with graph-based regularization.
Table 6.
Quantitative performance metrics of different predictive models
| Models | Experimental_Cohorts (DOP) | Experimental_Cohorts (AUC) | External_NSCLC_Cohorts (DOP) | External_NSCLC_Cohorts (AUC) | External_Mel_Cohorts (DOP) | External_Mel_Cohorts (AUC) |
|---|---|---|---|---|---|---|
| TMBclaw | **0.7347** | **67.87%** | **0.7870** | **65.56%** | **0.6506** | **72.60%** |
| Base-MIL | 0.8963 | 56.58% | 0.8430 | 62.50% | 0.6657 | 71.42% |
| Reg-MLP | 0.9533 | 52.43% | 0.8462 | 61.37% | 0.6700 | 71.09% |
Values in bold represent the optimal performance.
Ablation experiments conducted on the External_NSCLC_Cohorts and External_Mel_Cohorts further corroborate this advantage, with TMBclaw consistently delivering the lowest DOP and highest AUC.
TMBclaw reveals interpretable clonal contributions via attention visualization
Our analysis of TMBclaw’s interpretation of clonal structure focused on the learned importance of individual clones. Specifically, clones were indexed in descending order of cellular prevalence, reflecting their evolutionary hierarchy. The corresponding attention weights are visualized in Fig. 6, with detailed patient-level clonal weights summarized in Supplementary Table S2.
Figure 6.
Visualization of clone weights. (A) Distribution of clone weights across different cohorts, shown as violin plots annotated with mean values. (B) Average weight of each clone within individual cohorts.
In the Experimental_Cohorts (Fig. 6A), the master clone (clone 1) stands out with a markedly greater weight, with a mean of ~0.506 and the highest median, highlighting its crucial role in prognostic prediction. By contrast, subclones contribute substantially less on average. Similar trends are observed in the External_NSCLC_Cohorts and External_Mel_Cohorts, where the master clone consistently exhibits the highest median weight, with mean values of 0.464 and 0.446, respectively, further indicating its predictive importance.
To gain deeper insight into clonal contributions across patient cohorts, we visualized the average weight of each clone within individual cohorts in a heatmap (Fig. 6B). In most melanoma and NPC cohorts, the master clone consistently dominates with the greatest average weight. It is worth noting that in certain External_NSCLC_Cohorts (e.g. External_NSCLC 1 and External_NSCLC 2), some subclones exhibit non-negligible weights.
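The aggregation behind such a heatmap can be sketched as follows: within each patient, clones are ranked by descending cellular prevalence (rank 1 being the master clone), and the attention weights are then averaged per rank within each cohort. The record layout, cohort labels, and function name are hypothetical, and the numbers are toy values rather than results from the study.

```python
from collections import defaultdict


def mean_weight_by_clone_rank(patients):
    """Average attention weight per clone rank (1 = highest prevalence) within each cohort.

    patients: list of dicts with keys 'cohort', 'prevalence', 'weights';
              'prevalence' and 'weights' are parallel lists, one entry per clone.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for p in patients:
        # Indices of this patient's clones, sorted by descending cellular prevalence.
        order = sorted(range(len(p["prevalence"])),
                       key=lambda k: p["prevalence"][k], reverse=True)
        for rank, k in enumerate(order, start=1):
            sums[p["cohort"]][rank] += p["weights"][k]
            counts[p["cohort"]][rank] += 1
    return {c: {r: sums[c][r] / counts[c][r] for r in sorted(sums[c])} for c in sums}


# Hypothetical toy records (patients may have different numbers of clones).
patients = [
    {"cohort": "toy_melanoma", "prevalence": [0.2, 0.9, 0.5], "weights": [0.1, 0.6, 0.3]},
    {"cohort": "toy_melanoma", "prevalence": [0.8, 0.3], "weights": [0.7, 0.3]},
    {"cohort": "toy_nsclc", "prevalence": [0.7, 0.6], "weights": [0.45, 0.55]},
]

heatmap = mean_weight_by_clone_rank(patients)
for cohort, row in heatmap.items():
    print(cohort, {rank: round(value, 3) for rank, value in row.items()})
```

Each row of the result corresponds to one cohort and each column to one clone rank, which is the layout of a cohort-by-clone heatmap.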
This observation aligns with previous studies implicating the role of subclones in prognostic outcomes, particularly in NSCLC. Biologically, the master clone (clone 1) is often the dominant clone in terms of cellular prevalence, reflecting its critical role in tumor progression and therapy response. The higher attention weight assigned to clone 1 suggests that it is biologically more influential in predicting outcomes, as it likely drives the primary tumor characteristics and response to treatment. Subclones, while contributing less on average, can still play a role, particularly in more heterogeneous cancers like melanoma, where subclonal evolution is often linked to therapeutic resistance and disease recurrence.
These findings highlight that accounting for all mutant clones and their differential contributions could improve prognostic modeling, a capability supported by TMBclaw.
Discussion
In this study, we developed an interpretable prognostic prediction model based on clonal mutations that is applicable to heterogeneous patient cohorts. TMBclaw integrates all tumor clones within an MIL framework, which has the potential to elucidate complex tumor heterogeneity and to provide interpretable insights for clinical decision-making. By adopting MTL, our model unites multiple cohort-specific models within a single computational framework.
TMBclaw mines the relationships between cohorts via a graph Laplacian regularization term and thereby shares information among cohorts, addressing both the data insufficiency of the separate analysis and the neglect of cohort heterogeneity in the pooled analysis. In clinical patient cohorts covering NSCLC, melanoma, and NPC, TMBclaw outperforms classical machine learning models under different analytic strategies, delivering more precise prognostic predictions and enhanced risk stratification. Benefiting from the self-attention mechanism, our model integrates all clones and learns the interrelationships between them to quantify their contributions. This strategy enables the screening of key clones that mainly influence the prognosis of immunotherapy, which is crucial for tailoring targeted therapies and optimizing treatment strategies.
TMBclaw could benefit from further exploration of deep learning networks to enhance prediction efficacy. Additionally, incorporating a broader array of genetic mutation signatures may provide a more comprehensive understanding of tumor heterogeneity, ultimately improving the model’s ability to support targeted therapeutic strategies.
Moreover, we recognize that the tumor immune microenvironment (TIME) is a pivotal determinant of ICI efficacy [7, 14, 31]. While our current study was limited to genomic sequencing data without matched transcriptomic profiles, which prevented direct quantification of immune cell infiltration, future integration of RNA sequencing or multi-omics datasets will allow us to explicitly link predicted risk groups with TIME characteristics. Such extensions would not only strengthen the biological interpretability of our framework but also expand its translational potential in guiding immunotherapy.
Conclusion
TMBclaw is a graph-regularized multi-task learning framework for prognostic prediction in patients treated with ICIs. It provides more precise quantification of tumor immunogenicity than conventional TMB, and its interpretable clone-level outputs give clinicians transparent support for treatment decisions, a key consideration in personalized treatment strategies. Moreover, by sharing information across cohorts, TMBclaw alleviates the training difficulties caused by data scarcity and improves prediction accuracy and patient stratification, which enhances its practical value for clinical decision-making in precision oncology.
Key Points
- We propose TMBclaw, a hierarchical graph-regularized multi-task learning framework that explicitly models tumor clonal architecture and cohort-level heterogeneity, addressing limitations of conventional TMB-based biomarkers.
- Through adaptive graph Laplacian regularization and attention-based multiple-instance learning, TMBclaw enables effective cross-cohort knowledge transfer and interpretable identification of immunotherapy-relevant clones.
- Extensive validation on 1671 patients across institutional and public cohorts demonstrates that TMBclaw significantly improves prognostic accuracy and risk stratification in immune checkpoint inhibitor response prediction.
Supplementary Material
Contributor Information
Yixuan Wang, Department of Biomedical Engineering, College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, 29 Jiangjun Avenue, Jiangning, Nanjing 211106, Jiangsu, China.
Tianyi Zhu, School of Computer Science and Technology, Faculty of Electronics and Information Engineering, Xi’an Jiaotong University, 28 Xianning West Road, Beilin, Xi’an 710049, Shaanxi, China.
Xiaofeng Song, Department of Biomedical Engineering, College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, 29 Jiangjun Avenue, Jiangning, Nanjing 211106, Jiangsu, China.
Peng Chen, College of Computer Science and Technology, Zhejiang University, 866 Yuhangtang Rd, Hangzhou 310058, Zhejiang, China.
Xiaoyan Zhu, School of Computer Science and Technology, Faculty of Electronics and Information Engineering, Xi’an Jiaotong University, 28 Xianning West Road, Beilin, Xi’an 710049, Shaanxi, China.
Zhili Chang, School of Computer Science and Technology, Faculty of Electronics and Information Engineering, Xi’an Jiaotong University, 28 Xianning West Road, Beilin, Xi’an 710049, Shaanxi, China; Geneseeq Research Institute, Nanjing Geneseeq Technology Inc., 128 Huakang Road, Pukou, Nanjing 210032, Jiangsu, China.
Xiaonan Wang, School of Computer Science and Technology, Faculty of Electronics and Information Engineering, Xi’an Jiaotong University, 28 Xianning West Road, Beilin, Xi’an 710049, Shaanxi, China; Geneseeq Research Institute, Nanjing Geneseeq Technology Inc., 128 Huakang Road, Pukou, Nanjing 210032, Jiangsu, China.
Xin Lai, School of Computer Science and Technology, Faculty of Electronics and Information Engineering, Xi’an Jiaotong University, 28 Xianning West Road, Beilin, Xi’an 710049, Shaanxi, China.
Jiayin Wang, School of Computer Science and Technology, Faculty of Electronics and Information Engineering, Xi’an Jiaotong University, 28 Xianning West Road, Beilin, Xi’an 710049, Shaanxi, China.
Author contributions
W.Y., L.X., and W.J. conceived this research. W.Y., Z.T., and C.P. designed the model. Z.T. implemented the program and performed the experiments. W.Y., S.X., W.X., and C.Z. collected and analyzed the data. W.Y., Z.T., L.X., and W.J. wrote the manuscript. S.X., C.P., Z.X., W.X., and C.Z. revised the manuscript. All authors have read and agreed to the latest version of the manuscript.
Conflict of interest: W.X. and C.Z. are employed by Nanjing Geneseeq Technology Inc. The remaining authors declare that the research was conducted without commercial or financial relationships that could be construed as a potential conflict of interest.
Funding
This work was supported by the National Natural Science Foundation of China [grant nos. 62302215, 72293581, 72293580, and 72274152].
References
- 1. Anagnostou V, Bardelli A, Chan TA. et al. The status of tumor mutational burden and immunotherapy. Nat Cancer 2022;3:652–6. 10.1038/s43018-022-00382-1
- 2. Boll LM, Perera-Bel J, Rodriguez-Vida A. et al. The impact of mutational clonality in predicting the response to immune checkpoint inhibitors in advanced urothelial cancer. Sci Rep 2023;13:15287. 10.1038/s41598-023-42495-2
- 3. Budczies J, Kazdal D, Menzel M. et al. Tumour mutational burden: clinical utility, challenges and emerging improvements. Nat Rev Clin Oncol 2024;21:725–42. 10.1038/s41571-024-00932-9
- 4. Carlino MS, Larkin J, Long GV. Immune checkpoint inhibitors in melanoma. Lancet 2021;398:1002–14. 10.1016/S0140-6736(21)01206-X
- 5. Chalmers ZR, Connelly CF, Fabrizio D. et al. Analysis of 100,000 human cancer genomes reveals the landscape of tumor mutational burden. Genome Med 2017;9:1–14. 10.1186/s13073-017-0424-2
- 6. Chan TA, Yarchoan M, Jaffee E. et al. Development of tumor mutation burden as an immunotherapy biomarker: utility for the oncology clinic. Ann Oncol 2019;30:44–56. 10.1093/annonc/mdy495
- 7. Chen B, Khodadoust MS, Liu CL. et al. Profiling Tumor Infiltrating Immune Cells with CIBERSORT. In: von Stechow, L. (eds) Cancer Systems Biology. Methods in Molecular Biology, vol 1711. Humana Press, New York, NY. 10.1007/978-1-4939-7493-1_12
- 8. Dall'Olio FG, Marabelle A, Caramella C. et al. Tumour burden and efficacy of immune-checkpoint inhibitors. Nat Rev Clin Oncol 2022;19:75–90. 10.1038/s41571-021-00564-3
- 9. Dancey JE, Dodd LE, Ford R. et al. Recommendations for the assessment of progression in randomised cancer treatment trials. Eur J Cancer 2009;45:281–9. 10.1016/j.ejca.2008.10.042
- 10. Eisenhauer EA, Therasse P, Bogaerts J. et al. New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur J Cancer 2009;45:228–47. 10.1016/j.ejca.2008.10.026
- 11. Elmoznino E, Bonner MF. High-performing neural network models of visual cortex benefit from high latent dimensionality. PLoS Comput Biol 2024;20:e1011792. 10.1371/journal.pcbi.1011792
- 12. Fang W, Ma Y, Yin JC. et al. Comprehensive genomic profiling identifies novel genetic predictors of response to anti–PD-(L) 1 therapies in non–small cell lung cancer. Clin Cancer Res 2019;25:5015–26. 10.1158/1078-0432.CCR-19-0585
- 13. Fang W, Yang Y, Ma Y. et al. Camrelizumab (shr-1210) alone or in combination with gemcitabine plus cisplatin for nasopharyngeal carcinoma: results from two single-arm, phase 1 trials. Lancet Oncol 2018;19:1338–50. 10.1016/S1470-2045(18)30495-9
- 14. Fernández EA, Mahmoud YD, Veigas F. et al. Unveiling the immune infiltrate modulation in cancer and response to immunotherapy by MIXTURE-an enhanced deconvolution method. Brief Bioinform 2021;22:bbaa317. 10.1093/bib/bbaa317
- 15. Fernández EA, Valtuille R, Presedo JM. et al. Comparison of different methods for hemodialysis evaluation by means of ROC curves: from artificial intelligence to current methods. Clin Nephrol 2005;64:205–13. 10.5414/cnp64205
- 16. Frankell AM, Dietzen M, Al Bakir M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 2023;616:525–33. 10.1038/s41586-023-05783-5
- 17. Hellmann MD, Nathanson T, Rizvi H. et al. Genomic features of response to combination immunotherapy in patients with advanced non-small-cell lung cancer. Cancer Cell 2018;33:843–852.e4. 10.1016/j.ccell.2018.03.018
- 18. Hugo W, Zaretsky JM, Sun L. et al. Genomic and transcriptomic features of response to anti-PD-1 therapy in metastatic melanoma. Cell 2016;165:35–44. 10.1016/j.cell.2016.02.065
- 19. Ilse M, Tomczak JM, Welling M. Attention-based deep multiple instance learning. ICML 2018;80:2127–36. 10.48550/arXiv.1802.04712
- 20. Jardim DL, Goodman A, de Melo GD. et al. The challenges of tumor mutational burden as an immunotherapy biomarker. Cancer Cell 2021;39:154–73. 10.1016/j.ccell.2020.10.001
- 21. Johnson PC, Gainor JF, Sullivan RJ. et al. Immune checkpoint inhibitors - the need for innovation. N Engl J Med 2023;388:1529–32. 10.1056/NEJMsb2300232
- 22. Koh G, Degasperi A, Zou X. et al. Mutational signatures: emerging concepts, caveats and clinical applications. Nat Rev Cancer 2021;21:619–37. 10.1038/s41568-021-00377-7
- 23. Lemery S, Keegan P, Pazdur R. First FDA approval agnostic of cancer site - when a biomarker defines the indication. N Engl J Med 2017;377:1409–12. 10.1056/NEJMp1709968
- 24. Litchfield K, Reading JL, Puttick C. et al. Meta-analysis of tumor- and T cell-intrinsic mechanisms of sensitization to checkpoint inhibition. Cell 2021;184:596–614.e14. 10.1016/j.cell.2021.01.002
- 25. Liu D, Schilling B, Liu D. et al. Integrative molecular and clinical modeling of clinical outcomes to PD1 blockade in patients with metastatic melanoma. Nat Med 2019;25:1916–27. 10.1038/s41591-019-0654-5
- 26. Łuksza M, Sethna ZM, Rojas LA. et al. Neoantigen quality predicts immunoediting in survivors of pancreatic cancer. Nature 2022;606:389–95. 10.1038/s41586-022-04735-9
- 27. Ma Y, Fang W, Zhang Y. et al. A phase I/II open-label study of nivolumab in previously treated advanced or recurrent nasopharyngeal carcinoma and other solid tumors. Oncol 2019;24:891–e431. 10.1634/theoncologist.2019-0284
- 28. McGrail DJ, Pilié PG, Rashid NU. et al. High tumor mutation burden fails to predict immune checkpoint blockade response across all cancer types. Ann Oncol 2021;32:661–72. 10.1016/j.annonc.2021.02.006
- 29. McGranahan N, Furness AJ, Rosenthal R. et al. Clonal neoantigens elicit T cell immunoreactivity and sensitivity to immune checkpoint blockade. Science 2016;351:1463–9. 10.1126/science.aaf1490
- 30. Nair V, Hinton GE. Rectified linear units improve restricted Boltzmann machines. ICML 2010;27:807–14. 10.5555/3104322.3104425
- 31. Nava A, Alves da Quinta D, Prato L. et al. Novel evaluation approach for molecular signature-based deconvolution methods. J Biomed Inform 2023;142:104387. 10.1016/j.jbi.2023.104387
- 32. Nibeyro G, Baronetto V, Folco JI. et al. Unraveling tumor specific neoantigen immunogenicity prediction: a comprehensive analysis. Front Immunol 2023;14:1094236. 10.3389/fimmu.2023.1094236
- 33. Phan TH, Yamamoto K. Resolving class imbalance in object detection with weighted cross entropy losses. arXiv 2020;abs/2006.01413. https://arxiv.org/abs/2006.01413
- 34. Ravi A, Hellmann MD, Arniella MB. et al. Genomic and transcriptomic analysis of checkpoint blockade response in advanced non-small cell lung cancer. Nat Genet 2023;55:807–19. 10.1038/s41588-023-01355-5
- 35. Riaz N, Havel JJ, Makarov V. et al. Tumor and microenvironment evolution during immunotherapy with nivolumab. Cell 2017;171:934–949.e16. 10.1016/j.cell.2017.09.028
- 36. Rizvi H, Sanchez-Vega F, La K. et al. Molecular determinants of response to anti–programmed cell death (PD)-1 and anti-programmed death-ligand 1 (PD-L1) blockade in patients with non-small-cell lung cancer profiled with targeted next-generation sequencing. J Clin Oncol 2018;36:633–41. 10.1200/JCO.2017.75.3384
- 37. Rizvi NA, Hellmann MD, Snyder A. et al. Mutational landscape determines sensitivity to PD-1 blockade in non–small cell lung cancer. Science 2015;348:124–8. 10.1126/science.aaa1348
- 38. Roh W, Chen P-L, Reuben A. et al. Integrated molecular analysis of tumor biopsies on sequential CTLA-4 and PD-1 blockade reveals markers of response and resistance. Sci Transl Med 2017;9:eaah3560. 10.1126/scitranslmed.aah3560
- 39. Roth A, Khattra J, Yap D. et al. PyClone: statistical inference of clonal population structure in cancer. Nat Methods 2014;11:396–8. 10.1038/nmeth.2883
- 40. Samstein RM, Lee C-H, Shoushtari AN. et al. Tumor mutational load predicts survival after immunotherapy across multiple cancer types. Nat Genet 2019;51:202–6. 10.1038/s41588-018-0312-8
- 41. Shao Z, Bian H, Chen Y. et al. TransMIL: transformer based correlated multiple instance learning for whole slide image classification. NeurIPS 2021;34:2136–46. 10.48550/arXiv.2106.00908
- 42. Subbiah V, Solit DB, Chan TA. et al. The FDA approval of pembrolizumab for adult and pediatric patients with tumor mutational burden (TMB) ≥10: a decision centered on empowering patients and their physicians. Ann Oncol 2020;31:1115–8. 10.1016/j.annonc.2020.07.002
- 43. Tang S, Qin C, Hu H. et al. Immune checkpoint inhibitors in non-small cell lung cancer: progress, challenges, and prospects. Cells 2022;11:320. 10.3390/cells11030320
- 44. Thummalapalli R, Ricciuti B, Bandlamudi C. et al. Clinical and molecular features of long-term response to immune checkpoint inhibitors in patients with advanced non-small cell lung cancer. Clin Cancer Res 2023;29:4408–18. 10.1158/1078-0432.CCR-23-1207
- 45. Thung KH, Wee CY. A brief review on multi-task learning. Multimed Tools Appl 2018;77:29705–25. 10.1007/s11042-018-6463-x
- 46. Valero C, Lee M, Hoen D. et al. The association between tumor mutational burden and prognosis is dependent on treatment context. Nat Genet 2021;53:11–5. 10.1038/s41588-020-00752-4
- 47. Van Allen EM, Miao D, Schilling B. et al. Genomic correlates of response to CTLA-4 blockade in metastatic melanoma. Science 2015;350:207–11. 10.1126/science.aad0095
- 48. Vanguri RS, Luo J, Aukerman AT. et al. Multimodal integration of radiology, pathology and genomics for prediction of response to PD-(L)1 blockade in patients with non-small cell lung cancer. Nat Cancer 2022;3:1151–64. 10.1038/s43018-022-00416-8
- 49. Vaswani A, Shazeer N, Parmar N. et al. Attention is all you need. NeurIPS 2017;30:5998–6008. https://arxiv.org/abs/1706.03762
- 50. Vega DM, Yee LM, McShane LM. et al. Aligning tumor mutational burden (TMB) quantification across diagnostic platforms: phase II of the Friends of Cancer Research TMB Harmonization Project. Ann Oncol 2021;32:1626–36. 10.1016/j.annonc.2021.09.016
- 51. Wang X, Lamberti G, Di Federico A. et al. Tumor mutational burden for the prediction of PD-(L)1 blockade efficacy in cancer: challenges and opportunities. Ann Oncol 2024;35:508–22. 10.1016/j.annonc.2024.03.007
- 52. Westcott PMK, Muyas F, Hauck H. et al. Mismatch repair deficiency is not sufficient to elicit tumor immunogenicity. Nat Genet 2023;55:1686–95. 10.1038/s41588-023-01499-4
- 53. Yoo SK, Fitzgerald CW, Cho BA. et al. Prediction of checkpoint inhibitor immunotherapy efficacy for cancer using routine blood tests and clinical data. Nat Med 2025;31:869–80. 10.1038/s41591-024-03398-5
- 54. Zantvoort K, Nacke B, Görlich D. et al. Estimation of minimal data sets sizes for machine learning predictions in digital mental health interventions. NPJ Digit Med 2024;7:361. 10.1038/s41746-024-01360-w
- 55. Zhu X, Suk HI, Lee SW. et al. Subspace regularized sparse multitask learning for multiclass neurodegenerative disease identification. IEEE Trans Biomed Eng 2016;63:607–18. 10.1109/TBME.2015.2466616