Decoding TCR recognition via geometric deep learning of immunological fingerprints

Chun Shang; Kevin C Chan; Ruhong Zhou

Brief Bioinform. 2026 Mar 16;27(2):bbag048. doi: 10.1093/bib/bbag048

PMCID: PMC12989321 PMID: 41834397

Abstract

T cell receptor (TCR) recognition of peptide–major histocompatibility complex (pMHC) molecules is the critical first step in adaptive immune activation, shaping immunity against pathogens and tumors, as well as tolerance to self. Despite extensive structural characterization of TCR–pMHC complexes, the molecular principles underlying this process remain incompletely understood, hindered by the inherent duality of TCR specificity and cross-reactivity. Traditional structural analyses often fall short in capturing the multidimensional features that govern TCR–pMHC engagement. Here, we introduce a multimodal geometric deep learning framework that systematically extracts and learns various physicochemical and spatial features from pMHC interfaces, which encode key immunological cues for TCR recognition. Applied to a curated dataset of human leukocyte antigens HLA-A^*02–peptide–TCR crystal structures, our model robustly predicts TCR binding preferences and uncovers interfacial “immunological fingerprints” that inform receptor engagement. Through an integrated explainability module, we identify critical contact residues and interaction motifs, thus providing interpretable insights into the determinants of TCR specificity. We further demonstrate the model’s generalizability by analyzing HLA-B^*27–peptide complexes, revealing potential TCR cross-reactivity between self-derived and bacterial peptides—highlighting its utility in probing molecular mimicry. This work establishes a scalable, structure-based approach for decoding T cell recognition and offers a powerful tool for guiding antigen design, vaccine development, and TCR-based immunotherapies.

Keywords: TCR–pMHC recognition, immunological fingerprints, geometric deep learning, structural immunoinformatics, antigen design

Introduction

T cell-mediated immune responses are initiated by the recognition of peptide–major histocompatibility complex (pMHC) molecules by T cell receptors (TCRs) [1, 2]. This TCR–pMHC interaction is a critical determinant of T cell activation and subsequent immune response [3–6]. However, understanding and controlling T cell recognition remains a significant challenge because TCR binding is both highly specific and cross-reactive, a duality that gives rise to promiscuous binding behavior [7–9].

Despite extensive research efforts, the precise mechanisms underlying TCR–pMHC interactions remain elusive [10, 11]. This gap in knowledge stems in part from the limited availability of experimentally determined TCR–pMHC complex structures: only a few hundred have been resolved to date [12]. This scarcity of structural data stands in stark contrast to the enormous diversity of the TCR repertoire (~2.5 × 10⁷ unique clonotypes within an individual) [13], underscoring the urgent need to elucidate generalized principles that govern TCR recognition [14, 15]. Current structural studies that aim to uncover recognition rules typically rely on visual inspection and qualitative comparison of local interaction environments among a few selected structures [11, 16]. A recently proposed method, TCRen [17], learns pairwise TCR–peptide contact potentials from available TCR–pMHC complex structures [on the order of a few hundred in the Protein Data Bank (PDB)] and generates an “importance map” that identifies key residues influencing TCR–pMHC binding. Other studies have analyzed global TCR-binding footprints on pMHC surfaces, focusing either on spatial regions or on specific layers of physicochemical properties (e.g. electrostatics) [18, 19]. Despite their successes, these methods are often limited by the scarcity of high-quality 3D complex structures and of quantitative binding affinity data, and the patterns they extract may be difficult to generalize. Moreover, there remains a lack of integrated methods that simultaneously model spatial, geometric, and chemical properties across structurally diverse interfaces in a learnable and interpretable framework [20, 21].

The molecular surface provides a high-level representation of protein structure, treating the protein as a continuous 3D shape enriched with geometric and chemical properties [22, 23]. Recent advances in geometric deep learning have enabled more powerful and expressive representations of such surfaces, allowing models to learn from protein shape in geodesic space [24, 25]. One notable example is the Molecular Surface Interaction Fingerprinting (MaSIF) framework, which leverages surface-based geometric deep networks to analyze and predict protein interaction patterns [26]. By calculating and integrating local surface curvature, electrostatics, and hydropathy features at each surface point, MaSIF demonstrates how multimodal surface fingerprints can encode functionally meaningful information [27, 28]. Its key innovation lies in enabling end-to-end learning directly on molecular surfaces, allowing generalization across tasks such as interaction site prediction and binding partner identification.

Building on this conceptual foundation, we hypothesize that pMHC interfaces—the subsurfaces surrounding the bound peptide—are also embedded with fingerprint-like patterns of geometric and physicochemical features that reveal immunological information. Specifically, we propose that pMHCs recognized by the same TCR may share subtle interfacial feature patterns that can be effectively captured through high-dimensional analysis with modern machine learning techniques. To test this hypothesis, we built a multimodal deep learning framework to predict the TCR-binding preferences of pMHCs using ‘immunological fingerprinting’ (IMPRINT). On the basis of successful predictions, we also provided detailed explanations to rationalize the results, offering insights that extend beyond previous studies. By integrating structural biology, machine learning, and immunology, our study aims to advance our understanding of the structural basis of T cell recognition.

Methods

Dataset

We curated two datasets of pMHC–TCR complexes to train and evaluate our model, as summarized in Table 1 and Table S1.

Table 1.

Curated dataset of HLA-A^*02–peptide–TCR complexes used for model training and evaluation.

	MHC type	TCR type/category info	Epitope	PDBID
1	HLA-A^*02	Category 1, A6; size, 10; peptide, 9-mer	LGYGFVNYI	3PWP
2			LLFGFPVYV	3D39
3			LLFGFPVYV	3D3V
4			LLFGFPVYV	3QFJ
5			LLFGKPVYV	2GJ6
6			LLFGYAVYV	1QRN
7			LLFGYPRYV	1QSE
8			LLFGYPVAV	1QSF
9			LLFGYPVYV	1AO7
10			MLWGYLQYV	3H9S
11		Category 2, 1E6; size, 9; peptide, 10-mer	ALWGPDPAAA	3UTS
12			ALWGPDPAAA	3UTT
13			AQWGPDPAAA	5HYJ
14			MVWGPDPLYV	5C0A
15			RQFGPDFPTI	5C0B
16			RQFGPDWIVA	5C0C
17			RQWGPDPAAV	5C08
18			YLGGPDFPTI	5C09
19			YQFGPDFPIA	5C07
20		Category 3, DMF5; size, 6; peptide, 9/10-mer	AAGIGILTV	3QDJ
21			AAGIGILTV	6D78
22			ELAGIGILTV	3QDG
23			ELAGIGILTV	6DKP
24			MMWDRGLGMM	6AMU
25			SMLGIGIVPV	6AM5
26		Category 4, JM22; size, 5; peptide, 9-mer	GILEFVFTL	5HHO
27			GILGFVFTL	1OGA
28			GILGFVFTL	2VLJ
29			GILGFVFTL	2VLK
30			GILGLVFTL	5HHM
31		Category 5, a24b17; size, 4; peptide, 10-mer	EAAGIGILTV	6TMO
32			ELAAIGILTV	4JFD
33			ELAGIGALTV	4JFE
34			ELAGIGILTV	4JFF
35		Category 6, T4H2; size, 3; peptide, 9-mer	ILDQVPFSV	6VMC
36			IMDQVPFSV	6VM9
37			ITDQVPFSV	6VMA
38		Category 7, 868; size, 3; peptide, 9-mer	SLFNTIAVL	5NMG
39			SLYNTIATL	5NMF
40			SLYNTVATL	5NME

The table includes seven categories of TCR types, each cocrystallized with distinct epitope peptides. For each category, the number of structures, the length of the recognized epitope, the epitope sequences, and the corresponding PDB IDs are provided.

HLA-A^*02-peptide-TCR dataset

To train and validate the discriminative model, we selected experimentally resolved pMHC–TCR crystal structures involving the HLA-A^*02 allele, which is the most prevalent class I allele among TCR–pMHC complexes available in the PDB (118 out of 358 entries; TCR3d database, version 20250626, https://tcr3d.ibbr.umd.edu). We restricted our selection to TCR categories with at least three structures, yielding a total of 40 complexes spanning seven TCR categories: A6 (n = 10), 1E6 (n = 9), DMF5 (n = 6), JM22 (n = 5), a24b17 (n = 4), and 868 and T4H2 (n = 3 each). All structures were downloaded from the PDB.

HLA-B^*27-peptide-TCR dataset

To assess model generalizability, we constructed a set of four pMHC–TCR complexes involving the HLA-B^*27 allele. Two structures with the B^*27:05 subtype (PDB IDs: 7N2P and 7N2Q [29]) were directly retrieved from the PDB. The remaining two complexes for the B^*27:09 subtype, which differs from B^*27:05 by a single residue (D116H), were modeled as follows: (i) the D116 residue in B^*27:05 was mutated to histidine using PyMOL; (ii) the resulting structure was subjected to a 100 ns molecular dynamics (MD) simulation to allow conformational relaxation, with structural stability assessed via RMSD analysis; and (iii) simulation snapshots were clustered by structural similarity, and the representative structure from the largest cluster was selected to represent the modeled B^*27:09–peptide–TCR complex.

Immunological fingerprint

The MaSIF framework demonstrated that molecular surfaces encode “fingerprints” composed of local geometric and physicochemical patterns that can reveal biomolecular recognition cues [26]. In its original MaSIF-ligand application—a proof-of-concept for classifying ligand-binding pockets—interaction fingerprints were defined over protein surface regions selected by applying distance cutoffs to cofactor atoms in holo structures.

In practical immunological settings, however, only high-resolution TCR-unbound pMHC structures are typically available [10]. To address this limitation, we introduce the concept of immunological fingerprints of pMHC interfaces, defined solely based on peptide proximity and independent of TCR-bound configurations [7, 30].

pMHC surface featurization

Given a pMHC structure in PDB format, surface fingerprints were generated by computing overlapping radial patches on the molecular surface and interpolating feature values onto each patch. Following the MaSIF preprocessing pipeline, we first protonated the structures and generated triangulated surface meshes at 1.0 Å resolution [31]. For each surface vertex, a radial patch was extracted with a geodesic radius of 12 Å. Within each patch, five feature channels were computed: two geometric [32] (shape index and distance-dependent curvature) and three physicochemical (Poisson–Boltzmann electrostatics [33, 34], hydropathy [35], and free electrons and proton donors [36]). Each patch was further embedded into a geodesic polar coordinate system, enabling rotation-invariant feature encoding with respect to the patch center.

Definition of interface regions

The pMHC interface was operationally defined as the set of surface patches centered within 4 Å of any peptide atom, reflecting the peptide-centric nature of TCR–pMHC recognition [30]. For example, in the structure with PDB ID 1AO7 [37], 296 interface patches were extracted from a total of 5092 surface patches, forming the pool from which training samples were drawn. To further assess robustness, we also varied the sampling pool of surface patches by increasing the distance cutoff from 4 Å to 5 Å and 6 Å. Cross-validation on the HLA-A^*02 dataset showed stable predictive performance across these configurations (Fig. S1), suggesting that our model’s learned representations are not sensitive to the precise distance cutoff or the resulting size of the sampling pool.
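The interface definition above amounts to a simple distance filter over patch centers. The following is a minimal sketch of that filter, not the authors' code: the function name `interface_patch_indices` and the toy coordinates are hypothetical, and real inputs would be mesh vertices and peptide atom positions parsed from a PDB file.

```python
import math

def interface_patch_indices(patch_centers, peptide_atoms, cutoff=4.0):
    """Return indices of patch centers lying within `cutoff` Å of any peptide atom."""
    selected = []
    for i, center in enumerate(patch_centers):
        # math.dist is the Euclidean distance between two points
        if any(math.dist(center, atom) <= cutoff for atom in peptide_atoms):
            selected.append(i)
    return selected

# Toy coordinates in Å (hypothetical, for illustration only)
patch_centers = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
peptide_atoms = [(0.0, 0.0, 1.0), (5.5, 0.0, 0.0)]

# Widening the cutoff enlarges the sampling pool, as in the robustness check
for cutoff in (4.0, 5.0, 6.0):
    print(cutoff, interface_patch_indices(patch_centers, peptide_atoms, cutoff))
```

In this toy case the third patch center lies 4.5 Å from the nearest peptide atom, so it enters the pool only once the cutoff is raised to 5 Å.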

Network architecture

Our discriminative model adopts the geometric deep learning architecture introduced in MaSIF-ligand, originally designed for ligand-binding pocket classification. The model adapts convolutional neural network (CNN) principles to operate on surface-based geodesic grids. Each input consists of 32 randomly sampled surface patches from a given pMHC interface. Each patch includes five surface feature channels. Within each channel, the patch is projected onto a soft polar grid (16 angular × 5 radial bins). Feature maps are processed by a geodesic convolutional layer with 80 filters, followed by rotational max pooling over 16 discrete orientations, ReLU activation, and a fully connected (FC) layer. Outputs from all five feature channels are concatenated to produce an 80-dimensional descriptor for each patch. The 32 resulting 80-dimensional descriptor vectors (one per patch) are used to compute an 80 × 80 fingerprint matrix by outer product aggregation. The matrix is flattened and passed through a 64-unit FC layer (with ReLU), followed by a 7-unit linear output layer. The model was trained with the Adam optimizer (learning rate = 1 × 10⁻⁴) using a softmax cross-entropy loss.
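To make the aggregation step concrete, the sketch below follows the classification head from patch descriptors to class probabilities in plain Python. It is an illustration under stated assumptions rather than the trained model: the descriptors and weights are random placeholders, dividing the summed outer products by the number of patches is an assumption (the text does not specify a normalization), and the geodesic convolution that produces the descriptors is not reproduced.

```python
import math
import random

N_PATCHES, DESC_DIM, HIDDEN, N_CLASSES = 32, 80, 64, 7

def fingerprint_matrix(descriptors):
    """Aggregate per-patch descriptors d into sum(d d^T) / n (a DESC_DIM x DESC_DIM matrix)."""
    dim = len(descriptors[0])
    mat = [[0.0] * dim for _ in range(dim)]
    for d in descriptors:
        for i in range(dim):
            di, row = d[i], mat[i]
            for j in range(dim):
                row[j] += di * d[j]
    n = len(descriptors)
    return [[v / n for v in row] for row in mat]  # division by n is an assumption

def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def classify(descriptors, w1, b1, w2, b2):
    flat = [v for row in fingerprint_matrix(descriptors) for v in row]  # 80 * 80 = 6400 values
    hidden = [max(0.0, h) for h in dense(flat, w1, b1)]                 # 64-unit FC + ReLU
    return softmax(dense(hidden, w2, b2))                               # 7 logits -> probabilities

# Random placeholders standing in for learned descriptors and weights
rng = random.Random(0)
descriptors = [[rng.gauss(0, 1) for _ in range(DESC_DIM)] for _ in range(N_PATCHES)]
w1 = [[rng.gauss(0, 0.01) for _ in range(DESC_DIM * DESC_DIM)] for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
w2 = [[rng.gauss(0, 0.1) for _ in range(HIDDEN)] for _ in range(N_CLASSES)]
b2 = [0.0] * N_CLASSES
probs = classify(descriptors, w1, b1, w2, b2)
```

Because the fingerprint matrix is a sum over patches, it does not depend on the order in which the 32 patches are presented, which suits a model fed with randomly sampled patch sets.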

Discriminative model training, evaluation, and explanation on the HLA-A^*02 dataset

All-test cross-validation: training and evaluation

To validate the ability of our model to predict TCR-binding preferences from pMHC immunological fingerprints, we trained an ensemble of 50 discriminative models using the HLA-A^*02–peptide–TCR dataset under an ‘All-test’ cross-validation scheme. In this setting, all structures in the dataset were eventually used for testing across the ensemble, and the final prediction was obtained by equally weighted integration across models.

The only variation among the 50 models was the specific partitioning of the 40 structures into training and test sets, following an approximate 70:30 split (27 for training, 13 for testing). Due to limited samples in some categories, no separate validation set was employed. We ensured that each split maintained balanced representation of all categories in both training and testing sets to avoid category-level mismatches.
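The partitioning scheme can be sketched as a stratified random split repeated 50 times. This is a minimal illustration, assuming per-category training counts that sum to 27: the `TRAIN_COUNTS` values are hypothetical, since the text states only an approximate 70:30 split, and the structure labels are placeholders rather than PDB IDs.

```python
import random

# Category sizes from Table 1
CATEGORY_SIZES = {"A6": 10, "1E6": 9, "DMF5": 6, "JM22": 5, "a24b17": 4, "T4H2": 3, "868": 3}
# Hypothetical per-category training counts (sum = 27); not given in the text
TRAIN_COUNTS = {"A6": 7, "1E6": 6, "DMF5": 4, "JM22": 3, "a24b17": 3, "T4H2": 2, "868": 2}

def stratified_split(structures_by_category, train_counts, rng):
    """Randomly split each category so that every category appears in both sets."""
    train, test = [], []
    for category, members in structures_by_category.items():
        shuffled = members[:]
        rng.shuffle(shuffled)
        k = train_counts[category]
        train += shuffled[:k]
        test += shuffled[k:]
    return train, test

# Placeholder structure labels, one list per TCR category
structures = {c: [f"{c}_{i}" for i in range(n)] for c, n in CATEGORY_SIZES.items()}
rng = random.Random(0)
splits = [stratified_split(structures, TRAIN_COUNTS, rng) for _ in range(50)]

# Structures that were held out by at least one of the 50 models
tested = {s for _, test in splits for s in test}
print(len(tested), "of", sum(CATEGORY_SIZES.values()), "structures appear in a test set")
```

With random splits, coverage of all 40 structures is highly likely over 50 repetitions but is not guaranteed by construction, so an explicit check such as the one above is a reasonable safeguard.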

For each model, training involved randomly sampling structures from each category according to a predefined distribution to form its training set. After 100 training epochs, the trained model was used to generate predictions on its corresponding test set. To reduce prediction variance, the pMHC interface of each test structure was sampled 100 times, and the resulting 100 seven-dimensional prediction vectors were averaged to obtain the final prediction.
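The test-time averaging described above can be sketched as follows. The function `averaged_prediction` and the stand-in `toy_model` are hypothetical names introduced for illustration, and sampling without replacement is an assumption, as the text does not state how the 32 patches are drawn.

```python
import random

def averaged_prediction(interface_patches, predict_fn, n_draws=100, n_patches=32, rng=None):
    """Average the class-probability vectors obtained from repeated random patch samples."""
    rng = rng or random.Random()
    total = None
    for _ in range(n_draws):
        sample = rng.sample(interface_patches, n_patches)  # without replacement (assumption)
        probs = predict_fn(sample)
        total = list(probs) if total is None else [t + p for t, p in zip(total, probs)]
    return [t / n_draws for t in total]

def toy_model(sample):
    """Hypothetical stand-in for the trained network; returns a 7-class probability vector."""
    p = sum(sample) / (len(sample) * 296)
    return [p] + [(1.0 - p) / 6] * 6

interface_patches = list(range(296))  # e.g. the 296 interface patches reported for 1AO7
prediction = averaged_prediction(interface_patches, toy_model, rng=random.Random(0))
```

Averaging probability vectors keeps the result a valid probability distribution, so the final prediction can be read off as the category with the highest mean probability.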

We evaluated the model ensemble using two complementary metrics:

Prediction accuracy: Defined as the proportion of final predictions for which the category with the highest predicted probability matched the experimentally validated TCR specificity. Accuracy was computed at three levels: (i) dataset-level, across all 40 structures, (ii) category-level, within each TCR category, and (iii) structure-level, for each individual structure.

Discrimination confidence: The model’s discrimination confidence for each structure was defined as the predicted probability assigned to the ground-truth category, averaged over all final predictions for that structure.
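Both metrics can be computed from the final prediction vectors that a structure receives from the ensemble models that held it out. The sketch below uses toy three-class vectors for brevity, and the function names are illustrative rather than taken from the released code.

```python
def structure_accuracy(final_predictions, true_index):
    """Fraction of final predictions whose top-scoring category is the ground truth."""
    hits = 0
    for probs in final_predictions:
        predicted = max(range(len(probs)), key=probs.__getitem__)
        hits += predicted == true_index
    return hits / len(final_predictions)

def discrimination_confidence(final_predictions, true_index):
    """Mean probability assigned to the ground-truth category."""
    return sum(probs[true_index] for probs in final_predictions) / len(final_predictions)

# Toy final predictions for one structure whose true category has index 0
final_predictions = [
    [0.70, 0.20, 0.10],
    [0.40, 0.50, 0.10],  # misclassified by this model
    [0.60, 0.30, 0.10],
    [0.90, 0.05, 0.05],
]
accuracy = structure_accuracy(final_predictions, true_index=0)
confidence = discrimination_confidence(final_predictions, true_index=0)
```

The two quantities are complementary: accuracy counts only whether the correct category ranks first, whereas confidence also reflects how much probability mass the correct category receives.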

Scoring analysis

To interpret high-confidence predictions, we performed a patch-based attribution analysis to quantify the contribution of individual interface patches to TCR discrimination and map this information onto the pMHC surface.

For each structure, we collected all prediction vectors made by the ensemble models. Among them, predictions within the top 10th percentile of confidence (i.e. predicted probability for the ground-truth category) were selected. The frequency with which each interface patch was sampled among these top predictions was normalized to define its discrimination score. These scores were then mapped onto the corresponding patch centers on the pMHC interface to visualize surface regions that most contributed to correct classification.
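A minimal sketch of this attribution step is given below. It assumes that each prediction is stored together with the indices of the patches that produced it, and it normalizes sampling frequencies so that the mean score over all interface patches equals 1; that normalization is an assumption chosen to match the reference baseline used for the positional profiles. The toy records are synthetic.

```python
import random
from collections import Counter

def discrimination_scores(records, n_interface_patches, top_fraction=0.10):
    """records: list of (sampled_patch_indices, confidence) pairs for one structure."""
    ranked = sorted(records, key=lambda r: r[1], reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    # Count how often each patch appears among the most confident predictions
    counts = Counter(idx for patches, _ in ranked[:n_top] for idx in patches)
    mean_count = sum(counts.values()) / n_interface_patches
    return [counts.get(i, 0) / mean_count for i in range(n_interface_patches)]

# Synthetic example: predictions are confident whenever patch 0 is sampled
rng = random.Random(0)
records = []
for _ in range(200):
    patches = rng.sample(range(10), 3)
    confidence = 0.9 if 0 in patches else rng.uniform(0.0, 0.5)
    records.append((patches, confidence))
scores = discrimination_scores(records, n_interface_patches=10)
```

In this synthetic setting patch 0 receives the highest score because it is present in every high-confidence prediction, which is the behavior the attribution is meant to expose.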

In addition, we quantified the positional importance of each peptide residue by aggregating scores from nearby patches. Specifically, patches centered within 4 Å of any atom of a given residue were considered associated with that residue (position). The discrimination scores of these patches were averaged and normalized, with the global mean score of all interface patches set to 1 as a reference baseline.
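The residue-level aggregation can be sketched in the same spirit. The function name, the toy coordinates, and the choice to return `None` for a residue with no associated patch are illustrative assumptions.

```python
import math

def positional_importance(patch_centers, patch_scores, residue_atoms, cutoff=4.0):
    """Mean score of patches near each residue, relative to the global mean (baseline = 1)."""
    global_mean = sum(patch_scores) / len(patch_scores)
    profile = {}
    for position, atoms in residue_atoms.items():
        nearby = [score for center, score in zip(patch_centers, patch_scores)
                  if any(math.dist(center, atom) <= cutoff for atom in atoms)]
        # None marks a residue with no associated patch (an illustrative convention)
        profile[position] = (sum(nearby) / len(nearby)) / global_mean if nearby else None
    return profile

# Toy data: two peptide positions, each close to two patch centers (coordinates in Å)
patch_centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]
patch_scores = [3.0, 3.0, 1.0, 1.0]
residue_atoms = {1: [(0.0, 0.0, 1.0)], 2: [(10.0, 0.0, 1.0)]}
profile = positional_importance(patch_centers, patch_scores, residue_atoms)
```

Values above 1 indicate positions whose surrounding patches contribute more than the average interface patch, and values below 1 indicate the opposite.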

Model generalization and explainability on HLA-B^*27 structures

Training, inference, and interpretability

To test the generalizability of our model, we retrained a single discriminative model using all 40 HLA-A^*02–peptide–TCR structures. The model was trained for 200 epochs and used to perform inference on four HLA-B^*27–peptide–TCR complexes.

For each HLA-B^*27–peptide structure, 10 000 predictions were generated by repeatedly sampling 32 interface patches and feeding them into the model. Each prediction produced a seven-dimensional probability vector. The similarity score of a given structure toward a specific TCR category was defined as the mean predicted probability assigned to that category across all 10 000 predictions.

To interpret the inferred preference toward a specific category (e.g. category 6), we selected the top 10th percentile of predictions with the highest similarity scores for that category. Following the same attribution procedure as in the HLA-A^*02 dataset, we computed and normalized patch-level importance scores, which were further visualized on the pMHC interface.

Benchmarking experiments with state-of-the-art methods

To further substantiate the proposed surface-based framework, we conducted benchmarking comparisons on the HLA-A^*02 dataset against three state-of-the-art methods, including one structure-based approach (TCRen [17]) and two sequence-based pretrained models (TEINet [38] and TEIM-Seq [39]).

Benchmarking for all baseline methods was performed under a unified categorical TCR discrimination setting consistent with this study. For the structure-based method TCRen, experiments were performed under the identical “All-test” cross-validation protocol as our model, comprising 50 ensemble runs with the same training and hold-out PDB splits (training: n = 27; hold out: n = 13 per fold). In each fold, a TCRen statistical potential was derived from residue-level contact information in the training structures and applied to the corresponding hold-out structures to rank cognate TCRs among the seven categorical TCR types. For the sequence-based models TEINet and TEIM-Seq, author-recommended pretrained models were used for inference. Each nonredundant peptide (n = 32) was paired with each of the seven categorical CDR3β sequences, and predicted binding scores were used to rank the cognate TCR, thereby mirroring the same discrimination setting.
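The shared discrimination setting reduces each method's output to a ranking of the seven candidate TCR categories. The helper below is a hypothetical illustration of that ranking step, not part of any baseline's API, and it assumes that higher scores indicate stronger predicted binding; for an energy-like potential, where lower values are more favorable, the scores would be negated first.

```python
def cognate_rank(scores, cognate):
    """1-based rank of the cognate category when sorted by descending score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(cognate) + 1

def top1_accuracy(cases):
    """Fraction of cases in which the cognate category is ranked first."""
    return sum(cognate_rank(scores, cognate) == 1 for scores, cognate in cases) / len(cases)

# Toy scores over three categories for two peptides (illustrative values only)
cases = [
    ({"A6": 0.7, "1E6": 0.2, "DMF5": 0.1}, "A6"),
    ({"A6": 0.5, "1E6": 0.3, "DMF5": 0.2}, "DMF5"),
]
ranks = [cognate_rank(scores, cognate) for scores, cognate in cases]
accuracy = top1_accuracy(cases)
```

Reporting the rank in addition to top-1 accuracy shows how far a method is from the correct category when it misclassifies.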

A complete reproducibility package for IMPRINT, together with separate benchmarking packages for each baseline method (TCRen, TEINet, and TEIM-Seq), including all scripts, reference inputs, and analysis workflows required to reproduce the reported results, is publicly available at https://github.com/Xiyougailv/IMPRINT/tree/main/data.

Results

A surface-based modeling framework captures discriminative TCR-binding preferences

Modeling framework: surface feature extraction and predictive architecture

Our study introduces a novel approach for analyzing and predicting the TCR-binding preferences of pMHCs based on their interface features. As illustrated in Fig. 1, the process by which TCRs ‘scan’ pMHCs and selectively bind to certain complexes is formulated as ‘immunological fingerprinting’ (IMPRINT). Specifically, we model the pMHC using a combination of physicochemical and geometric features focused on the interface region adjacent to the peptide, referred to as the immunological fingerprints. A multimodal discriminative model trained on experimentally determined TCR–pMHC structures was optimized to predict the TCR-binding preferences of pMHCs based on these interfacial immunological fingerprints. Model performance was assessed by comparing predicted binding preferences with experimentally validated TCR specificities. To facilitate model interpretability, interface regions contributing most significantly to successful discrimination were further analyzed, offering mechanistic insights into the determinants of TCR–pMHC recognition.

Figure 1. Surface-based discriminative modeling of TCR–pMHC recognition with IMPRINT. Overview of the surface-based conceptual framework for modeling TCR–pMHC recognition through interpretable discriminative analysis of interfacial fingerprints. (Middle) Schematic overview of TCR–pMHC recognition. The TCR is conceptualized as scanning the pMHC surface by sensing an immunological fingerprint—interface features that may indicate potential binding interactions. Illustration created with BioRender.com. (Top) Process for extracting the immunological fingerprint from the pMHC interface. The process begins by obtaining a pMHC structure, either directly or by isolating it from a TCR–pMHC complex. Next, the molecular surface of the pMHC is computed, and regions proximal to the peptide are defined as the interface. Finally, physicochemical and geometric features are interpolated on the surface points within these interface regions. (Bottom) Predictions of TCR-binding preferences generated by a discriminative model with built-in interpretability. Fingerprint fragments (‘patches’) are randomly sampled and collectively fed into a deep neural network for prediction. The correlation between individual prediction outputs and their corresponding patches highlights high-importance regions that contribute most to successful discrimination.

The generation of pMHC immunological fingerprints followed a surface featurization pipeline adapted from the MaSIF framework [26] (Fig. 2a; see Methods). This pipeline comprises four main steps: (i) triangulation of the pMHC surface, (ii) decomposition into radial patches, (iii) computation of point-wise physicochemical and geometric features, and (iv) contextual mapping of multimodal features within overlapping surface patches. For each native pMHC structure, surface points within 4 Å of any peptide atom were identified, and patches centered on these points were defined as interface patches (typically hundreds of such interface patches, see more in Methods). These feature-rich interface patches constitute ‘fragments’ segmented from an immunological fingerprint and are individually processed by deep neural networks (Fig. 2b, left).

Figure 2. Geometric deep learning pipeline for pMHC interface modeling. Illustration of the technical pipeline for pMHC surface featurization and ensemble modeling using geometric deep learning with stochastic patch sampling. (a) Surface featurization pipeline for a pMHC extracted from a TCR–pMHC complex. Four main steps are involved: (i) the pMHC surface is triangulated into a discrete mesh; (ii) around each mesh vertex, a radial patch is extracted with geodesic radius of r = 12 Å; (iii) within each patch, two geometric and three physicochemical features are calculated and interpolated onto the surface points; (iv) geodesic polar coordinates are mapped to represent the relative spatial positions of features within the patch. (b) The model architecture supports interpretable predictions through a sampling-based stochastic modeling scheme. For each pMHC, surface patches centered within 4 Å of any peptide atom are defined as interface patches (typically hundreds of such interface patches, see more in Methods). From these, 32 patches are randomly selected and fed into a geometric deep network trained to predict TCR-binding preferences. To improve robustness and account for the variability introduced by sampling, each pMHC interface is sampled 100 times, resulting in 100 combinations of 32 patches and 100 corresponding prediction vectors. The final output is derived by aggregating these predictions via either averaging or majority voting.

We implemented a geometric deep learning model based on a geodesic convolutional neural network architecture to identify discriminative patterns within the immunological fingerprint of pMHC interfaces (Fig. 2b). To improve prediction robustness and enhance interpretability, we adopted a patch-sampling-based stochastic learning framework. For each pMHC, a fixed number of interface patches, 32 in our current setup, were randomly sampled and input into the network, where they were encoded into vectorized descriptors. These descriptors were assembled into a fingerprint matrix and passed through discriminative layers to produce a prediction vector. To account for the stochasticity of patch selection, each pMHC was sampled 100 times, resulting in 100 prediction vectors. The final prediction was obtained by aggregating these outputs using either vector averaging or majority voting.

Cross-validation confirms model’s capacity for T cell receptor discrimination

Our deep discriminative model demonstrated robust performance in predicting the TCR-binding preferences of pMHC complexes. We evaluated the model’s predictive capability using human leukocyte antigen HLA-A^*02, the most prevalent and polymorphic MHC allele family in humans, as our showcase. We constructed a curated dataset of TCR–pMHC structures with the HLA-A^*02 allele, taking advantage of its prevalence among experimentally resolved complexes (although the number of available structures remains limited, as discussed below). The supervised dataset was built by selecting TCR types with at least three available structures in the PDB, yielding 40 HLA-A^*02–peptide–TCR complexes spanning seven TCR categories (Table 1; see Methods). To ensure reliable evaluation, we implemented an iterative cross-validation strategy termed “All-test” (Fig. 3a; see Methods). In this approach, we trained an ensemble of 50 models, each on a randomized subset of the dataset, and tested each model on the corresponding held-out structures. The structural distribution across TCR categories was preserved within both training and test sets for every split, ensuring balanced representation (27 training structures, 13 test structures per iteration).

Figure 3. Cross-validation of predictive accuracy and confidence on HLA-A^*02 structures. Performance evaluation using ensemble cross-validation across seven TCR categories (see Table 1) reveals robust classification accuracy and interpretable confidence scores. (a) Sample size distribution across seven TCR categories in the curated dataset. Each category was split ~7:3 into training and test structures (27 and 13 structures per iteration). In the “All-test” cross-validation strategy, each of the 50 ensemble models was trained on randomly sampled training sets respecting this distribution, and evaluated on the corresponding held-out structures. (b) Discrimination accuracy and confusion matrix analysis. Accuracy is defined as the proportion of correct predictions among all predictions for structures within each category. Confusion values represent the average probability that a structure from a given category is misclassified into another category. (c) Discrimination confidence for each structure across the 40 complexes. For each structure, the final confidence score is computed by averaging the prediction vectors from all ensemble models that treated it as a test instance, and extracting the probability assigned to its ground-truth TCR category in the averaged vector.

The model achieved an average discrimination accuracy of 0.80 across the 40 complexes, markedly exceeding the random baseline of 0.14 (1/7) expected for seven categories. Confusion matrix analysis revealed differential prediction performance across TCR categories (Fig. 3b). Three categories (1E6, A6, and T4H2) exhibited superior distinguishability, with accuracies around 0.90, likely due to richer reference information provided by their relatively larger sample sizes. Conversely, performance was lower for categories such as DMF5 and 868, with DMF5 reaching only 0.45 accuracy, suggesting reduced learnability from sparse training data, which is not unexpected.

We further examined the model’s discrimination confidence for each of the 40 structures, reflecting the degree of certainty associated with each categorization decision. The results (Fig. 3c) showed distinct confidence distributions across structures, with 28 structures scoring above 0.70. Notably, nearly all 1E6 structures achieved confidence scores above 0.90. These consistently high scores—despite the randomized combinations of training structures used by different ensemble models—suggest a strong and conserved pattern in the pMHC interface for this TCR category. In contrast, structures with lower confidence likely exhibit more unique or less conserved interfacial characteristics.

Collectively, these results demonstrate the model’s ability to capture informative patterns within the immunological fingerprint for TCR-binding discrimination, even under structural variability and limited dataset size. The observed robustness establishes a foundation for detailed mechanistic investigations of TCR–pMHC recognition paradigms.

Patch-level model interpretation reveals mechanistic insights into T cell receptor specificity

To elucidate immunologically meaningful mechanisms underlying the model’s discriminative decisions, we implemented a comprehensive patch-based analysis framework to interpret ensemble predictions over all 40 HLA-A^*02–peptide–TCR complexes (see Table 1). For each complex, pMHC interface patches were scored based on their contribution to high-confidence predictions (see Methods). Following normalization, these discrimination scores were spatially mapped onto the molecular surface at the centers of the corresponding interface patches (Fig. S2). Additionally, patches were grouped according to their spatial proximity to individual peptide residues (within 4 Å; see Methods), and the mean normalized score within each group was assigned to the corresponding peptide position.

Figure 4 illustrates how discrimination scores can be leveraged to interpret TCR–pMHC binding specificity. As a representative example, we focused on the 1E6 TCR, which achieved the highest discrimination accuracy among the seven categories. Across all nine intracategory 1E6 structures, the positional importance profiles consistently exhibited elevated scores at peptide positions 4–6 (Fig. 4a). Notably, these positions align with a conserved “GPD” motif shared by the peptides in this category (Fig. 4b), suggesting a recurrent structural feature underpinning 1E6 recognition. To further investigate this pattern, we examined a representative 1E6 complex (PDB ID: 3UTS [40]), where the projected discrimination scores revealed a concentrated high-importance region surrounding the central segment of the peptide (Fig. 4c). Structural inspection showed that this region corresponds to a topological ‘bulge’ formed by the GPD residues—particularly Pro5—resulting in a structural protrusion that facilitates complementary interactions with an ‘aromatic cap’ formed by TCR residues Tyr97α and Trp97β (Fig. 4d).

Figure 4. Patch-level interpretability highlights interaction motifs in 1E6 TCR binding. Patch-based importance analysis reveals conserved peptide positions and structural features contributing to 1E6 TCR specificity. (a) Discrimination score profiles mapped along peptide positions for nine HLA-A^*02–peptide–TCR structures assigned to the 1E6 category. A reference score of 1.0 indicates the average contribution across all interface patches. (b) Sequence logos of the nine peptides recognized by 1E6, highlighting conserved and variable positions across the category. Created using Seq2Logo [41]. (c) Visualization of discrimination scores mapped onto the molecular surface of the pMHC complex (PDB ID: 3UTS). High-scoring regions indicate strong contributions to model discrimination, whereas low-scoring regions indicate minimal contribution. (d) Structural depiction of key interactions between the 1E6 TCR and the peptide. Tyr97α and Trp97β engage the GPD motif centered around Pro5. Remaining peptide, MHC, and TCR regions are shown as surface or cartoon representations to provide structural context.

Together, these analyses demonstrate that our patch-based interpretability framework effectively resolves the structural basis of TCR–pMHC specificity, providing a mechanistic lens into the molecular logic of immune recognition. This capacity not only reinforces the biological credibility of the model’s predictions but also supports its broader application to uncovering recognition principles across diverse immunogenetic contexts.

Model generalization across HLA alleles demonstrates robust inference capability

Building on the model’s validated ability to detect discriminative patterns within HLA-A^*02 pMHC interfaces, we next evaluated its capacity to generalize across HLA alleles by resolving subtle structural differences in novel pMHC inputs. Figure 5 presents a case study employing zero-shot inference to gain mechanistic insights into TCR–pMHC recognition modulated by allelic polymorphism. The example involves four HLA-B^*27–peptide–TCR complexes (Table S1; see Methods), which are immunologically relevant to inflammatory disorders such as ankylosing spondylitis (AS) [29]. A key distinction lies in a single-residue polymorphism (D116H) between the disease-associated allele HLA-B^*27:05 and the nondisease-associated HLA-B^*27:09, located at the base of the peptide-binding groove adjacent to peptide position 9 (P9) (Fig. 5a). Notably, the AS-derived TCR AS4.3 cross-reactively recognizes both a self-peptide (GQVMVVAPR, self-GQV) and a bacterial peptide (LRVMMLAPF, bacterial-LRV) when presented by B^*27:05, whereas only bacterial-LRV is recognized when presented by B^*27:09 [29].

Figure 5. Model generalization enables interpretation of cross-reactivity in HLA-B^*27. Zero-shot inference on HLA-B^*27 complexes uncovers discriminative and mechanistically interpretable features of allele-dependent TCR recognition. (a) Structural alignment of self-derived GQV peptide (left) and bacterial LRV peptide (right), each bound to the two functionally distinct HLA-B^*27 alleles. The corresponding single amino acid substitution (D116H) in the MHC binding groove is shown in stick representation for both alleles. Peptides are shown in cartoon representation, with position 9, which is adjacent to the substituted residue, displayed in stick representation. (b) Inference results for the HLA-B^*27–peptide interfaces, evaluated using the discriminative model retrained on all 40 HLA-A^*02–peptide–TCR complexes. (c) (Left) Visualization of patch-level differentiation scores with respect to category 6, overlaid on the B^*27:09–GQV interface. (Right) Spatial distribution of the surface charge feature, highlighting a distinctive positively charged interface region adjacent to peptide position 9. (d) Residue-level contact networks illustrating interactions between HLA residue 116 and peptide position 9 across four HLA-B^*27 pMHC structures. Interface regions around peptide P9 are annotated according to charge characteristics. From left to right: (first row) B^*27:05–GQV (self), B^*27:05–LRV (bacterial); (second row) B^*27:09–GQV (self), B^*27:09–LRV (bacterial).

We then applied our model to analyze these HLA-B^*27 pMHC interfaces using inference from a discriminative model retrained on all 40 HLA-A^*02–peptide–TCR complexes (Fig. 5b; see Methods). The prediction results showed strong intra-allelic consistency for B^*27:05, with category 6 similarity scores of 0.63 (self-GQV) and 0.83 (bacterial-LRV), followed by weaker contributions from category 5 (0.19 and 0.12, respectively). Notably, for the self-GQV peptide presented by the protective allele B^*27:09—differing from B^*27:05 by a single residue D116H—we observed a pronounced shift in the similarity profile: category 6 dropped from 0.63 to 0.21, while category 5 increased from 0.19 to 0.59. This shift suggests a categorically distinct interfacial character. In contrast, bacterial-LRV presentations remained consistent across alleles, indicating a preserved interfacial fingerprint for this pathogen-derived peptide.
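The category-level shift can be made explicit with a short Python sketch. It uses only the category 5 and category 6 similarity scores quoted in the text; the remaining categories and the B^*27:09–LRV values are not reported here and are therefore omitted. The helper names `top_category` and `profile_shift` are hypothetical and are not part of IMPRINT.

```python
# Similarity scores quoted in the text (categories 5 and 6 only).
reported = {
    ("B*27:05", "self-GQV"): {5: 0.19, 6: 0.63},
    ("B*27:05", "bacterial-LRV"): {5: 0.12, 6: 0.83},
    ("B*27:09", "self-GQV"): {5: 0.59, 6: 0.21},
}


def top_category(profile):
    """Return the category with the highest similarity score."""
    return max(profile, key=profile.get)


def profile_shift(before, after):
    """Per-category change in similarity score (after minus before)."""
    return {cat: round(after[cat] - before[cat], 2) for cat in before if cat in after}


gqv_b2705 = reported[("B*27:05", "self-GQV")]
gqv_b2709 = reported[("B*27:09", "self-GQV")]

shift = profile_shift(gqv_b2705, gqv_b2709)
print(top_category(gqv_b2705), top_category(gqv_b2709), shift)
```

For self-GQV the top-ranked category changes from 6 to 5 between the two alleles, whereas bacterial-LRV on B^*27:05 remains dominated by category 6.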

To interpret these results, we performed a detailed structural analysis across the four HLA-B^*27–peptide complexes to investigate the origin of the categorical shift uniquely observed in B^*27:09–GQV. Patch-level attribution analysis (Fig. 5c, Fig. S3) identified the interface region near peptide position 9 (P9)—adjacent to the polymorphic MHC residue 116—as the most influential in driving category 6 inference across all complexes. Further feature analysis revealed that electrostatic potential was the most distinguishing property in this region. Residue-level contact network analysis (Fig. 5d) subsequently revealed that in B^*27:09–GQV, the positively charged Arg at P9 forms a sterically and electrostatically unfavorable interaction with His116, which replaces the negatively charged Asp116 found in B^*27:05. This local incompatibility contrasts sharply with the other three complexes, where either charge complementarity (Arg–Asp in B^*27:05–GQV) or neutral packing (Phe–Asp in B^*27:05–LRV; Phe–His in B^*27:09–LRV) facilitates favorable interfaces.
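The charge argument for the P9–residue 116 pair can be summarized in a toy Python sketch. This is a deliberately crude illustration rather than the contact-network analysis used in the paper: the formal charges are a simplification, treating His as positively charged is an assumption (its protonation state depends on the local environment), steric effects are ignored, and the names `FORMAL_CHARGE` and `classify_pair` are hypothetical.

```python
# Simplified formal side-chain charges near neutral pH.
# Assumption: His is treated as positively charged; in reality its
# protonation state depends on the local environment.
FORMAL_CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1, "H": 1}


def classify_pair(peptide_res, mhc_res):
    """Label a residue pair by the sign of the product of formal charges."""
    product = FORMAL_CHARGE.get(peptide_res, 0) * FORMAL_CHARGE.get(mhc_res, 0)
    if product < 0:
        return "complementary"   # opposite charges
    if product > 0:
        return "unfavorable"     # like charges
    return "neutral"             # at least one uncharged residue


# (peptide P9 residue, MHC residue 116) for the four complexes in the text.
complexes = {
    "B*27:05-GQV": ("R", "D"),
    "B*27:05-LRV": ("F", "D"),
    "B*27:09-GQV": ("R", "H"),
    "B*27:09-LRV": ("F", "H"),
}

labels = {name: classify_pair(*pair) for name, pair in complexes.items()}
for name, label in labels.items():
    print(name, label)
```

Under these assumptions only B^*27:09–GQV is flagged as unfavorable, matching the single complex for which the categorical shift was observed.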

Overall, these results highlight the model’s sensitivity to fine-grained geometric and physicochemical perturbations at the pMHC interface and underscore its potential as a structural probe for allele- and peptide-specific TCR recognition.

Benchmarking against state-of-the-art methods validates model superiority

Finally, to further substantiate our surface-based framework, we conducted direct, side-by-side comparisons with three state-of-the-art methods. We first evaluated the structure-based approach TCRen [17], originally designed to rank epitopes for a given TCR. Although both methods aim to characterize TCR–pMHC recognition, they differ conceptually: TCRen learns pairwise TCR–peptide contact potentials from full TCR–pMHC complexes, whereas our model derives multimodal geometric and physicochemical fingerprints solely from the pMHC interface, without requiring any TCR structural input. These distinctions indicate that the two frameworks encode complementary biological information. For a fair comparison, we reimplemented TCRen on our curated HLA-A^*02 dataset to rank cognate TCRs among seven categorical TCR types. TCRen achieved competitive ranking performance (Fig. S4) and produced coherent residue-level importance maps (Fig. S5). Notably, both approaches consistently highlighted biologically meaningful regions such as central peptide positions, while also revealing complementary patterns—contact-centric versus surface-context signals—underscoring their distinct yet mutually informative perspectives.

We next evaluated two sequence-based pretrained models, TEINet [38] and TEIM-Seq [39], on the same TCR-discrimination task. Both models were pretrained on large-scale CDR3β–epitope sequence pairs to learn binding-associated sequence motifs. TEINet achieved Top-1 and Top-3 ranking accuracies of 0.35 and 0.78, respectively (Fig. S6), whereas TEIM-Seq achieved 0.48 and 0.75 (Fig. S7). Although both models ranked the cognate category within the top three in most cases, our surface-based model substantially outperformed them in Top-1 discrimination accuracy under the identical evaluation setting (0.80 versus 0.35 and 0.48).
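For readers unfamiliar with the metric, Top-k ranking accuracy can be computed as in the following minimal Python sketch. The function name `top_k_accuracy` and the toy scores are hypothetical and do not reproduce the evaluation code or the data of this study.

```python
def top_k_accuracy(score_rows, true_labels, k):
    """Fraction of samples whose true category is among the k highest scores.

    score_rows  : list of per-sample score lists, one score per category
    true_labels : list of true category indices, one per sample
    """
    hits = 0
    for scores, truth in zip(score_rows, true_labels):
        # Category indices ordered from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        hits += truth in ranked[:k]
    return hits / len(true_labels)


# Hypothetical toy example: four samples scored against three categories.
toy_scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
    [0.2, 0.3, 0.5],
]
toy_labels = [0, 2, 0, 0]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
print(top1, top2)
```

Top-1 counts only exact best-rank matches, so it is the stricter of the two measures; a model can have a high Top-3 value while its Top-1 value remains modest.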

Collectively, these comparisons show that while sequence-only models benefit from large-scale pretraining and TCRen captures detailed contact-level information, our framework delivers superior performance by leveraging multimodal surface representations that encode interfacial geometry, physicochemical context, and cross-residue organization. These findings reinforce the value of surface-based structural modeling for understanding and predicting TCR–pMHC recognition.

Discussion

The pMHC immunological fingerprints encapsulate the fundamental molecular determinants governing TCR recognition, thereby offering valuable mechanistic insights into the structural basis of immune recognition. Our current computational framework IMPRINT deconvolutes pMHC interface fingerprints and represents a proof-of-concept approach for the systematic dissection of TCR–pMHC molecular interaction principles. By introducing a deep learning discriminative model that leverages high-resolution structural data, we have developed a robust computational pipeline to correlate specific pMHC interface features with TCR-binding propensities. The model’s architecture, which incorporates multimodal feature extraction and pattern recognition algorithms, demonstrates particular utility in delineating the molecular determinants that influence TCR–pMHC cross-reactivity profiles. This model holds considerable promise for integration into structural biology pipelines, where it may serve as a discriminative filter to prioritize immunologically relevant pMHC surfaces generated by structure prediction methods [17, 42–45].

The model predicted TCR-binding preferences with substantial accuracy within HLA-A^*02–peptide–TCR complexes, the most prevalent family, with classification accuracies ranging from 0.45 to 0.91 across TCR categories. However, the uneven distribution of structural examples across TCR categories is a potential source of systematic bias in these performance metrics and raises concerns about overfitting, particularly within sparsely populated categories. Our systematic decomposition of pMHC interface patch significance revealed that predictive success was predominantly determined by holistic interfacial characteristics rather than discrete regional determinants. A particularly noteworthy observation is that while peptide regions contribute to the prediction, they do not “dominate” the decision as one might have anticipated. This empirical finding underscores the need to incorporate the full pMHC interface in future predictive models, considering both peptide and MHC components to reflect the intricate molecular interplay that characterizes these immunological interfaces [38, 46–49].

The successful application of our discriminative model to unseen HLA-B^*27–peptide interfaces demonstrates its generalizability beyond the training HLA-A^*02 allele and its capacity to resolve subtle interfacial variations across diverse immunogenetic contexts. Notably, the model effectively distinguished between homologous pMHC complexes differing by a single MHC residue (D116H from HLA-B^*27:05 to B^*27:09), producing category-level shifts in predicted similarity scores that aligned with established TCR recognition outcomes. These results highlight the model’s ability to resolve fine-grained, mechanistically interpretable distinctions across pMHC interfaces, enabling accurate prediction and explanation of TCR recognition outcomes in complex immunogenetic contexts. By systematically incorporating multimodal features of the pMHC surface—including shape, charge, and hydropathy—our framework enables not only biologically meaningful prediction but also mechanistic interpretation. Together, these capabilities position the model as a structurally grounded tool for decoding immune recognition and a versatile component in next-generation pipelines for TCR engineering, antigen design, and precision immunotherapy [50–53].

In addition to these demonstrated capabilities, the framework offers several promising avenues for future development. First, incorporating TCR-side surface fingerprints—such as CDR loop geometries and electrostatic complementarity—could enable direct modeling of receptor–ligand complementarity, thereby extending the current surface-based paradigm toward bidirectional recognition analysis. Second, as high-quality cross-allele structural data become increasingly available, adopting cross-allele training paradigms may enhance generalization across diverse MHC contexts and enable polymorphism-aware prediction [17]. Finally, integrating higher-resolution structural descriptors and dynamic sampling schemes could provide a more detailed and physically grounded understanding of immune recognition principles [54–57]. Together, these extensions are expected to broaden both the biological scope and the translational potential of surface-based immunological fingerprinting.

Key Points

We introduce a surface-based geometric deep learning model that learns multimodal immunological fingerprints from peptide–MHC interfaces to predict TCR–pMHC interactions.
The model achieves high predictive accuracy on HLA-A^*02–restricted pMHCs and generalizes to unseen HLA-B^*27 alleles, uncovering key interfacial determinants of immune specificity and resolving subtle polymorphisms across diverse immunogenetic contexts.
This interpretable and scalable approach enables structural decoding of TCR recognition, with potential applications in antigen design, vaccine development, and TCR-based immunotherapies.

Supplementary Material

IMPRINT_SI_final_bbag048 (imprint_si_final_bbag048.docx, 25.4 MB)

supplementary_materials_bbag048 (supplementary_materials_bbag048.docx, 25.4 MB)

Acknowledgements

We thank Qinglu Zhong and Hong Zhou for helpful discussions.

Contributor Information

Chun Shang, College of Physics, College of Life Sciences, and Institute of Quantitative Biology, Zhejiang University, 866 Yuhangtang Road, Hangzhou 310058, China; Shanghai Institute for Advanced Study, Zhejiang University, 799 Dangui Road, Shanghai 201203, China.

Kevin C Chan, Department of Biosciences and Bioinformatics, School of Science, Xi’an Jiaotong–Liverpool University, 111 Ren’ai Road, Suzhou 215123, China.

Ruhong Zhou, College of Physics, College of Life Sciences, and Institute of Quantitative Biology, Zhejiang University, 866 Yuhangtang Road, Hangzhou 310058, China; Shanghai Institute for Advanced Study, Zhejiang University, 799 Dangui Road, Shanghai 201203, China; The First Affiliated Hospital, School of Medicine, Zhejiang University, 866 Yuhangtang Road, Hangzhou 310058, China; Department of Chemistry, Columbia University, New York, NY 10027, United States.

Author contributions

Ruhong Zhou (Conceived the idea and designed the research, Analyzed the data and wrote the manuscript), Chun Shang (Developed the method and wrote the software for immunological fingerprinting, Performed structure modeling and molecular dynamics simulations of the pMHC complexes, Analyzed the data and wrote the manuscript), and Kevin C. Chan (Analyzed the data and wrote the manuscript). All authors participated in discussions and revisions of the manuscript.

Funding

This work was partially supported by the National Key R&D Program of China (2024YFA1306400, 2021YFA1201200, 2024YFA1307500), the National Natural Science Foundation of China (U1967217), the National Center of Technology Innovation for Biopharmaceuticals (NCTIB2022HS02010), Shanghai Artificial Intelligence Lab (P22KN00272), the National Independent Innovation Demonstration Zone Shanghai Zhangjiang Major Projects (ZJZX2020014), the Starry Night Science Fund of Zhejiang University Shanghai Institute for Advanced Study (SN-ZJU-SIAS-003), and Zhejiang University Global Partnership Fund (188170 + 194452505).

Conflicts of interest

None declared.

Data availability

Source code is available at https://github.com/Xiyougailv/IMPRINT.

References

1. Davis MM, Bjorkman PJ. T-cell antigen receptor genes and T-cell recognition. Nature 1988;335:744. doi:10.1038/335744b0
2. van der Merwe PA, Dushek O. Mechanisms for T cell receptor triggering. Nat Rev Immunol 2011;11:47–55. doi:10.1038/nri2887
3. Garcia KC, Teyton L, Wilson IA. Structural basis of T cell recognition. Annu Rev Immunol 1999;17:369–97.
4. Garcia KC, Adams EJ. How the T cell receptor sees antigen--a structural view. Cell 2005;122:333–6.
5. Rudolph MG, Stanfield RL, Wilson IA. How TCRs bind MHCs, peptides, and coreceptors. Annu Rev Immunol 2006;24:419–66.
6. Krogsgaard M, Davis MM. How T cells ‘see’ antigen. Nat Immunol 2005;6:239–45.
7. Rudolph MG, Wilson IA. The specificity of TCR/pMHC interaction. Curr Opin Immunol 2002;14:52–65.
8. Singh NK, Riley TP, Baker SCB et al. Emerging concepts in TCR specificity: rationalizing and (maybe) predicting outcomes. J Immunol 2017;199:2203–13.
9. Sewell AK. Why must T cells be cross-reactive? Nat Rev Immunol 2012;12:669–77. doi:10.1038/nri3279
10. La Gruta NL, Gras S, Daley SR et al. Understanding the drivers of MHC restriction of T cell receptors. Nat Rev Immunol 2018;18:467–78. doi:10.1038/s41577-018-0007-5
11. Szeto C, Lobos CA, Nguyen AT et al. TCR recognition of peptide–MHC-I: rule makers and breakers. Int J Mol Sci 2020;22:68. doi:10.3390/ijms22010068
12. Berman HM, Westbrook J, Feng Z et al. The Protein Data Bank. Nucleic Acids Res 2000;28:235–42. doi:10.1093/nar/28.1.235
13. Arstila TP, Casrouge A, Baron V et al. A direct estimate of the human αβ T cell receptor diversity. Science 1999;286:958–61.
14. Birnbaum ME, Mendoza JL, Sethi DK et al. Deconstructing the peptide-MHC specificity of T cell recognition. Cell 2014;157:1073–87. doi:10.1016/j.cell.2014.03.047
15. Adams JJ, Narayanan S, Birnbaum ME et al. Structural interplay between germline interactions and adaptive recognition determines the bandwidth of TCR-peptide-MHC cross-reactivity. Nat Immunol 2016;17:87–94. doi:10.1038/ni.3310
16. Cole DK, Bulek AM, Dolton G et al. Hotspot autoimmune T cell receptor binding underlies pathogen and insulin peptide cross-reactivity. J Clin Invest 2016;126:2191–204. doi:10.1172/JCI85679
17. Karnaukhov VK, Shcherbinin DS, Chugunov AO et al. Structure-based prediction of T cell receptor recognition of unseen epitopes using TCRen. Nat Comput Sci 2024;4:510–21. doi:10.1038/s43588-024-00653-0
18. Adams JJ, Narayanan S, Liu B et al. T cell receptor signaling is limited by docking geometry to peptide-major histocompatibility complex. Immunity 2011;35:681–93. doi:10.1016/j.immuni.2011.09.013
19. Antunes DA, Rigo MM, Freitas MV et al. Interpreting T-cell cross-reactivity through structure: implications for TCR-based cancer immunotherapy. Front Immunol 2017;8:1210. doi:10.3389/fimmu.2017.01210
20. Lee CH, Salio M, Napolitani G et al. Predicting cross-reactivity and antigen specificity of T cell receptors. Front Immunol 2020;11:565096. doi:10.3389/fimmu.2020.565096
21. Ghoreyshi ZS, George JT. Quantitative approaches for decoding the specificity of the human T cell repertoire. Front Immunol 2023;14:1228873. doi:10.3389/fimmu.2023.1228873
22. Richards FM. Areas, volumes, packing, and protein structure. Annu Rev Biophys Bioeng 1977;6:151–76.
23. Duhovny D, Nussinov R, Wolfson HJ. Efficient unbound docking of rigid molecules. In: Guigó R, Gusfield D (eds.), Algorithms in Bioinformatics, Vol. 2452. Berlin, Heidelberg: Springer, 2002, 185–200.
24. Bronstein MM, Bruna J, LeCun Y et al. Geometric deep learning: going beyond Euclidean data. IEEE Signal Process Mag 2017;34:18–42.
25. Atz K, Grisoni F, Schneider G. Geometric deep learning on molecular representations. Nat Mach Intell 2021;3:1023–32.
26. Gainza P, Sverrisson F, Monti F et al. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat Methods 2020;17:184–92. doi:10.1038/s41592-019-0666-6
27. Gainza P, Wehrle S, van Hall-Beauvais A et al. De novo design of protein interactions with learned surface fingerprints. Nature 2023;617:176–84. doi:10.1038/s41586-023-05993-x
28. Marchand A, Buckley S, Schneuing A et al. Targeting protein–ligand neosurfaces with a generalizable deep learning tool. Nature 2025;639:522–31. doi:10.1038/s41586-024-08435-4
29. Yang X, Garner LI, Zvyagin IV et al. Autoimmunity-associated T cell receptors recognize HLA-B^*27-bound peptides. Nature 2022;612:771–7. doi:10.1038/s41586-022-05501-7
30. Felix NJ, Allen PM. Specificity of T-cell alloreactivity. Nat Rev Immunol 2007;7:942–53.
31. Sanner MF, Olson AJ, Spehner J-C. Reduced surface: an efficient way to compute molecular surfaces. Biopolymers 1996;38:305–20.
32. Yin S, Proctor EA, Lugovskoy AA et al. Fast screening of protein surfaces using geometric invariant fingerprints. Proc Natl Acad Sci 2009;106:16622–6.
33. Dolinsky TJ, Czodrowski P, Li H et al. PDB2PQR: expanding and upgrading automated preparation of biomolecular structures for molecular simulations. Nucleic Acids Res 2007;35:W522–5. doi:10.1093/nar/gkm276
34. Baker NA, Sept D, Joseph S et al. Electrostatics of nanosystems: application to microtubules and the ribosome. Proc Natl Acad Sci 2001;98:10037–41.
35. Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol 1982;157:105–32.
36. Kortemme T, Morozov AV, Baker D. An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein–protein complexes. J Mol Biol 2003;326:1239–59.
37. Garboczi DN, Ghosh P, Utz U et al. Structure of the complex between human T-cell receptor, viral peptide and HLA-A2. Nature 1996;384:134–41.
38. Jiang Y, Huo M, Cheng Li S. TEINet: a deep learning framework for prediction of TCR–epitope binding specificity. Brief Bioinform 2023;24:bbad086. doi:10.1093/bib/bbad086
39. Peng X, Lei Y, Feng P et al. Characterizing the interaction conformation between T-cell receptors and epitopes with deep learning. Nat Mach Intell 2023;5:395–407.
40. Bulek AM, Cole DK, Skowera A et al. Structural basis for the killing of human beta cells by CD8+ T cells in type 1 diabetes. Nat Immunol 2012;13:283–9. doi:10.1038/ni.2206
41. Thomsen MCF, Nielsen M. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion. Nucleic Acids Res 2012;40:W281–7. doi:10.1093/nar/gks469
42. Mikhaylov V, Brambley CA, Keller GLJ et al. Accurate modeling of peptide-MHC structures with AlphaFold. Structure 2024;32:228–241.e4. doi:10.1016/j.str.2023.11.011
43. Marzella DF, Parizi FM, van Tilborg D et al. PANDORA: a fast, anchor-restrained modelling protocol for peptide: MHC complexes. Front Immunol 2022;13:878762. doi:10.3389/fimmu.2022.878762
44. Motmaen A, Dauparas J, Baek M et al. Peptide-binding specificity prediction using fine-tuned protein structure prediction networks. Proc Natl Acad Sci 2023;120:e2216697120. doi:10.1073/pnas.2216697120
45. Evans R, O’Neill M, Pritzel PA et al. Protein complex prediction with AlphaFold-Multimer. 2021. doi:10.1101/2021.10.04.463034
46. Blevins SJ, Pierce BG, Singh NK et al. How structural adaptability exists alongside HLA-A2 bias in the human αβ TCR repertoire. Proc Natl Acad Sci 2016;113:E1276–E1285. doi:10.1073/pnas.1522069113
47. Fodor J, Riley BT, Borg NA et al. Previously hidden dynamics at the TCR–peptide–MHC interface revealed. J Immunol 2018;200:4134–45.
48. McMaster B, Thorpe CJ, Rossjohn J et al. Quantifying conformational changes in the TCR:pMHC-I binding interface. Front Immunol 2024;15:1491656. doi:10.3389/fimmu.2024.1491656
49. Wu LC, Tuot DS, Lyons DS et al. Two-step binding mechanism for T-cell receptor recognition of peptide–MHC. Nature 2002;418:552–6.
50. Wang Y, Singh NK, Spear TT et al. How an alloreactive T-cell receptor achieves peptide and MHC specificity. Proc Natl Acad Sci U S A 2017;114:E4792–801.
51. Chen J, Zhao B, Lin S et al. TEPCAM: prediction of T-cell receptor–epitope binding specificity via interpretable deep learning. Protein Sci 2024;33:e4841. doi:10.1002/pro.4841
52. T RR, Demerdash ONA, Smith JC. TCR-H: explainable machine learning prediction of T-cell receptor epitope binding on unseen datasets. Front Immunol 2024;15:1426173. doi:10.3389/fimmu.2024.1426173
53. Pang Z, Lu MM, Zhang Y et al. Neoantigen-targeted TCR-engineered T cell immunotherapy: current advances and challenges. Biomark Res 2023;11:104. doi:10.1186/s40364-023-00534-0
54. Klebanoff CA, Chandran SS, Baker BM et al. T cell receptor therapeutics: immunological targeting of the intracellular cancer proteome. Nat Rev Drug Discov 2023;22:996–1017. doi:10.1038/s41573-023-00809-z
55. Ott PA, Hu Z, Keskin DB et al. An immunogenic personal neoantigen vaccine for patients with melanoma. Nature 2017;547:217–21. doi:10.1038/nature22991
56. Zhou Z, Chen J, Lin S et al. GRATCR: epitope-specific T cell receptor sequence generation with data-efficient pre-trained models. IEEE J Biomed Health Inform 2025;29:2271–83. doi:10.1109/JBHI.2024.3514089
57. Chu Y, Zhang Y, Wang Q et al. A transformer-based model to predict peptide–HLA class I binding and optimize mutated peptides for vaccine design. Nat Mach Intell 2022;4:300–11.

[ref1] 1. Davis MM, Bjorkman PJ. T-cell antigen receptor genes and T-cell recognition. Nature 1988;335:744. 10.1038/335744b0 [DOI] [PubMed] [Google Scholar]

[ref2] 2. van der Merwe PA, Dushek O. Mechanisms for T cell receptor triggering. Nat Rev Immunol 2011;11:47–55. 10.1038/nri2887 [DOI] [PubMed] [Google Scholar]

[ref3] 3. Garcia KC, Teyton L, Wilson IA. Structural basis of T cell recognition. Annu Rev Immunol 1999;17:369–97. [DOI] [PubMed] [Google Scholar]

[ref4] 4. Garcia KC, Adams EJ. How the T cell receptor sees antigen—a structural view. Cell 2005;122:333–6. [DOI] [PubMed] [Google Scholar]

[ref5] 5. Rudolph MG, Stanfield RL, Wilson IA. How TCRS bind MHCS, peptides, and coreceptors. Annu Rev Immunol 2006;24:419–66. [DOI] [PubMed] [Google Scholar]

[ref6] 6. Krogsgaard M, Davis MM. How T cells ‘see’ antigen. Nat Immunol 2005;6:239–45. [DOI] [PubMed] [Google Scholar]

[ref7] 7. Rudolph MG, Wilson IA. The specificity of TCR/pMHC interaction. Curr Opin Immunol 2002;14:52–65. [DOI] [PubMed] [Google Scholar]

[ref8] 8. Singh NK, Riley TP, Baker SCB et al. Emerging concepts in TCR specificity: rationalizing and (maybe) predicting outcomes. J Immunol 2017;199:2203–13. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref9] 9. Sewell AK. Why must T cells be cross-reactive? Nat Rev Immunol 2012;12:669–77. 10.1038/nri3279 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref10] 10. La Gruta NL, Gras S, Daley SR et al. Understanding the drivers of MHC restriction of T cell receptors. Nat Rev Immunol 2018;18:467–78. 10.1038/s41577-018-0007-5 [DOI] [PubMed] [Google Scholar]

[ref11] 11. Szeto C, Lobos CA, Nguyen AT et al. TCR recognition of peptide–MHC-I: rule makers and breakers. Int J Mol Sci 2020;22:68. 10.3390/ijms22010068 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref12] 12. Berman HM, Westbrook J, Feng Z et al. The Protein Data Bank. Nucleic Acids Res 2000;28:235–42. 10.1093/nar/28.1.235 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref13] 13. Arstila TP, Casrouge A, Baron V et al. A direct estimate of the human αβ T cell receptor diversity. Science 1999;286:958–61. [DOI] [PubMed] [Google Scholar]

[ref14] 14. Birnbaum ME, Mendoza JL, Sethi DK et al. Deconstructing the peptide-MHC specificity of T cell recognition. Cell 2014;157:1073–87. 10.1016/j.cell.2014.03.047 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref15] 15. Adams JJ, Narayanan S, Birnbaum ME et al. Structural interplay between germline interactions and adaptive recognition determines the bandwidth of TCR-peptide-MHC cross-reactivity. Nat Immunol 2016;17:87–94. 10.1038/ni.3310 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref16] 16. Cole DK, Bulek AM, Dolton G et al. Hotspot autoimmune T cell receptor binding underlies pathogen and insulin peptide cross-reactivity. J Clin Invest 2016;126:2191–204. 10.1172/JCI85679 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref17] 17. Karnaukhov VK, Shcherbinin DS, Chugunov AO et al. Structure-based prediction of T cell receptor recognition of unseen epitopes using TCRen. Nat Comput Sci 2024;4:510–21. 10.1038/s43588-024-00653-0 [DOI] [PubMed] [Google Scholar]

18. Adams JJ, Narayanan S, Liu B et al. T cell receptor signaling is limited by docking geometry to peptide-major histocompatibility complex. Immunity 2011;35:681–93. doi:10.1016/j.immuni.2011.09.013

19. Antunes DA, Rigo MM, Freitas MV et al. Interpreting T-cell cross-reactivity through structure: implications for TCR-based cancer immunotherapy. Front Immunol 2017;8:1210. doi:10.3389/fimmu.2017.01210

20. Lee CH, Salio M, Napolitani G et al. Predicting cross-reactivity and antigen specificity of T cell receptors. Front Immunol 2020;11:565096. doi:10.3389/fimmu.2020.565096

21. Ghoreyshi ZS, George JT. Quantitative approaches for decoding the specificity of the human T cell repertoire. Front Immunol 2023;14:1228873. doi:10.3389/fimmu.2023.1228873

22. Richards FM. Areas, volumes, packing, and protein structure. Annu Rev Biophys Bioeng 1977;6:151–76.

23. Duhovny D, Nussinov R, Wolfson HJ. Efficient unbound docking of rigid molecules. In: Guigó R, Gusfield D, (eds.), Algorithms in Bioinformatics, Vol. 2452. Berlin, Heidelberg: Springer, 2002, 185–200.

24. Bronstein MM, Bruna J, LeCun Y et al. Geometric deep learning: going beyond Euclidean data. IEEE Signal Process Mag 2017;34:18–42.

25. Atz K, Grisoni F, Schneider G. Geometric deep learning on molecular representations. Nat Mach Intell 2021;3:1023–32.

26. Gainza P, Sverrisson F, Monti F et al. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat Methods 2020;17:184–92. doi:10.1038/s41592-019-0666-6

27. Gainza P, Wehrle S, van Hall-Beauvais A et al. De novo design of protein interactions with learned surface fingerprints. Nature 2023;617:176–84. doi:10.1038/s41586-023-05993-x

28. Marchand A, Buckley S, Schneuing A et al. Targeting protein–ligand neosurfaces with a generalizable deep learning tool. Nature 2025;639:522–31. doi:10.1038/s41586-024-08435-4

29. Yang X, Garner LI, Zvyagin IV et al. Autoimmunity-associated T cell receptors recognize HLA-B^*27-bound peptides. Nature 2022;612:771–7. doi:10.1038/s41586-022-05501-7

30. Felix NJ, Allen PM. Specificity of T-cell alloreactivity. Nat Rev Immunol 2007;7:942–53.

31. Sanner MF, Olson AJ, Spehner J-C. Reduced surface: an efficient way to compute molecular surfaces. Biopolymers 1996;38:305–20.

32. Yin S, Proctor EA, Lugovskoy AA et al. Fast screening of protein surfaces using geometric invariant fingerprints. Proc Natl Acad Sci 2009;106:16622–6.

33. Dolinsky TJ, Czodrowski P, Li H et al. PDB2PQR: expanding and upgrading automated preparation of biomolecular structures for molecular simulations. Nucleic Acids Res 2007;35:W522–5. doi:10.1093/nar/gkm276

34. Baker NA, Sept D, Joseph S et al. Electrostatics of nanosystems: application to microtubules and the ribosome. Proc Natl Acad Sci 2001;98:10037–41.

35. Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol 1982;157:105–32.

36. Kortemme T, Morozov AV, Baker D. An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein–protein complexes. J Mol Biol 2003;326:1239–59.

37. Garboczi DN, Ghosh P, Utz U et al. Structure of the complex between human T-cell receptor, viral peptide and HLA-A2. Nature 1996;384:134–41.

38. Jiang Y, Huo M, Cheng Li S. TEINet: a deep learning framework for prediction of TCR–epitope binding specificity. Brief Bioinform 2023;24:bbad086. doi:10.1093/bib/bbad086

39. Peng X, Lei Y, Feng P et al. Characterizing the interaction conformation between T-cell receptors and epitopes with deep learning. Nat Mach Intell 2023;5:395–407.

40. Bulek AM, Cole DK, Skowera A et al. Structural basis for the killing of human beta cells by CD8+ T cells in type 1 diabetes. Nat Immunol 2012;13:283–9. doi:10.1038/ni.2206

41. Thomsen MCF, Nielsen M. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion. Nucleic Acids Res 2012;40:W281–7. doi:10.1093/nar/gks469

42. Mikhaylov V, Brambley CA, Keller GLJ et al. Accurate modeling of peptide-MHC structures with AlphaFold. Structure 2024;32:228–241.e4. doi:10.1016/j.str.2023.11.011

43. Marzella DF, Parizi FM, van Tilborg D et al. PANDORA: a fast, anchor-restrained modelling protocol for peptide: MHC complexes. Front Immunol 2022;13:878762. doi:10.3389/fimmu.2022.878762

44. Motmaen A, Dauparas J, Baek M et al. Peptide-binding specificity prediction using fine-tuned protein structure prediction networks. Proc Natl Acad Sci 2023;120:e2216697120. doi:10.1073/pnas.2216697120

45. Evans R, O’Neill M, Pritzel A et al. Protein complex prediction with AlphaFold-Multimer. 2021. doi:10.1101/2021.10.04.463034

46. Blevins SJ, Pierce BG, Singh NK et al. How structural adaptability exists alongside HLA-A2 bias in the human αβ TCR repertoire. Proc Natl Acad Sci 2016;113:E1276–E1285. doi:10.1073/pnas.1522069113

47. Fodor J, Riley BT, Borg NA et al. Previously hidden dynamics at the TCR–peptide–MHC interface revealed. J Immunol 2018;200:4134–45.

48. McMaster B, Thorpe CJ, Rossjohn J et al. Quantifying conformational changes in the TCR:pMHC-I binding interface. Front Immunol 2024;15:1491656. doi:10.3389/fimmu.2024.1491656

49. Wu LC, Tuot DS, Lyons DS et al. Two-step binding mechanism for T-cell receptor recognition of peptide–MHC. Nature 2002;418:552–6.

50. Wang Y, Singh NK, Spear TT et al. How an alloreactive T-cell receptor achieves peptide and MHC specificity. Proc Natl Acad Sci 2017;114:E4792–801.

51. Chen J, Zhao B, Lin S et al. TEPCAM: prediction of T-cell receptor–epitope binding specificity via interpretable deep learning. Protein Sci 2024;33:e4841. doi:10.1002/pro.4841

52. T RR, Demerdash ONA, Smith JC. TCR-H: explainable machine learning prediction of T-cell receptor epitope binding on unseen datasets. Front Immunol 2024;15:1426173. doi:10.3389/fimmu.2024.1426173

53. Pang Z, Lu MM, Zhang Y et al. Neoantigen-targeted TCR-engineered T cell immunotherapy: current advances and challenges. Biomark Res 2023;11:104. doi:10.1186/s40364-023-00534-0

54. Klebanoff CA, Chandran SS, Baker BM et al. T cell receptor therapeutics: immunological targeting of the intracellular cancer proteome. Nat Rev Drug Discov 2023;22:996–1017. doi:10.1038/s41573-023-00809-z

55. Ott PA, Hu Z, Keskin DB et al. An immunogenic personal neoantigen vaccine for patients with melanoma. Nature 2017;547:217–21. doi:10.1038/nature22991

56. Zhou Z, Chen J, Lin S et al. GRATCR: epitope-specific T cell receptor sequence generation with data-efficient pre-trained models. IEEE J Biomed Health Inform 2025;29:2271–83. doi:10.1109/JBHI.2024.3514089

57. Chu Y, Zhang Y, Wang Q et al. A transformer-based model to predict peptide–HLA class I binding and optimize mutated peptides for vaccine design. Nat Mach Intell 2022;4:300–11.
