Abstract
Background
The scarcity of available structural data makes characterizing the binding of T-cell receptors (TCRs) to peptide-Major Histocompatibility Complexes (pMHCs) very challenging. The recent surge in sequencing data makes TCRs an ideal target for protein structure modeling. Through these 3D models, researchers can potentially identify key motifs on the TCR’s binding regions. Furthermore, computational methods can be employed to pair a TCR structure with a pMHC, leading to predictions of docked TCRpMHC structures. However, going from sequence to predicted 3D TCRpMHC complexes requires a non-trivial number of steps and specialized immunoinformatics expertise.
Results
We developed a Python tool named TRain (T-cell Receptor Automated ImmunoiNformatics) to streamline this process by: (1) converting single-cell sequencing data into full TCR amino acid sequences; (2) efficiently submitting TCR amino acid sequences to existing TCR-specific modeling pipelines; (3) pairing modeled TCR structures with existing crystal structures of pMHC complexes in a non-biased manner before docking; (4) automating the preparation and submission process of TCRs and pMHCs for docking using the RosettaDock tool; and (5) providing scripts to analyze the predicted TCRpMHC interface. We illustrate the basic functionality of TRain with a case study, while further information can be found in a dedicated manual.
Conclusions
We introduced an open-source tool that streamlines going from full TCR sequence information to predicted 3D TCRpMHC complexes, using well-established tools. Analyzing these predicted complexes can provide deeper insights into the binding properties of TCRs, and can help shed light on one of the key steps in adaptive immune responses.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12859-025-06074-8.
Keywords: TCR modeling, protein docking, Immunoinformatics
Background
A TCR is a membrane-anchored heterodimer composed of an alpha (or gamma) and beta (or delta) chain. The extracellular portions of each chain contain both a variable and a constant region. The TCR chains’ composition results from somatic gene recombination known as V(D)J recombination. The alpha chain’s variable region results from a variable (V) segment paired with a joining (J) segment. The variable region of the beta chain is the result of a variable (V) segment, a diversity (D) segment, and a joining (J) segment assembled together. As these segments are recombined, additional nucleotides are inserted, leading to considerable sequence diversity [1, 2]. The gene segments result in an immunoglobulin fold containing four loops per chain, referred to as complementarity determining region (CDR) 1, 2, 2.5, and 3, respectively [3].
TCRs bind small peptides displayed on the surfaces of cells. These peptides are bound to Major Histocompatibility Complexes (MHCs) and peptide-MHC complexes are known as pMHCs. There are two classes of MHCs: MHC Class I and Class II. The former acts as a window into the expressed proteome of every nucleated cell. In contrast, MHC Class II displays peptides that are phagocytized by professional antigen-presenting cells such as dendritic cells, mono-nucleated cells, and B-cells [4].
The complex comprising a TCR bound to a pMHC is referred to as a TCRpMHC complex. Figure 1A provides a structural overview of this complex, showing a TCR (comprising an alpha and a beta chain) inserted into the membrane of the T-cell with CD3 delta, epsilon, gamma chains in complex. The pMHC is displayed on the membrane of another cell and shown in complex with supporting chain beta-2 microglobulin. This Class I MHC has a loaded peptide in its binding cleft. Figure 1C shows the location of the different CDR regions in the TCR footprint, as well as a top view of the pMHC. The different CDR regions play an important role in binding to the peptide and the MHC, especially the hypervariable CDR3 region [5, 6].
Fig. 1.
Structure of the TCRpMHC complex. Panel A highlights the interaction between a TCR and a pMHC, each inserted in its respective membrane. Panel B displays an exploded view of the TCRpMHC complex. The chains displayed are TCR Alpha (orange), TCR Beta (purple), MHC (blue), Beta-2 Microglobulin (gray), and peptide (pink). Panel C is a rotated view of the binding interfaces of the TCR and the pMHC. The white labels mark the CDR 1–3 loops of the alpha and beta chains. The black labels mark the residues of the peptide; residues not labeled have side chains oriented inside the binding cleft of the MHC. Panel D is a representation of the TCRpMHC complex based on the positioning in the crystal structure 3GSN [7]
Characterizing the binding of TCRs to pMHCs is essential for understanding one of the key steps in adaptive immune responses. Unfortunately, structural data on TCRpMHC complexes is currently very limited, with under 200 unique experimentally determined TCRpMHC structures available [8]. In contrast, the availability of single-cell sequencing data has increased over the last few years. Single-cell sequencing data (as opposed to data generated by bulk sequencing) is necessary for reconstructing full TCRs, as it allows the pairing between the alpha and the beta chains to be determined.
The increased availability of single-cell data makes it possible to fill the dearth of 3D structures of TCRs and TCRpMHC complexes by building protein structure models. Using structural models of TCRs, we can investigate the TCR in isolation or in complex with the pMHC, identifying TCR regions that are conserved across receptors binding to the same antigen. Additionally, we can produce models of TCR/pMHC complexes by performing protein docking, carry out further analyses on the docked TCRpMHC complex, and characterize the binding interface. These models can be helpful for hypothesis generation in the absence of actual crystal structures.
In this paper we present TRain (T-cell Receptor Automated ImmunoiNformatics), a Python package comprising multiple scripts designed to streamline the building of TCRs and TCRpMHC complexes from full TCR sequences, using existing tools. TRain has the following capabilities:
Using single-cell sequencing data (consisting of gene segments and the amino acid sequence of the CDR3 region) to produce the full amino acid sequence of TCRs.
Submitting TCR amino acid sequences to existing TCR-specific modeling pipelines in a high-throughput manner.
Pairing modeled TCR structures with existing crystal structures of pMHC complexes in an unbiased way prior to performing docking.
Automating the laborious process of preparing and submitting TCR and pMHCs for docking with the tool RosettaDock.
Producing several analytical figures and tables which provide insight into docked structures and predicted interfaces.
TRain relies on either the Rosetta suite [9] or TCRmodel2 [10] to build TCR/pMHC models. As such, TRain aims to streamline the use of existing modeling and docking tools while providing utility functions to study the resulting complexes; it is not intended to be a new modeling or docking tool. The rest of this paper contains a brief description of the various components of TRain, as well as a case study to illustrate possible uses. For further details on our tool, we encourage the user to consult the comprehensive manual available with the source code.
Implementation
Components of TRain
The TRain suite consists of five components and was developed in a modular way, enabling users to replace individual components if they so desire. Each step produces a standard file type that can be passed to the next step of the pipeline, which is illustrated in full in Fig. 2.
Fig. 2.
TRain Pipeline Flowchart. There are five different components in the TRain pipeline, starting with the input step, where TCR chain segments are assembled into complete amino acid sequences for the TCR alpha and beta chains. This step yields two FASTA files as outputs, one for each chain. In the modeling step, these FASTA files are used to build a 3D structure of the TCR. This modeling is performed using the TCRmodel program, which is part of the Rosetta suite [11]. In the pairing step, the PDB files produced in the modeling step are used to pair the TCRs with pMHC structures derived from crystal structures, setting the stage for TCRpMHC docking. In the docking step, the PDBs from the pairing step are docked using the RosettaDock protocol (version 4.0 [12]). The final analysis step allows the user to explore the TCRpMHC interface
Assembling full TCR sequences
TCR sequencing data is commonly stored as individual gene segments and a corresponding CDR3 amino acid sequence. The program SeqConductor.py takes as input a .xlsx or .csv file with columns corresponding to the six segments and a sequence ID column. TCR gene segments VA, JA, VB, and JB are aligned to their corresponding CDR3 amino acid sequences.
Alignments are produced with Align.PairwiseAligner in the Biopython package [13]. Overlap of amino acids can be limited to only a few residues, most commonly for the VA-CDR3a overlap. In cases of poor overlap where the alignment would imply that the CDR3a segment starts before the V segment, a sliding window alignment is used. The sliding window approach corrects for cases where overlap may be only two amino acids. In cases where only a single amino acid overlaps, the CDR3 segment is appended directly to the end of the V segment. Validation was manually performed on a large set of single-cell sequencing data. In rare cases where shallow overlap occurs, a message is reported to the user of the tool.
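The overlap-based joining described above can be illustrated with a minimal sketch. It is not the SeqConductor.py implementation: it replaces the Biopython alignment with an exact suffix/prefix match, and the function names, the `min_overlap` parameter, and the toy sequences are assumptions made for illustration only.

```python
from __future__ import annotations


def suffix_prefix_overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge(left: str, right: str, min_overlap: int = 2) -> tuple[str, int]:
    """Join two fragments, collapsing a shared overlap of at least `min_overlap`.

    Returns the merged sequence and the overlap length that was used.
    An overlap of 0 means `right` was appended unchanged, which is the
    situation a caller would want to report as a shallow overlap.
    """
    k = suffix_prefix_overlap(left, right)
    if k < min_overlap:
        k = 0  # overlap too shallow to trust: append directly
    return left + right[k:], k


if __name__ == "__main__":
    # Toy fragments (hypothetical, not real gene segments)
    v_segment, cdr3, j_segment = "AAACAS", "CASSLGF", "LGFGGG"
    variable, used_v = merge(v_segment, cdr3)
    full, used_j = merge(variable, j_segment)
    print(full, used_v, used_j)
```

An exact match is stricter than a scored alignment, so a real pipeline would tolerate mismatches in the overlap where this sketch would fall back to direct appending.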
Output is in the form of two fasta files. A fasta file is created for each chain with headers corresponding to the chain ID provided in the input file or generated by SeqConductor.py. Alternatively, a CSV table with information about each alignment can be created.
Prior to submitting the sequences for modeling of the TCR chains, the constant regions are appended to the end of the assembled variable region. Appending constant regions prevents poor alignments in the final portions of the sequences in the modeling programs. Both the gene segments and constant regions are pulled from IMGT/GENE-DB [14].
3D modeling of the TCR
There are five published TCR-specific modeling programs [10, 11, 15–17], three of which are only accessible through web servers. In contrast, TCRmodel is distributed with the Rosetta suite, which consists of protein modeling and protein structure analysis programs. The TRain program ModelEngine.py utilizes the local version of TCRmodel with the added feature of being able to submit jobs in parallel. Rosetta is also used for conducting TCRpMHC docking and analysis for TRain. Instead of TCRmodel, a user can also use the output from one of the other TCR-modeling web servers. The output from TCRmodel and the other modeling tools is only the variable regions of the TCR. Support for TCRmodel2 [10] has also been integrated in TRain. TCRmodel2 has several additional dependencies but allows users to model full TCRpMHC complexes directly in a modified implementation of AlphaFold2 [18]. However, it is important to note that predictive performance of AlphaFold2 on discriminating the effect of point mutations has been shown to be poor [19].
Pairing the TCR with the pMHC
To pair modeled TCRs with pMHCs in a reproducible manner, the TurnTable.py program in the TRain suite performs multiple steps to position the two components. First, the TCR is centered to coordinates 0,0,0 of the PDB file. Then the TCR principal axes of inertia are aligned along the x, y, z axes. Next, the program ensures that the TCR interaction interface is in the lower part of the structure. The alpha chain is positioned on the left of the structure, while the beta chain is on the right.
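The first two positioning steps (centering and principal-axis alignment) can be sketched with NumPy as follows. This is an illustrative sketch rather than the TurnTable.py code: it assumes unit atomic masses, so the principal axes of inertia coincide with the eigenvectors of the coordinate covariance matrix, and the function name `center_and_align` is hypothetical. The later steps (placing the interface at the bottom and the alpha chain on the left) are not covered.

```python
import numpy as np


def center_and_align(coords: np.ndarray) -> np.ndarray:
    """Center an (n_atoms, 3) array at the origin and align its principal axes.

    After the transformation, the direction of largest spread lies along x,
    the next along y, and the smallest along z.
    """
    centered = coords - coords.mean(axis=0)
    # Eigenvectors of the covariance matrix are the principal axes
    # (equal masses assumed for every atom).
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))
    order = np.argsort(eigvals)[::-1]
    axes = eigvecs[:, order]
    # Enforce a proper rotation (determinant +1) so the structure is not mirrored.
    if np.linalg.det(axes) < 0:
        axes[:, 2] *= -1
    return centered @ axes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy = rng.normal(size=(50, 3)) * np.array([5.0, 2.0, 1.0]) + 10.0
    aligned = center_and_align(toy)
    print(aligned.mean(axis=0))
```

The sign of each axis is still arbitrary after this step, which is why an additional orientation check such as the one described in the text is needed before superimposing a reference complex.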
A reference TCRpMHC PDB is then superimposed on the newly positioned TCR. The reference TCRpMHC PDB contains a pMHC 10Å away from the TCR interface with the TCR and pMHC axes aligned to prevent bias during docking. This differs from previous TCR docking protocols such as TCRFlexDock, which pre-aligned the interfaces in the “straddle” position across the pMHC at a 45-degree angle [20].
In the final steps, the submitted pMHC is superimposed on the reference pMHC, and the reference structure is removed. Then, the PDB atoms and amino acids are renumbered, adhering to the requirements of the Rosetta scripts for docking.
TCRpMHC docking
The docking protocol carried out by TCRcoupler.py follows the steps described in the RosettaDock v4.0 publication [12]. Compared to the previous 3.2 version, RosettaDock v4.0 features an improved scoring method that allows for ensemble docking. In the TRain pipeline, ensembles are produced for the TCR only and not for the pMHC, to avoid large computational overheads. Ensembles are produced with three different relaxation methods: Normal Modes relax, the standard Rosetta Relax protocol, and Rosetta Backrub [9, 21, 22]. As described by Marze et al. [12], 100 ensembles are generated, with 40 produced with Normal Mode relax, 30 with Rosetta’s standard Relax protocol, and 30 with the Backrub approach. The previous method described in RosettaDock 3.2 is also available and produces docked structures more rapidly, since it relies on rigid structures only.
Rosetta is compiled with the optional mpi parameter to allow parallel relaxing, docking, and refinement steps. By default, 100 TCR ensembles are produced with only one pMHC relaxed structure. These structures then undergo 10,000 docking runs. The TCR with the best interface score is then submitted for 100 steps of refinement. Example runs were performed on a compute node with four Intel Xeon Platinum 8176M CPUs (max turbo frequency 3.8 GHz), utilizing 64 of the available cores. Performing 10,000 docking runs took 3 h on average.
Validation and clustering steps are performed after docking; both are optional. The goal of validation and clustering is to ensure that near-impossible binding poses are removed and that a user can select an alternative pose that may better describe the interaction, based on information external to the docking method. The validation step verifies that the TCR orientation over the pMHC is at a suitable angle. Validation is performed by ensuring that the atom closest to the N-terminus of the peptide is from the alpha chain and, conversely, that the atom closest to the C-terminus is from the beta chain. This aligns with internal testing on all solved α/β TCR structures as well as previous publications [5, 23].
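A minimal sketch of such an orientation check is given below. It assumes that the atom coordinates of each TCR chain and of the two peptide termini have already been extracted from the docked PDB; the function names and the toy coordinates are hypothetical and do not reflect the TCRcoupler.py internals.

```python
import math


def closest_chain(point, chains):
    """Return the ID of the chain that owns the atom nearest to `point`.

    `chains` maps a chain ID to a list of (x, y, z) atom coordinates.
    """
    best_chain, best_dist = None, math.inf
    for chain_id, atoms in chains.items():
        for atom in atoms:
            dist = math.dist(point, atom)
            if dist < best_dist:
                best_chain, best_dist = chain_id, dist
    return best_chain


def passes_orientation_check(alpha_atoms, beta_atoms, pep_n_term, pep_c_term):
    """True if alpha sits over the peptide N-terminus and beta over the C-terminus."""
    chains = {"alpha": alpha_atoms, "beta": beta_atoms}
    return (
        closest_chain(pep_n_term, chains) == "alpha"
        and closest_chain(pep_c_term, chains) == "beta"
    )


if __name__ == "__main__":
    # Toy coordinates in angstroms (hypothetical)
    alpha = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0)]
    beta = [(10.0, 0.0, 5.0), (11.0, 0.0, 5.0)]
    print(passes_orientation_check(alpha, beta, (0.0, 0.0, 0.0), (10.0, 0.0, 0.0)))
```

A pose docked in the reversed polarity would fail both comparisons and could be discarded before clustering.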
From the top 200 scoring poses that pass validation, a pairwise distance matrix is generated using RMSD as the distance metric. From this distance matrix, embedding into a two-dimensional space is performed using Multidimensional Scaling (MDS). Spectral clustering is performed on the MDS coordinates, and the eigengap heuristic is used to determine the optimal number of clusters [24, 25]. Clustering allows for potential good scoring poses that may look distinctly different from the top pose to be highlighted for the user.
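The sequence of steps in this paragraph (RMSD distance matrix, two-dimensional embedding, eigengap heuristic) can be sketched with NumPy as shown below. Several choices here are assumptions rather than details taken from TRain: RMSD is computed without superposition on poses that share the same atom ordering, classical (Torgerson) MDS stands in for whichever MDS variant the tool uses, and the affinity is a Gaussian kernel with a hypothetical `sigma` parameter.

```python
import numpy as np


def rmsd_matrix(poses: np.ndarray) -> np.ndarray:
    """Pairwise RMSD for poses of shape (n_poses, n_atoms, 3), no superposition."""
    n = len(poses)
    dist = np.zeros((n, n))
    for i in range(n):
        dist[i] = np.sqrt(((poses - poses[i]) ** 2).sum(axis=-1).mean(axis=-1))
    return dist


def classical_mds(dist: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Embed a distance matrix with classical MDS (double centering + eigh)."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


def eigengap_n_clusters(points: np.ndarray, sigma: float = 1.0, max_k: int = 10) -> int:
    """Pick the cluster count at the largest gap in the Laplacian spectrum."""
    sq_dist = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    affinity = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)
    degree = np.maximum(affinity.sum(axis=1), 1e-12)
    inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized graph Laplacian
    laplacian = np.eye(len(points)) - inv_sqrt[:, None] * affinity * inv_sqrt[None, :]
    eigvals = np.sort(np.linalg.eigvalsh(laplacian))[: max_k + 1]
    return int(np.argmax(np.diff(eigvals))) + 1


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    poses = rng.normal(size=(6, 30, 3))  # toy data, not docking output
    coords = classical_mds(rmsd_matrix(poses))
    print(coords.shape)
```

The estimated number of clusters would then be passed to a spectral clustering routine run on the embedded coordinates; the result is sensitive to the kernel width, so `sigma` should be chosen relative to the RMSD scale of the poses.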
As a result of utilizing Rosetta for docking, chain numbering was converted to be sequential across chains. To convert the numbering back to the original numbering scheme, PostCoupler.py can be used. A reference TCR PDB structure can be passed to PostCoupler.py and based on alignment, the numbering will be updated. The procedure also works if the modeled TCR chain has a few mutations with respect to the reference.
Analysis
DataDepot.py facilitates insight into the docked TCRpMHC structure. To gather information about what amino acids are interacting at the interface, the Rosetta script Residue Energy Breakdown is used. The value collected from Residue Energy Breakdown is the total score or Rosetta Energy Units (REU) for the pairwise analysis of each amino acid interacting at the interface. The REU is the result of a weighted sum calculated with van der Waals attraction/repulsion, solvation, electrostatic potential, hydrogen bonds (side-chain and backbone), disulfide statistical energies, disulfide geometry potential, omega dihedral of backbone, Dunbrack rotamer probability, and amino acid reference energy [26]. The output from Residue Energy Breakdown is converted into a table and a heatmap that provides a quick visualization of the interactions at the interface. The heatmap can also be produced with the distance values of the TCR amino acids with respect to the peptide in the pMHC.
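The conversion of pairwise energies into a heatmap-ready table can be sketched as follows. The sketch assumes the Residue Energy Breakdown output has already been parsed into `(tcr_residue, pmhc_residue, total_score)` tuples; the parsing step, the residue labels, and the function name `energy_matrix` are hypothetical and not taken from DataDepot.py.

```python
def energy_matrix(pairs):
    """Pivot pairwise interface energies into a TCR-by-pMHC matrix.

    `pairs` is an iterable of (tcr_residue, pmhc_residue, score) tuples,
    with scores in Rosetta Energy Units. Residue pairs that do not appear
    in the input are left at 0.0.
    """
    pairs = list(pairs)
    tcr_labels = sorted({tcr for tcr, _, _ in pairs})
    pmhc_labels = sorted({pmhc for _, pmhc, _ in pairs})
    row = {label: i for i, label in enumerate(tcr_labels)}
    col = {label: j for j, label in enumerate(pmhc_labels)}
    matrix = [[0.0] * len(pmhc_labels) for _ in tcr_labels]
    for tcr, pmhc, score in pairs:
        matrix[row[tcr]][col[pmhc]] += score
    return tcr_labels, pmhc_labels, matrix


if __name__ == "__main__":
    # Toy values (hypothetical chain:residue labels and scores)
    toy = [("D:98", "C:4", -1.5), ("D:98", "C:5", -0.5), ("E:100", "C:5", -2.0)]
    rows, cols, mat = energy_matrix(toy)
    print(rows, cols, mat)
```

The returned labels and nested list can be handed directly to a plotting library to draw the heatmap, with more negative cells indicating more favorable residue pairs.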
Another metric of the interaction interface is produced with the Rosetta script Interface Analyzer. Interface Analyzer calculates binding energies, buried interface surface areas, and other metrics pertaining to the interface of the TCR and pMHC. A score is calculated for the interface of the alpha chain to the pMHC and also for the beta chain to the pMHC. The output is a table of the submitted TCRpMHCs and their use of the alpha or beta chain in interacting with the pMHC.
DataDepot.py also facilitates plotting of metrics that RosettaDock reports. These metrics include interface score (I_sc), total score, interface RMSD (Irmsd), and fraction of native contacts (Fnat). The interface score evaluates the interaction energy at the binding interface, providing crude insight into the strength and specificity of the predicted binding. The total score combines contributions from all energy terms in the Rosetta energy function, offering an overall assessment of the docking pose. Interface RMSD, on the other hand, measures the structural deviation of interface residues between the predicted and native complexes, providing an additional indicator of the geometric accuracy of the docking model. Fnat quantifies the fraction of native contacts captured in the docking model if a native comparison is provided. Detailed descriptions of these metrics can be found in the supplementary information accompanying RosettaDock v4.0 documentation [12]. All metrics, except for Fnat, should target lower values for preferred docking poses, while Fnat should ideally approach 1.0, reflecting high similarity to the native contacts.
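To make the Fnat definition concrete, the sketch below computes it from two sets of residue-residue contacts. It is an illustration rather than the RosettaDock implementation: the 5 Å contact cutoff is an assumed parameter, and the function names and toy coordinates are hypothetical.

```python
import math


def interface_contacts(receptor, ligand, cutoff=5.0):
    """Residue pairs with at least one atom pair within `cutoff` angstroms.

    `receptor` and `ligand` map residue IDs to lists of (x, y, z) coordinates.
    """
    contacts = set()
    for r_id, r_atoms in receptor.items():
        for l_id, l_atoms in ligand.items():
            if any(math.dist(a, b) <= cutoff for a in r_atoms for b in l_atoms):
                contacts.add((r_id, l_id))
    return contacts


def fnat(native_contacts, model_contacts):
    """Fraction of native contacts that are also present in the model."""
    if not native_contacts:
        raise ValueError("the native complex has no interface contacts")
    return len(native_contacts & model_contacts) / len(native_contacts)


if __name__ == "__main__":
    # Toy complex: two TCR residues and two peptide residues (hypothetical)
    tcr = {"A1": [(0.0, 0.0, 0.0)], "A2": [(10.0, 0.0, 0.0)]}
    native_peptide = {"P1": [(0.0, 0.0, 3.0)], "P2": [(10.0, 0.0, 3.0)]}
    model_peptide = {"P1": [(0.0, 0.0, 3.0)], "P2": [(10.0, 0.0, 20.0)]}
    native = interface_contacts(tcr, native_peptide)
    model = interface_contacts(tcr, model_peptide)
    print(fnat(native, model))
```

Because the denominator counts only native contacts, extra non-native contacts in the model do not lower Fnat, which is one reason it is reported alongside the interface RMSD.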
Results: sample case study
TRain provides users with a host of functions aimed at making TCRpMHC docking prediction and analysis more efficient in order to model TCRpMHC interactions. Here, we showcase a brief analysis which utilizes TRain to assess the strength of potential TCRpMHC binding pairs. We point out that the goal of this case study is to illustrate how to utilize TRain from modeling the TCR to analyzing the docked interface, and not to showcase the reliability of protein docking for predicting TCR/pMHC binding. We selected a well-characterized, publicly available TCR with known binding specificity and present the results of pairing and docking this receptor with its true binding epitope in addition to an epitope with distinctly different binding preferences (a ‘negative’ or ‘false’ epitope).
The JM22 TCR is a widely studied TCR with a defined “TRAV27*01”, “TRAJ42*01”, “TRBV19*01”, and “TRBJ2-7*01” variable and joining segment profile with established specificity for the HLA-A*02-restricted influenza A virus matrix peptide (IAV M1) (58–66). The solved crystal structure of the JM22 TCR with the IAV M1 epitope ‘GILGFVFTL’ and HLA-A*02 is captured in PDB 1OGA. JM22 has a number of close homologs in publicly available data, one of which was selected as the TCR of interest for this case study. The ’negative’ epitope used in this study is the Melan-A/MART-1 melanoma tumor antigen ’ELAGIGILTV’. MART-1 and its solved TCRpMHC structure PDB 3HG1 were selected for use as a likely true negative pair with the JM22 homolog due to the dissimilarity of its binding component sequences (epitope, CDR3a, and CDR3b) to those of JM22 and IAV M1, differing variable segment preference (“TRAV12-2*01”) in known binding pairs, and the lack of M1/MART-1 cross-reactivity reported in literature [27].
The representative high-confidence JM22 homolog sequence for use in this study was identified from the database VDJdb by selecting paired-chain sequences specific to the epitope sequence “GILGFVFTL” with the JM22 variable and joining segments (“TRAV27*01”, “TRAJ42*01”, “TRBV19*01”, “TRBJ2-7*01”), distinct CDR3 sequences, and a minimal confidence score of 3 [28]. The V/J segments and CDR3 sequences of this TCR, a single point-mutation homolog of the JM22 reference, are shown in Fig. 3A. The full sequence of this receptor was assembled with SeqConductor.py and subsequently modeled with ModelEngine.py. The modeled TCR was paired with the IAV M1 pMHC from the JM22 reference crystal structure (PDB 1OGA) as well as the pMHC from the MART-1 crystal structure (PDB 3HG1) using TurnTable.py. To investigate the reproducibility of TRain results, the resulting TCR-1OGA and TCR-3HG1 paired structures were docked using TCRcoupler.py in a series of five runs, each run producing 10,000 rigidly docked structures and 100 refined docking poses per model.
Fig. 3.
Investigating biologically relevant binding poses with TRain. The sequence selected for ModelEngine.py modeling is obtained in the form of segments + CDR sequences, with the single AA difference from JM22 highlighted in white (A). This TCR is paired with the pMHC from crystal structures 1OGA and 3HG1 using TurnTable.py and docked in a series of five TCRcoupler.py runs. The RMSD values between the top-scoring pose and the top 200 scoring poses (B; the first point has a value of 0 Å, as it represents the RMSD between the top-scoring pose and itself) and the RMSD MDS clusters (C) show more consistent predicted binding poses for the biologically relevant TCR-1OGA pair. Panels B and C show the first of five total runs; plots for the other four runs are available in the GitHub repository. Using DataDepot.py, top binding pose structures for both TCR-pMHC pairs were assessed for the model quality metric ’Fnat’ along with interface RMSD (D), interface score (E), and total score (F)
In each docking run, TCRcoupler.py calculates docking pose RMSD from the highest-scoring structure for the top 200 binding poses, along with RMSD MDS clustering plots showing the similarity of these binding poses to one another. Here, these plots are used to evaluate predicted binding pose consistency; RMSD plots for one representative run of the JM22 homolog are shown in Fig. 3B and C. DataDepot.py provides a host of analysis functions; here, it is used to investigate several metrics of model quality and similarity to native structures, including “Fnat” (similarity to native structure binding interface), “I_sc” (interface score), “Irms” (interface RMSD), and “total_score” (overall Rosetta model score). The scores for the top 100 refined docking poses of the JM22 homolog paired with the 1OGA and 3HG1 pMHCs in each of the five docking runs are shown in Fig. 3D, E, and F.
Across each of the five RosettaDock docking runs, predicted TCRpMHC models for this TCR yield higher quality and similarity scores, along with more consistent binding poses assessed using RMSD, when paired with the true binding epitope (IAV M1) than with the negative epitope (MART-1). In addition to predicting lower-scoring models for the biologically unlikely pair, predicted binding poses were highly variable, producing more highly dispersed (dissimilar) poses in RMSD MDS spectral clustering. Taken together, these results show poorer quality complexes when the JM22 homolog TCR is paired with a highly unlikely binding pair (MART-1) than when it is paired with its well-established binding pair (IAV M1), as is expected. Of note, this case study serves solely as a demonstration of TRain’s end-to-end workflow. Further work is required to establish whether TCR-pMHC models can be used to infer binding, and TRain’s capabilities could support such research efforts.
Discussion and conclusions
TRain is designed to streamline TCR modeling and docking to a pMHC with the use of existing tools. The well-known limitations associated with 3D structure prediction and docking apply. For example, if the quality of the TCR model is poor, the docked TCRpMHC will also be of low quality [29]. Users are encouraged to check the TCR models for errors using tools like ProSA [30] before proceeding further with the analysis, as is normally done in structure prediction tasks. Additional factors may influence the quality of the results, such as MHC class and allele variations, which could impact TCR docking. The immunoinformatics community has started exploring the influence of MHC alleles and peptide/MHC binding predictions [31]. Expanding these efforts to address the TCR/pMHC interaction would be highly beneficial.
TRain offers several opportunities for expansion with its potential for additional features and applications. As TRain facilitates the application of several tools, additional features specific to each tool can be readily incorporated. The open-source nature of the software allows users to tailor it according to their needs. One significant enhancement would be the integration of a graphical user interface (GUI), which would substantially lower the entry barrier for new users. With recent advances during the development of TRain, such as TCRmodel2, users can now model pMHCs [10], further broadening its utility. Additional areas of exploration could include ensemble generation with Molecular Dynamics, scoring selection of TCRpMHCs after docking runs based on RMSD clustering, and restraining the docking angle of the TCRpMHC interface based on known ranges.
It is our hope that TRain will facilitate the investigation of various critical issues in immunology that can now be probed thanks to the widespread availability of sequencing data by providing a pipeline through which this data can be transformed from sequence to predicted and docked structures. A prime example of this tool’s utility could be the examination of known cross-reactive TCRs [32–34]. While these TCRs are extensively documented in the literature, their study has been constrained by a scarcity of structural data. TRain could be used for streamlining the generation of structural models and generating hypotheses of cross-reactive binding events.
We introduced TRain, a tool designed to streamline TCR modeling and docking. TRain effectively integrates existing tools and essential steps to facilitate the computational docking of TCRpMHC complexes. The case study featuring the public TCR JM22 illustrates a possible use of TRain as an exploratory tool for studying TCR/pMHC complexes from a structural perspective. It is our hope that the use of TRain in future studies may help benchmark existing modeling and docking tools more rigorously, and possibly shed further light on the complex interaction between T-cells and their targets.
Availability and requirements
Project name: TRain
Project home page: https://github.com/Aseamann/TRain
Operating systems: Linux, MacOS
Programming language: Python
Other requirements: Rosetta, TCRmodel2
License: GPL3.0
Any restrictions to use by non-academics: None
Acknowledgements
The authors would like to thank Ishwor Thapa for providing infrastructure support.
Author Contributions
A.S. wrote the bulk of the source code and wrote parts of the manuscript. M.B. contributed to the source code, tested the program, developed the case study, and wrote parts of the manuscript. R.E. tested the program and contributed to the source code. A.G. and L.S. provided feedback on the tool and the case study. D.G. conceived and supervised the project, contributed to the source code, and wrote parts of the manuscript. All authors reviewed the manuscript.
Funding
NIH R01 AI159314-04 to A.G, L.S, and D.G.
Data Availability
This manuscript does not report data generation or analysis. All data needed for running the quick start guide is available at: https://github.com/Aseamann/TRain
Declarations
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Competing interest
The authors declare that they have no conflict of interest.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Glusman G, Rowen L, Lee I, Boysen C, Roach JC, Smit AF, Hood L. Comparative genomics of the human and mouse T cell receptor loci. Immunity. 2001;15(3):337–49. [DOI] [PubMed] [Google Scholar]
- 2.Ehrlich R, Glynn E, Singh M, Ghersi D. Computational methods for predicting key interactions in T cell–mediated adaptive immunity. Annu Rev 2024; [DOI] [PubMed]
- 3.Dash P, Fiore-Gartland AJ, Hertz T, Wang GC, Sharma S, Souquette A, Thomas PG. Quantifiable predictive features define epitope-specific T cell receptor repertoires. Nature. 2017;547(7661):89–93. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Abbas AK, Lichtman AH, Pillai S. Basic immunology: functions and disorders of the immune system. Elsevier Health Sciences (2015)
- 5.Ehrlich R, Ghersi D. Analyzing t cell receptor alpha/beta usage in binding to the pMHC. In: 2017 IEEE international conference on bioinformatics and biomedicine (BIBM),2017; pp. 83–87
- 6.Ehrlich R, Kamga L, Gil A, Luzuriaga K, Selin LK, Ghersi D. SwarmTCR: a computational approach to predict the specificity of T cell receptors. BMC Bioinformatics. 2021;22(1):422. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Gras S, Saulquin X, Reiser J-B, Debeaupuis E, Echasserieau K, Kissenpfennig A, Legoux F, Chouquet A, Le Gorrec M, Machillot P, Neveu B, Thielens N, Malissen B, Bonneville M, Housset D. Structural bases for the affinity-driven selection of a public TCR against a dominant human cytomegalovirus epitope. J Immunol. 2009;183(1):430–7. [DOI] [PubMed] [Google Scholar]
- 8.Leem J, de Oliveira SHP, Krawczyk K, Deane CM. STCRDab: The structural t-cell receptor database. Nucleic Acids Res. 2018;46:D406–12. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Fleishman SJ, Leaver-Fay A, Corn JE, Strauch EM, Khare SD, Koga N, Baker D. Rosettascripts: A scripting language interface to the Rosetta Macromolecular modeling suite. PLoS ONE. 2011;6(6): e20161. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Yin R, Ribeiro-Filho HV, Lin V, Gowthaman R, Cheung M, Pierce BG. TCRmodel2: high-resolution modeling of t cell receptor recognition using deep learning. Nucleic Acids Res. 2023;51:W569–76. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Gowthaman R, Pierce BG. TCRmodel: high resolution modeling of T cell receptors from sequence. Nucleic Acids Res. 2018;46(W1):W396–401. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Marze NA, Roy Burman SS, Sheffler W, Gray JJ. Efficient flexible backbone protein–protein docking for challenging targets. Bioinformatics. 2018;34(20):3461–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Cock PJ, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, DeHoon MJ. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25(11):1422. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Giudicelli V, Chaume D, Lefranc MP. IMGT/GENE-DB: A comprehensive database for human and mouse immunoglobulin and T cell receptor genes. Nucleic Acids Res. 2005;33:D256–61. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Klausen MS, Anderson MV, Jespersen MC, Nielsen M, Marcatili P. LYRA, a webserver for lymphocyte receptor structural modeling. Nucleic Acids Res. 2015;43(W1):W349–55. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Schritt D, Li S, Rozewicki J, Katoh K, Yamashita K, Volkmuth W, Standley DM. Repertoire builder: high-throughput structural modeling of B and T cell receptors. Mol Syst Design Eng. 2019;4(4):761–8. [Google Scholar]
- 17.Wong WK, Marks C, Leem J, Lewis AP, Shi J, Deane CM. TCRBuilder: multi-state T-cell receptor structure prediction. Bioinformatics. 2020;36(11):3580–1. [DOI] [PubMed] [Google Scholar]
- 18. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Hassabis D. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596(7873):583–9.
- 19. Pak MA, Markhieva KA, Novikova MS, Petrov DS, Vorobyev IS, Maksimova ES, Ivankov DN. Using AlphaFold to predict the impact of single mutations on protein stability and function. PLoS ONE. 2023;18(3):e0282689.
- 20. Pierce BG, Weng Z. A flexible docking approach for prediction of T cell receptor–peptide–MHC complexes. Protein Sci. 2013;22(1):35–46.
- 21. Tyka MD, Keedy DA, André I, DiMaio F, Song Y, Richardson DC, Baker D. Alternate states of proteins revealed by detailed energy landscape mapping. J Mol Biol. 2011;405(2):607–18.
- 22. Smith CA, Kortemme T. Backrub-like backbone simulation recapitulates natural protein conformational variability and improves mutant side-chain prediction. J Mol Biol. 2008;380(4):742–56.
- 23. Wang J, Reinherz EL. The structural basis of αβ T-lineage immune recognition: TCR docking topologies, mechanotransduction, and co-receptor function. Immunol Rev. 2012;250(1):102–19.
- 24. von Luxburg U. A tutorial on spectral clustering. Stat Comput. 2007;17(4):395–416.
- 25. Hale ML, Thapa I, Ghersi D. FunSet: an open-source software and web server for performing and displaying gene ontology enrichment analysis. BMC Bioinformatics. 2019;20(1):359.
- 26. Alford RF, Leaver-Fay A, Jeliazkov JR, O’Meara MJ, DiMaio FP, Park H, Gray JJ. The Rosetta all-atom energy function for macromolecular modeling and design. J Chem Theory Comput. 2017;13(6):3031–48.
- 27. Cole DK, Yuan F, Rizkallah PJ, Miles JJ, Gostick E, Price DA, Sewell AK. Germ line-governed recognition of a cancer epitope by an immunodominant human T-cell receptor. J Biol Chem. 2009;284(40):27281–9.
- 28. Bagaev DV, Vroomans RM, Samir J, Stervbo U, Rius C, Dolton G, Shugay M. VDJdb in 2019: database extension, new analysis infrastructure and a T-cell receptor motif compendium. Nucleic Acids Res. 2020;48(D1):D1057–62.
- 29. Berman H, Henrick K, Nakamura H. Announcing the worldwide Protein Data Bank. Nat Struct Mol Biol. 2003;10(12):980.
- 30. Wiederstein M, Sippl MJ. ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res. 2007;35:W407–10.
- 31. Glynn E, Ghersi D, Singh M. Towards equitable MHC binding predictions: computational strategies to assess and reduce data bias. bioRxiv. 2024.
- 32. Sewell AK. Why must T cells be cross-reactive? Nat Rev Immunol. 2012;12(9):669–77.
- 33. Aslan N, Watkin LB, Gil A, Mishra R, Clark FG, Welsh RM, Ghersi D, Luzuriaga K, Selin LK. Severity of acute infectious mononucleosis correlates with cross-reactive influenza CD8 T-cell receptor repertoires. mBio. 2017;8(6):e01841.
- 34. Mischke J, Klein S, Seamann A, Prinz I, Selin L, Ghersi D, Kraft AR. Cross-reactive T cell response exists in chronic lymphocytic choriomeningitis virus infection upon pichinde virus challenge. Viruses. 2022;14(10):2293.
Data Availability Statement
This manuscript does not report data generation or analysis. All data needed to run the quick start guide are available at: https://github.com/Aseamann/TRain



