Abstract
Formation of major histocompatibility complex (MHC)-peptide-T cell receptor (TCR) complexes is central to the initiation of an adaptive immune response. These complexes form through initial stabilization of the MHC fold by binding of a short peptide, followed by interaction with the TCR to form a ternary complex. The TCR makes its contacts predominantly through its complementarity-determining region (CDR) loops. Stimulation of an immune response is central to cancer immunotherapy, which depends on identifying the combinations of MHC molecules, peptides, and TCRs that elicit an antitumor immune response. Predicting these combinations remains a challenge in computational biochemistry. In this chapter, we introduce a predictive method with four stages: generation of multiple peptide and TCR CDR3 loop conformations, solvation of these conformers in the context of the MHC-peptide-TCR ternary complex, extraction of parameters from the generated complexes, and use of a machine learning model to evaluate whether the assembled ternary complex can support an immune response.
Keywords: MHC, Peptide, T cell receptor, Structure, Solvation, Prediction, Binding, Interaction, Conformation, Computation, Machine learning
1. Introduction
T lymphocytes are an integral component of the adaptive immune system. They respond to pathogens, allergens, and cancer cells while maintaining self-tolerance. T cell activation is an essential step in a proper immune response, whether the task is fighting a viral infection such as SARS-CoV-2 or keeping the growth of malignant clones in check. T cells arise from the bone marrow (BM) and travel to the thymus for maturation and selection before they are released to the periphery [1]. T cell infiltration of tumors is correlated with a better prognosis. Cancer immunotherapies, including immune checkpoint blockade, adoptive cellular therapy, and cancer vaccines, have proven to be effective therapeutic strategies [2]. However, only a subset of patients develops an antitumor immune response.
Cancer somatic mutations that alter amino acid coding sequences result in mutated peptides (neoantigens) that can be expressed, processed, and presented on the surface of the cancer cell, and subsequently recognized by T cells. Because normal tissues do not possess these somatic mutations, neoantigen-specific T cells are not affected by central and peripheral tolerance, and do not induce normal tissue destruction. Therefore, neoantigens present ideal targets for T cell-based cancer immunotherapy. However, the ability to identify which neoantigen elicits a T cell immune response and which T cell responds to these neoantigens remains an area of highly needed research. Developing computational tools that predict candidate cancer neoantigens and associated T cell clones would facilitate engineering of more effective immunotherapies to treat patients presenting the corresponding neoantigens.
The capacity of T cells to initiate immune responses is determined by the ability of T cell receptors (TCRs) to recognize a wide variety of antigens and neoantigens. The TCR is a protein expressed on the surface of T cells. It recognizes peptides (antigens) bound to major histocompatibility complex (MHC) molecules, which are presented on the surface of antigen-presenting cells (APCs). The TCR of most T cells comprises α and β glycoprotein chains, while a small T cell population expresses γ and δ chains. Each unique TCR is generated by somatic rearrangement of arrays of variable (V), diversity (D; β and δ loci only), and joining (J) gene segments. One segment of each type is selected and recombined into a functional receptor chain. The junction of these segments encodes complementarity-determining region 3 (CDR3), which is largely responsible for the interaction between the TCR and the peptide-MHC complex [3–5]. Because CDR3 is created by somatic recombination, it contributes much of the repertoire diversity and enables the immune system to recognize a huge variety of antigens. Efforts to characterize the TCR repertoire in health and disease have increased dramatically in recent years due to advances in sequencing technologies [6]. The resulting data have provided important insights into the composition and dynamics of the immune repertoire.
TCR sequencing combined with mutational profiling in patients with different malignancies presents a great opportunity to be leveraged in the development of computational tools that can predict MHC-peptide-TCR interactions. Therefore, computational predictions associated with MHC-peptide-TCR complexes have become increasingly important, and can be performed using sequence-based or structure-based approaches [7]. Current neoantigen-prediction algorithms that use a sequence-based approach, such as NetMHCpan-4.0 [8], are rapid and are based on patterns in peptide sequences with established affinity for MHC molecules [9]. Algorithms such as DynaPred [10] use a combined sequence- and structure-based approach, in which peptides whose sequences indicate a non-binder in the first step are not subjected to structural analysis. For potential binding peptides, interactions between the MHC-peptide complex and TCR are predicted, including consideration of the conformation of the bound peptide and the impact of solvation on the peptide-protein interaction.
We have developed a computational method for predicting peptide binding to MHC and the subsequent interaction of the TCR with the MHC-peptide complex. Unlike the biological sequence of events, this method introduces the peptide into an “empty” MHC-TCR “complex.” Using this approach, we calculate the interactions of the peptide with the proteins, including interactions mediated by water molecules that are predicted using WATGEN5 [11]. We vary the conformation of the central region of the peptide and of the regions of the TCR CDR3α and CDR3β loops closest to the peptide. This generates multiple possible three-dimensional models of the MHC-peptide-TCR complex. We then analyze the interactions in these models with a machine learning model trained on parameters derived from X-ray structures. This analysis identifies potential binding peptides and CDR3 loops that will support an immune response. The following sections describe how the initial model is built and how it is used for prediction, first for the peptide alone and then with inclusion of CDR3 loops. The model-building approach can also be used independently of our model to create a user-specific model for predictions involving a particular subset of peptides and TCRs.
2. Software
Table 1 lists the main data sources and software used in model building and prediction, together with a brief description of the input and output of each tool. The table also serves as an introduction to the workflow described in detail below.
Table 1.
Software and databases used in model building and prediction with associated input and output
| Software/databases | Source | Input | Output |
|---|---|---|---|
| TCR3d | https://tcr3d.ibbr.umd.edu/ | List of PDB IDs for class I MHC-peptide-TCR complexes | |
| RCSB PDB | https://www.rcsb.org/ | List of PDB IDs for class I MHC-peptide-TCR complexes | PDB files |
| AddProt | https://github.com/ihaworth2/MHC-peptide-TCR/ | PDB file | Multiple peptide and/or TCR CDR3 loop conformations |
| SolPrep | https://github.com/ihaworth2/MHC-peptide-TCR/ | Peptide and/or TCR CDR3 loop conformations | Input files for solvation |
| Solvate (WATGEN5) | https://github.com/ihaworth2/MHC-peptide-TCR/ | MHC+TCR and peptide files, with multiple peptide and/or TCR CDR3 loop conformers | Interaction and solvation parameters |
| KNIME | https://hub.knime.com/ | Interaction and solvation parameters | Model building and prediction of binding peptides and TCR CDR3 loops |
3. Methods
The workflow used in model building and prediction is shown in Fig. 1. Model building refers to training the machine learning model that carries out the predictions. We start with X-ray structures of peptides or CDR3 loops. We term these “binding conformations” because these peptide and TCR conformations bind to each other and elicit an immune response. We then modify the torsion angles to generate stereochemically allowed alternative conformations of these structures. These are labeled “non-binding conformations,” on the assumption that the conformational changes will reduce binding. Solvation of these structures is key to understanding the parameters that influence binding. We evaluate the interactions of the peptide with the MHC and TCR, including the entropy changes that accompany displacement of water molecules. This evaluation establishes the parameters that characterize “binding” peptide and CDR3 conformations. These parameters are used to develop a machine learning model in KNIME [12] for subsequent prediction of binding in MHC-peptide-TCR complexes.
Fig. 1.

Workflow depicting model building (orange) and prediction (green) for MHC-peptide-TCR complexes
The prediction workflow (Fig. 1) closely follows the model-building workflow. Some peptides and TCR CDR3 loops have no crystallographic structures. Their binding properties can be predicted by first imposing their sequences on X-ray structure templates. Variants of these peptides and CDR3 loops are then generated in the same way to give a set of possible conformations. For peptides, we also have preliminary evidence that binding conformations may have a greater number of intramolecular hydrophobic interactions. Selecting conformations that show this trend helps predict binding conformations of the peptide and CDR3 loop, as does incorporating the related parameters in the machine learning model.
3.1. Model Building for Peptides
Antigenic peptides associated with MHC class I molecules are commonly 9–13 amino acid residues in length. Anchor residues near the peptide termini are embedded in the MHC molecule and are usually hydrophobic. For HLA-A2, these are typically position 2 and the C-terminal residue. The central residues are largely responsible for binding and interacting with the CDR3 loops of the TCR. Therefore, variations in the sequence and conformation of the central region of the peptide can affect binding of the peptide to the MHC and TCR. Similarly, the sequence and conformation of the central region of the TCR CDR3 loops also influence complex formation. For these reasons, the prediction method described below focuses on conformer variation in these regions. We illustrate the method with the complex formed by HLA-A*02:01, the Tax peptide from human T-lymphotropic virus type 1 (HTLV-1), and a TCR (PDB ID: 1AO7) [13].
1. Define the dataset.
   - (a) Define a dataset of MHC class I-peptide-TCR complexes (see Note 1).
   - (b) Retrieve the PDB files of these complexes (https://www.rcsb.org/) [15]. These X-ray structures are referred to as “binding conformations.” Delete small-molecule ligands and duplicate chains (see Note 2).
   - (c) Extract the peptide chain from the complex and save it as a separate PDB file. This is usually chain C, with 9 amino acids in 1AO7 and 9–13 amino acids in general. This file acts as the ligand during solvation. Add hydrogens and adjust the terminal charges in Chimera [16] or other software.
2. Generate “non-binding conformations” of the peptide (AddProt).
   - (a) Save the peptide file in the Peptide folder.
   - (b) Within the complex-specific folder, modify the in_peptide.txt file (see Fig. 2a):
     - Line 1: Run variables (see Note 3).
     - Line 2: The name of the peptide PDB file.
     - Line 3: The sequence of the peptide. Replace the middle 3 or 4 amino acids (for odd and even peptide lengths, respectively) with dashes (“---”).
     - Line 4: Adjust the numbers according to the sequence length and the number of residues varied (see Note 4).
     - Line 5: The central amino acid sequence to be varied conformationally.
     - Line 6 onward: The start, finish, and increment of the phi and psi angles of the varied amino acids, set as desired (see Note 5).
     - Last line: The errors allowed in generation of the conformations (see Note 6).
   - (c) Use run.bat (see https://github.com/ihaworth2/MHC-peptide-TCR/) to run AddProt and generate “_vary_peptide.pdb” containing the computed conformers (see Fig. 2b).
3. Prepare files for Solvate (SolPrep).
   - (a) For X-ray structures (binding conformations): run the peptide PDB file from step 1c (1ao7_peptide.pdb) through SolPrep. This generates a file in the format required by Solvate.
   - (b) For peptide conformation variants (non-binding conformations): run the PDB file generated by AddProt (1ao7_vary_peptide.pdb) through SolPrep. This selects a specified number of conformations and prepares a file for Solvate (see Note 7).
4. Solvation.
   - (a) Open the Solvate GUI (see Fig. 3). Under “Path,” specify the folder that contains the input files; the output files are also written to this folder.
   - (b) “Run Name” should contain the folder name.
   - (c) “Protein File” should contain the name of the protein PDB file (MHC-peptide-TCR complex).
   - (d) In the “Chains” field, specify the MHC and TCR chains of the protein, separated by commas. Do not include the peptide chain. The chains are usually A, B, D, and E.
   - (e) In the “Ligand Poses Filename” field, specify the peptide PDB file: either the X-ray structure or the AddProt-generated conformers (see Note 8).
   - (f) Keep the “Delimiter” as “TER.”
   - (g) Leave all other fields blank.
   - (h) Click “Run.” A representative solvated complex is shown in Fig. 4 (see also Note 9).
5. Collect solvation parameters and intermolecular interactions.
   - (a) The Solvate run generates the “Global Results Analysis Compare” spreadsheet, which contains solvation parameters and intermolecular interactions. The data for each structure are in the Poses folder.
   - (b) From the “Global Results Analysis Compare” spreadsheet, collect the data (see Fig. 5a) for the desired columns (see Note 10). Add a “Binding” column on the right containing binary labels: “1” for all “binding conformations” and “0” for all “non-binding conformations.” The model built in KNIME is based on these data.
   - (c) Calculate the grand average of hydropathicity (GRAVY) score [17] using online tools or in Excel.
6. Data analysis and KNIME model building.
   - (a) Open KNIME (available for download at https://hub.knime.com/) [12].
   - (b) Import the KNIME workflow (see Fig. 5b) provided at https://github.com/ihaworth2/MHC-peptide-TCR/
   - (c) In the Excel Reader node, specify the file path of the dataset sheet from step 5.
   - (d) Right-click and “Execute” all the nodes.
   - (e) The last column of the output of the Gradient Boosted Trees Predictor node contains the predictions for binding and non-binding conformations.
   - (f) The confusion matrix in the Scorer node gives the numbers of correctly and incorrectly predicted binders and non-binders.
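Step 1c can be scripted when many complexes are processed. The sketch below is a minimal, hypothetical helper and is not part of AddProt or SolPrep. It keeps only the ATOM records of one chain and relies on the fixed-column PDB format, in which the chain identifier occupies column 22. It assumes the peptide is chain C (Note 2). It does not add hydrogens or adjust terminal charges, so the Chimera step is still required.

```python
from pathlib import Path


def extract_chain(pdb_text: str, chain_id: str = "C") -> str:
    """Return the ATOM records of one chain as PDB-format text.

    HETATM records (waters, ligands) are dropped. In the fixed-column PDB
    format the chain identifier is column 22, i.e. index 21 of the line.
    """
    kept = [line for line in pdb_text.splitlines()
            if line.startswith("ATOM") and len(line) > 21 and line[21] == chain_id]
    if not kept:
        raise ValueError(f"no ATOM records found for chain {chain_id!r}")
    return "\n".join(kept + ["TER", "END"]) + "\n"


def write_chain_file(complex_path: str, out_path: str, chain_id: str = "C") -> None:
    """Write one chain of a complex (e.g. the peptide) to its own PDB file."""
    text = Path(complex_path).read_text()
    Path(out_path).write_text(extract_chain(text, chain_id))


# Usage (file names follow step 3a): write_chain_file("1ao7.pdb", "1ao7_peptide.pdb")
```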
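A small calculation makes the relationship between Lines 3–5 of in_peptide.txt and Note 4 concrete. The sketch below is hypothetical and only derives the values. It assumes one dash per varied residue and 1-based residue numbering. It does not reproduce the exact layout of Line 4, including the “0.0 0” fields, which should be copied from Fig. 2a. For the 1AO7 Tax peptide (LLFGYPVYV), it gives the masked sequence LLF---VYV, anchor residues 3 and 7, and the varied segment GYP.

```python
from __future__ import annotations


def central_window(sequence: str) -> tuple[int, int]:
    """0-based start and length of the central segment to vary.

    Three residues are varied for odd-length peptides and four for
    even-length peptides, as described for Line 3 of in_peptide.txt.
    """
    count = 3 if len(sequence) % 2 else 4
    if len(sequence) < count + 2:
        raise ValueError("peptide too short to flank the varied segment")
    return (len(sequence) - count) // 2, count


def addprot_fields(sequence: str) -> tuple[str, tuple[int, int, int], str]:
    """Values for Lines 3, 4 and 5 of in_peptide.txt (hypothetical helper).

    The Line 4 values follow Note 4: residue attached to the N terminus of
    the varied segment, number of residues varied, and residue attached to
    its C terminus (1-based numbering).
    """
    start, count = central_window(sequence)
    masked = sequence[:start] + "-" * count + sequence[start + count:]
    line4 = (start, count, start + count + 1)
    segment = sequence[start:start + count]
    return masked, line4, segment


print(addprot_fields("LLFGYPVYV"))  # ('LLF---VYV', (3, 3, 7), 'GYP')
```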
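Note 5 recommends 30° increments for a rapid run and 10° or 15° increments for better results. The cost of finer sampling grows geometrically, as the sketch below illustrates. It assumes that each varied torsion angle is scanned over a full 360° and that every combination is generated before the fit criteria in Note 6 are applied, so the numbers are upper bounds. The number of angles depends on how many residues are varied; four is used here purely for illustration.

```python
def grid_points(increment_deg: int) -> int:
    """Distinct values of one torsion angle scanned over a full 360° circle."""
    if increment_deg <= 0 or 360 % increment_deg:
        raise ValueError("increment must be a positive divisor of 360")
    return 360 // increment_deg


def conformer_upper_bound(n_angles: int, increment_deg: int) -> int:
    """Combinations of n_angles torsions, each sampled on the same grid."""
    return grid_points(increment_deg) ** n_angles


for inc in (30, 15, 10):
    print(f"{inc:>2} deg: {conformer_upper_bound(4, inc):,} combinations")
# 30 deg: 20,736 combinations
# 15 deg: 331,776 combinations
# 10 deg: 1,679,616 combinations
```

Going from 30° to 10° triples the per-angle grid. The total grows by a factor of 3 raised to the number of angles, which is 81-fold for four angles.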
Fig. 2.

(a) Input for AddProt generation of conformations of the central three amino acids of a peptide. (b) Structural output of AddProt-generated conformations (shown in red)
Fig. 3.

The Solvate GUI, showing the setup for solvation of an MHC-peptide-TCR complex
Fig. 4.

A result from Solvate (WATGEN5), showing solvation of an MHC-peptide-TCR complex. The TCR CDR3 loops are at the top of the figure. The MHC molecule is in the lower part of the figure. The peptide is shown in yellow. Solvation proceeds through addition of “layers” of water molecules: Layer 1 (blue), layer 2 (orange), layer 3 (green), layer 4 (pink), layer 5 (purple)
Fig. 5.

(a) Parameters used in KNIME (see Note 10). (b) KNIME workflow. A complete explanation is given at https://github.com/ihaworth2/MHC-peptide-TCR/
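Step 5b combines rows from X-ray structures and AddProt conformers into one labeled table. The sketch below shows the same bookkeeping in Python. It assumes that the relevant rows of the “Global Results Analysis Compare” spreadsheet have been exported to CSV and that the column headers match the parameter names in Note 10. Both the export and the exact headers are assumptions. The output could be loaded in KNIME with a CSV Reader node instead of the Excel Reader node.

```python
from __future__ import annotations

import csv
import io

# Parameter names from Note 10; the exact spreadsheet headers are assumed.
FEATURES = ["Prot-Lig HB", "Prot-Lig HF", "Actual SWB",
            "Absolute Displaced Waters", "contact SWB"]


def build_dataset(binding_csv: str, nonbinding_csv: str,
                  features: list[str] = FEATURES) -> str:
    """Merge X-ray (Binding=1) and AddProt (Binding=0) rows into one CSV table."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=features + ["Binding"],
                            lineterminator="\n")
    writer.writeheader()
    for text, label in ((binding_csv, 1), (nonbinding_csv, 0)):
        for row in csv.DictReader(io.StringIO(text)):
            # Keep only the selected parameters; a missing column raises KeyError.
            record = {name: row[name] for name in features}
            record["Binding"] = label
            writer.writerow(record)
    return out.getvalue()
```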
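The GRAVY score in step 5c is the mean Kyte–Doolittle hydropathy value over all residues of a sequence. The sketch below computes it directly and accepts only the 20 standard one-letter codes. For the 1AO7 Tax peptide (LLFGYPVYV), the residue values sum to 14.2, giving a GRAVY score of about 1.578.

```python
# Kyte-Doolittle hydropathy scale (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


print(round(gravy("LLFGYPVYV"), 3))  # 1.578
```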
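The Scorer node in step 6f tabulates predicted against actual “Binding” labels. The sketch below reproduces that arithmetic for binary labels, which can be useful for checking results exported from KNIME. Each X-ray structure is paired with several AddProt conformers (Note 7), so non-binders outnumber binders. For that reason, the sketch reports sensitivity (the fraction of binders recovered) alongside accuracy. This choice of statistics is an illustration, not part of the published workflow.

```python
from __future__ import annotations

from collections import Counter


def confusion_matrix(actual: list[int], predicted: list[int]) -> dict[str, int]:
    """Counts of true/false positives/negatives for 0/1 "Binding" labels."""
    if len(actual) != len(predicted):
        raise ValueError("label lists differ in length")
    counts = Counter(zip(actual, predicted))
    return {"TP": counts[(1, 1)], "FN": counts[(1, 0)],
            "FP": counts[(0, 1)], "TN": counts[(0, 0)]}


def summarize(cm: dict[str, int]) -> dict[str, float]:
    """Accuracy over all conformations and sensitivity over true binders."""
    total = sum(cm.values())
    binders = cm["TP"] + cm["FN"]
    return {
        "accuracy": (cm["TP"] + cm["TN"]) / total if total else 0.0,
        "sensitivity": cm["TP"] / binders if binders else 0.0,
    }


cm = confusion_matrix([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 1, 0])
print(cm)  # {'TP': 1, 'FN': 1, 'FP': 1, 'TN': 3}
print(summarize(cm))
```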
3.2. Prediction for Peptides
1. Impose sequences on structural templates and generate conformations (AddProt).
   - (a) Edit in_peptide.txt as described in step 2 of Subheading 3.1, with minor changes (see Fig. 6):
     - Line 2: The PDB file of the template peptide.
     - Line 3: The sequence of the peptide for prediction.
   - (b) Use run.bat (see https://github.com/ihaworth2/MHC-peptide-TCR/) to run AddProt and generate “_vary_peptide.pdb” containing the computed conformers of the peptide sequence.
2. Prepare files for Solvate (SolPrep).
   - (a) As a preliminary step, run the peptide PDB file through the code to calculate its intramolecular hydrophobic interactions. Then run “_vary_peptide.pdb” generated by AddProt through SolPrep to calculate the intramolecular hydrophobic interactions of each conformer (see Note 11), and select conformations on this basis.
   - (b) Run the conformations from “_vary_peptide.pdb” selected in step 2a through SolPrep. This selects a specified number of conformations and prepares a file for Solvate, as in step 3 of Subheading 3.1.
3. Solvation.
   - (a) Solvate the conformations using WATGEN5 in Solvate, as in step 4 of Subheading 3.1.
4. Data analysis and KNIME prediction.
   - (a) Make predictions in KNIME using the model built in Subheading 3.1.
   - (b) For data entry and output, refer to step 6 of Subheading 3.1.
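Step 2a counts intramolecular hydrophobic interactions using the 4 Å cutoff given in Note 11. The sketch below shows one way such a count could be made and used to rank conformers; it is not the SolPrep code. Several choices are assumptions made for illustration:

- the set of hydrophobic residues;
- the restriction to side-chain carbon atoms;
- the exclusion of sequence-adjacent residues;
- the use of TER/ENDMDL/END records to separate conformers.

```python
from __future__ import annotations

import math
from itertools import combinations

# Assumed definition of "hydrophobic" residues; SolPrep may use another set.
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO", "TYR"}


def split_conformers(pdb_text: str) -> list[str]:
    """Split a multi-conformer PDB text on TER/ENDMDL/END records (assumed delimiters)."""
    blocks, current = [], []
    for line in pdb_text.splitlines():
        if line.startswith(("TER", "END")):
            if current:
                blocks.append("\n".join(current))
                current = []
        elif line.startswith("ATOM"):
            current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


def side_chain_carbons(pdb_text: str) -> list[tuple[int, tuple[float, float, float]]]:
    """(residue number, xyz) for side-chain carbons of hydrophobic residues."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        name, res_name = line[12:16].strip(), line[17:20].strip()
        # Skip the backbone carbons C and CA; keep CB, CG, CD1, ...
        if res_name in HYDROPHOBIC and name.startswith("C") and name not in ("C", "CA"):
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((int(line[22:26]), xyz))
    return atoms


def hydrophobic_contacts(pdb_text: str, cutoff: float = 4.0, min_separation: int = 2) -> int:
    """Residue pairs (|i - j| >= min_separation) with a C-C distance <= cutoff (Angstrom)."""
    pairs = set()
    for (ri, xi), (rj, xj) in combinations(side_chain_carbons(pdb_text), 2):
        if abs(ri - rj) >= min_separation and math.dist(xi, xj) <= cutoff:
            pairs.add((min(ri, rj), max(ri, rj)))
    return len(pairs)


def rank_conformers(pdb_text: str, keep: int = 10) -> list[int]:
    """Indices of the `keep` conformers with the most hydrophobic contacts."""
    scores = [hydrophobic_contacts(block) for block in split_conformers(pdb_text)]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:keep]
```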
Fig. 6.

Input for AddProt generation of conformations of the central three amino acids of a peptide in the prediction step
3.3. Model Building for TCR CDR3 Loops
Most human TCRs contain an α chain and a β chain. The CDRs in these chains are largely responsible for recognition of MHC-peptide complexes. In this section, we summarize our predictive method using conformational variation of the CDR3 loop of the α chain, with the 1AO7 complex as an example. The CDR3α loop in this complex is 13 amino acids long, running from (commonly) an N-terminal cysteine to a C-terminal phenylalanine. This is a typical length and sequence definition for such loops [9].
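The same CDR3 definition, from the conserved cysteine to the phenylalanine, can also be applied to chain sequences. The sketch below is a simple heuristic and not the method used to build the dataset. It assumes the loop ends at the F (or W) of the J-region F/W-G-X-G motif and starts at the nearest preceding cysteine within a fixed window. The example fragment is illustrative only. Formal numbering schemes such as IMGT define CDR3 boundaries more precisely, and the sequences in TCR3d (Note 1) can be used to check the output.

```python
from __future__ import annotations

import re

# J-region motif F/W-G-X-G; the loop is assumed to end at its first residue.
J_MOTIF = re.compile(r"[FW]G.G")


def find_cdr3(chain_seq: str, max_len: int = 25) -> str | None:
    """Return the Cys...Phe/Trp CDR3 segment, or None if no match is found."""
    for motif in J_MOTIF.finditer(chain_seq):
        end = motif.start()  # index of the conserved F/W
        # Nearest cysteine upstream of the motif, within max_len residues.
        cys = chain_seq.rfind("C", max(0, end - max_len), end)
        if cys != -1:
            return chain_seq[cys:end + 1]
    return None


fragment = "SGDSAVYFCAVTTDSWGKLQFGAGTQVVVTP"  # illustrative V-J fragment
loop = find_cdr3(fragment)
print(loop, len(loop))  # CAVTTDSWGKLQF 13
```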
1. Define the dataset.
   - (a) This procedure is largely similar to step 1 of Subheading 3.1, with some parts specific to the CDR3 loop.
   - (b) Extract the CDR3 loop (α is normally chain D and β is chain E; see Note 2). Save it as a separate PDB file, “_CDR3a_loop.pdb,” which will be used in the next step.
2. Generate “non-binding conformations” of the CDR3 loop (AddProt).
   - (a) Open the AddProt folder, which contains all the necessary files.
   - (b) Modify the in_peptide.txt file (here, “peptide” refers to the CDR3 loop; see Fig. 7a) as in step 2 of Subheading 3.1, with the following CDR3 loop-specific changes:
     - Line 2: The name of the PDB file containing the CDR3 loop.
     - Line 4: Numbers matching the length of the CDR3 loop (see Note 4).
   - (c) Use run.bat (see https://github.com/ihaworth2/MHC-peptide-TCR/) to run AddProt and generate “_CDR3a_vary.pdb” containing different conformations of the loop (see Fig. 7b).
3. Prepare files for Solvate (SolPrep).
   - (a) For the X-ray structure (binding conformation): refer to step 3a of Subheading 3.1.
   - (b) For CDR3 loop conformation variants (non-binding conformations): refer to step 3b of Subheading 3.1, with the following modifications:
     - Insert the coordinates of each CDR3 loop conformer from the AddProt output (“_CDR3a_vary.pdb”) into the original X-ray structure, replacing the original CDR3 loop structure. Save this file as “_CDR3a_vary_1.pdb.”
     - Place the following files in the correct folder for Solvate: “_CDR3a_vary_1.pdb” and either “_P.pdb” (original X-ray peptide) or “_vary_peptide.pdb” (peptide conformers from AddProt). Refer to steps 2–4 of Subheading 3.1.
4. Solvation, parameter work-up, and model building.
   - (a) Refer to steps 4–6 of Subheading 3.1. These procedures are identical to the analysis of peptide conformers.
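The coordinate substitution in step 3b can be scripted. The sketch below is a hypothetical helper, not part of AddProt or SolPrep. It replaces the ATOM records of a residue range on one chain with the records of a single conformer and relabels the conformer's chain identifier. Atom serial numbers are left unchanged. The sketch assumes that the conformer uses the same residue numbering as the X-ray structure and that the loop residues carry no insertion codes. The user must supply the residue range of the loop.

```python
def replace_loop(complex_pdb: str, loop_pdb: str, chain_id: str,
                 first_res: int, last_res: int) -> str:
    """Swap residues first_res..last_res of chain_id for a conformer's ATOM records."""

    def in_loop(line: str) -> bool:
        return (line.startswith("ATOM") and line[21] == chain_id
                and first_res <= int(line[22:26]) <= last_res)

    # Relabel the conformer's chain (column 22) so it matches the target chain.
    loop_lines = [line[:21] + chain_id + line[22:]
                  for line in loop_pdb.splitlines() if line.startswith("ATOM")]
    out, inserted = [], False
    for line in complex_pdb.splitlines():
        if in_loop(line):
            if not inserted:  # insert the conformer once, where the old loop began
                out.extend(loop_lines)
                inserted = True
            continue  # drop the original loop atoms
        out.append(line)
    if not inserted:
        raise ValueError(f"residues {first_res}-{last_res} not found on chain {chain_id}")
    return "\n".join(out) + "\n"
```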
Fig. 7.

(a) Input for AddProt generation of conformations of the central three amino acids of a CDR3 loop. (b) Structural output of AddProt-generated conformations (shown in red)
3.4. Prediction for TCR CDR3 Loops
The predictive procedure for CDR3 loops is identical to that for the peptide in Subheading 3.2.
4. Notes
1. We use the TCR3d database (https://tcr3d.ibbr.umd.edu/) [14] as the basis for choosing MHC-peptide-TCR complexes and as a source of sequences.
2. The typical organization of the PDB file is MHC (chains A and B), peptide (chain C), and TCR (chains D and E).
3. The numbers on the first line do not need to be changed. Further information on these flags is given at https://github.com/ihaworth2/MHC-peptide-TCR/
4. In order (ignoring “0.0 0”): the number of the amino acid to which the N terminus of the varied sequence is attached, the number of amino acids to be varied, and the number of the amino acid to which the C terminus of the varied sequence is attached. Refer to Fig. 2a.
5. These are the psi(−1), phi(1), psi(1) to phi(n) torsion angles (start, finish, increment) for the n amino acids whose conformation is being varied. Refer to Fig. 2a. A rapid run uses increments of 30°, but better results (in a reasonable time) are achieved with increments of 10° or 15°.
6. These are the errors used in determining the “fit” of generated conformers. In order: the allowed ± error in the joining peptide bond length (assume 1.3 Å), the joining bond angles (assume 120°), and the joining omega angle (assume 180°), followed by the threshold clash distance. Some variation may be needed, but the values shown in Fig. 2a are good defaults.
7. In trial runs, selecting 10 conformations is appropriate. This number can be increased once the model is established, at some cost in time.
8. More details are available at https://github.com/ihaworth2/MHC-peptide-TCR/
9. Addition of water “layers” is crucial for complete solvation of the MHC-peptide-TCR interface. We usually default to ten layers to ensure complete solvation, although typically only about seven are required. The calculation will “fail” once no more water molecules can be added, but this does not indicate overall failure of the run.
10. Solvate produces many different parameters that define the interface, and these can be selected as needed. Fig. 5a shows only the key parameters we have used [11]. Prot-Lig HB: hydrogen bonding of protein to peptide; Prot-Lig HF: hydrophobic contacts of protein to peptide; Actual SWB: single water bridges; Absolute Displaced Waters: number of water molecules displaced by the peptide; contact SWB: further displaced water molecules.
11. The cutoff distance for intramolecular hydrophobic interactions is 4 Å.
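The three geometric tolerances in Note 6 can be illustrated by measuring the junction between a generated segment and the fixed part of the chain. The sketch below shows what these quantities measure; it is not the AddProt implementation. It computes the C–N bond length, the two bond angles at the junction, and the omega dihedral (CA–C–N–CA), and compares them with the targets in Note 6 (1.3 Å, 120°, 180°). The tolerances are deliberately required arguments, because the default values are those shown in Fig. 2a. The clash check is omitted.

```python
import math


def _sub(a, b):
    return tuple(p - q for p, q in zip(a, b))


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def angle(a, b, c):
    """Bond angle a-b-c in degrees."""
    u, v = _sub(a, b), _sub(c, b)
    cos_t = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(t / norm for t in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    k0, k2 = _dot(b0, b1), _dot(b2, b1)
    v = _sub(b0, tuple(k0 * t for t in b1))
    w = _sub(b2, tuple(k2 * t for t in b1))
    cos_part = _dot(v, w)
    sin_part = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(sin_part, cos_part))


def junction_ok(ca_i, c_i, n_j, ca_j, *, bond_tol, angle_tol, omega_tol,
                bond=1.3, bond_angle=120.0, omega=180.0):
    """True if the C(i)-N(j) peptide junction lies within the given tolerances."""
    d = math.dist(c_i, n_j)
    a1 = angle(ca_i, c_i, n_j)
    a2 = angle(c_i, n_j, ca_j)
    w = dihedral(ca_i, c_i, n_j, ca_j)
    omega_err = abs((w - omega + 180.0) % 360.0 - 180.0)  # wrap-around difference
    return (abs(d - bond) <= bond_tol
            and abs(a1 - bond_angle) <= angle_tol
            and abs(a2 - bond_angle) <= angle_tol
            and omega_err <= omega_tol)
```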
Acknowledgments
Supported by NIH grant R01CA248381.
References
1. Kumar BV, Connors TJ, Farber DL (2018) Human T cell development, localization, and function throughout life. Immunity 48(2):202–213. 10.1016/j.immuni.2018.01.007
2. Waldman AD, Fritz JM, Lenardo MJ (2020) A guide to cancer immunotherapy: from T cell basic science to clinical practice. Nat Rev Immunol 20(11):651–668. 10.1038/s41577-020-0306-5
3. Krangel MS (2009) Mechanics of T cell receptor gene rearrangement. Curr Opin Immunol 21(2):133–139. 10.1016/j.coi.2009.03.009
4. Morris GP, Allen PM (2012) How the TCR balances sensitivity and specificity for the recognition of self and pathogens. Nat Immunol 13(2):121–128. 10.1038/ni.2190
5. Brenner MB, McLean J, Dialynas DP et al. (1986) Identification of a putative second T-cell receptor. Nature 322(6075):145–149. 10.1038/322145a0
6. Zewde M, Kiyotani K, Park JH et al. (2018) The era of immunogenomics/immunopharmacogenomics. J Hum Genet 63(8):865–875. 10.1038/s10038-018-0468-1
7. Giles JB, Brill DA, Chavoya A et al. (2018) Algorithms for the prediction of peptides binding to major histocompatibility complex (MHC) molecules. In: Taylor JC (ed) Advances in chemistry research, vol 46. Nova Science Publishers, New York, pp 59–94
8. Jurtz V, Paul S, Andreatta M et al. (2017) NetMHCpan-4.0: improved peptide-MHC class I interaction predictions integrating eluted ligand and peptide binding affinity data. J Immunol 199:3360–3368
9. Dash P, Fiore-Gartland AJ, Hertz T et al. (2017) Quantifiable predictive features define epitope-specific T cell receptor repertoires. Nature 547:89–93
10. Antes I, Siu SW, Lengauer T (2006) DynaPred: a structure and sequence based method for the prediction of MHC class I binding peptide sequences and conformations. Bioinformatics 22(14):e16–e24. 10.1093/bioinformatics/btl216
11. Morningstar-Kywi N, Wang K, Asbell TR et al. (2022) Prediction of water distributions and displacement at protein-ligand interfaces. J Chem Inf Model 62(6):1489–1497. 10.1021/acs.jcim.1c01266
12. Berthold MR et al. (2007) KNIME: the Konstanz information miner. In: Studies in classification, data analysis, and knowledge organization (GfKL 2007). Springer. ISBN: 978-3-540-78239-1. https://www.knime.com/faq#q1_1. Accessed 10 Oct
13. Garboczi DN, Ghosh P, Utz U et al. (1996) Structure of the complex between human T-cell receptor, viral peptide and HLA-A2. Nature 384(6605):134–141. 10.1038/384134a0
14. Gowthaman R, Pierce BG (2019) TCR3d: the T cell receptor structural repertoire database. Bioinformatics 35(24):5323–5325. 10.1093/bioinformatics/btz517
15. Berman HM, Westbrook J, Feng Z et al. (2000) The protein data bank. Nucleic Acids Res 28:235–242. https://www.rcsb.org. Accessed 9 Oct 2022
16. Pettersen EF, Goddard TD, Huang CC et al. (2004) UCSF Chimera: a visualization system for exploratory research and analysis. J Comput Chem 25(13):1605–1612. 10.1002/jcc.20084
17. Fuchs S. GRAVY Calculator. https://www.gravy-calculator.de/ Accessed 10 Oct
