Immunology. 2018 Nov 22;156(2):187–198. doi: 10.1111/imm.13020

PRBAM: a new tool to analyze the MHC class I and HLA‐DR anchor motifs

Anna Mestre‐Ferrer 1, Erika Scholz 1, Jepi Humet‐Alsius 2, Iñaki Alvarez 1
PMCID: PMC6328992  PMID: 30408168

Summary

Major histocompatibility complex (MHC) genes are highly polymorphic, so each MHC molecule differs in the repertoire of peptides that it can bind and present to T lymphocytes. The increasing importance of immunopeptidomics and its use in personalized medicine in fields such as oncology and autoimmunity demands the correct analysis of the peptide repertoires bound to human leukocyte antigen class I (HLA‐I) and class II (HLA‐II) molecules. Purification of the peptide pool by affinity chromatography, followed by sequencing of individual peptides by mass spectrometry, is the standard protocol to define the binding motifs of the different MHC‐I and MHC‐II molecules. The identification of MHC‐I binding motifs is relatively simple, but it is more complicated for MHC‐II. Some programs identify the anchor motifs of MHC‐II molecules. However, these programs do not identify the anchor motif correctly for some HLA‐II molecules, and some anchor motifs have been deduced using subjective interpretation of the data. Here, we present new software, called PRBAM (Peptide Repertoire‐Based Anchor Motif), that uses a new algorithm based on peptide–MHC interactions and, using peptide lists obtained by mass spectrometry sequencing, identifies the binding motif of MHC‐I and HLA‐DR molecules. PRBAM has an easy‐to‐use interface, and the results are presented as graphics, tables and peptide lists. Finally, the fact that PRBAM uses a new algorithm makes it complementary to other existing programs.

Keywords: antigen presentation, bioinformatics, MHC/HLA, peptidome, proteomics


Abbreviations

DCs: dendritic cells
DMP: deviation from the mean in the proteome
HLA: human leukocyte antigen
MHC‐I: class I MHC molecule
MHC‐II: class II MHC molecule
MHC: major histocompatibility complex
MS: mass spectrometry
pHLA‐DR: peptide–HLA‐DR complexes
ROC: receiver operating characteristic
β2m: β2‐microglobulin

Introduction

Major histocompatibility complex (MHC; human leukocyte antigen, HLA, in humans) is a gene‐dense DNA region that contains some of the most polymorphic genes in the genome. The association of several HLA alleles with different human diseases is well known.1 Some of the genes present in the MHC encode two types of heterodimeric glycoproteins: MHC class I (MHC‐I), composed of an α (or heavy) chain and an invariant subunit called β2‐microglobulin (β2m), the latter encoded outside the MHC; and MHC class II molecules (MHC‐II), composed of α and β subunits encoded by genes located in the class II region of the MHC. The MHC is polygenic, with several genes encoding classical MHC molecules: A, B and C for class I and DR, DP and DQ for class II. MHC‐I molecules are expressed on the surface of almost all nucleated cells, whereas MHC‐II molecules are only present in antigen‐presenting cells: dendritic cells (DCs), macrophages, B cells and also thymic epithelial cells. MHC‐I molecules principally bind peptides generated during the catabolism of endogenous proteins located in the cytosol and nucleus and present them to CD8+ T lymphocytes. MHC‐II proteins mainly present peptides generated from the degradation of proteins of the endocytic pathway to CD4+ T lymphocytes.

The MHC‐I α chain and the MHC‐II α and β chains contain a short intracellular part, a transmembrane domain and a large extracellular segment. In MHC‐I, the two most distal extracellular domains of the heavy chain form the peptide binding site, whereas in MHC‐II the binding site is composed of the most distal domains of the α and β chains. The class I and class II binding sites are structurally very similar, forming a groove with a floor composed of eight anti‐parallel β strands and two α helices acting as the walls of the structure. The ends of the MHC‐II groove are open, and the peptides normally extend outside the cleft, whereas in MHC‐I both ends are closed and the N and C termini of the peptide are anchored in the groove. Natural ligands of MHC‐I molecules are 8‐ to 13‐mer in length, most of them 9‐mer. The central part of MHC‐I peptide ligands longer than nine residues bulges out of the groove, while the N and C termini remain anchored to the binding site.2, 3, 4 MHC‐II ligands normally range from 12 to 18 amino acids, with an average length of 15–16 residues, although longer peptides are not uncommon. The core anchored to the binding site comprises nine residues, and it is usually located in the central part of the peptide. MHC‐II molecules can bind peptide families (or nested sets) composed of peptides derived from the same protein, with a common core interacting with the binding groove and N‐ and C‐terminal extensions of different lengths.

The MHC binding groove contains some cavities or pockets, with different physicochemical features, where the side chains of specific residues of the peptide interact with the side chains of specific residues of the MHC molecule. The anchor residues in contact with MHC molecules define the binding motif of a specific MHC allotype. Normally, side chains of residues with certain physicochemical features can be accommodated to certain pockets. Hence, some pockets only accept hydrophobic residues, others have preference for basic or acidic residues, others will allow only small polar residues, etc. Nevertheless, the pockets are highly variable in terms of stringency. Therefore, some pockets can bind only one or two amino acids while others almost lack any restriction.

As the pockets usually allow different amino acids and the MHC‐II ligands are longer than the binding core, several peptide cores of a specific MHC‐II peptide ligand can theoretically interact with the binding groove of a specific MHC‐II molecule. Hence, some peptides can bind an MHC‐II molecule using several core registers.5 However, most of the crystal structures containing a specific MHC‐II molecule with a single ligand showed that the peptide usually interacts with the MHC‐II allotype using a single core.6, 7

Most of the HLA‐II–peptide interactions have been studied with HLA‐DR molecules. The HLA‐DR α chain is conserved and all polymorphic residues of the binding site are located in the β1 domain. The peptide anchor residues to HLA‐DR are normally those located in the P1, P4, P6 and P9 core positions. Importantly, the P1 pocket allows aliphatic and aromatic hydrophobic residues.8

Binding assays have been used in different seminal studies to define the anchor motifs of HLA‐DR molecules.9, 10 These binding assays often use peptides that are identical except for some residues in specific positions. By analyzing the affinity of each peptide, it is possible to define the anchor motif of a specific HLA‐DR molecule. However, these experiments require the use of many different peptides. Ideally, a total of 20^9 (512 000 000 000) peptides would be needed to cover all the binding possibilities of a 9‐residue core, which makes the experiment impossible to perform. This approach can be used in combination with tetramer technology, as was done to define the HLA‐DRB1*14:01 binding motif.11 Peptide libraries have been used for more than two decades,12 in which random peptides were cloned, using phage vectors, and their binding to the MHC molecule was screened in different rounds of selection to describe the binding motif. However, the HLA‐I binding motif obtained using this approach has been reported to differ in the C‐terminal residue from that obtained from eluted peptides.13 ‘In‐pool’ Edman sequencing of the peptides eluted from HLA‐I molecules has been used successfully.14 However, this technology is difficult to use for MHC‐II molecules because the position of the core relative to the first peptide residue can change between different peptides.

Another method to identify the peptide binding motifs of MHC‐I and MHC‐II molecules is analysis by mass spectrometry (MS) of individual peptides contained in the repertoires of a specific allotype. These repertoires are normally obtained after purification of the MHC molecules by affinity chromatography and subsequent elution of the peptide pools bound to them. In humans, the most widely used cell model to purify these peptide pools is lymphoblastoid cells, which are normally homozygous for the MHC molecule of interest. For HLA‐I, the cell line HMy2.C1R (with very low expression of HLA‐I) transfected with the gene encoding the molecule of interest has also been used.15, 16, 17 In addition, for HLA‐DR, BLS cells (with no expression of HLA‐II) transfected with the HLA‐DRA and HLA‐DRB1 genes have been used.18, 19 ‘In‐pool’ Edman sequencing and MS peptide sequencing have the advantage of using the real peptide repertoires found in nature. Indeed, MS peptide sequencing is the only technique (together with Edman degradation for some high‐abundance peptides) that uses individual natural peptide ligands. MS peptide sequencing is now the most useful method to identify peptides presented by MHC molecules. With this technology, several hundred or thousands of different sequences can be obtained, and it is important to analyze them correctly.

There are some bioinformatics tools that calculate the theoretical affinity of peptides to MHC molecules, including PROPRED,20 RANKPEP,21, 22 SVRMHC,23 TEPITOPE24 and TEPITOPEpan.25 Probably, the most efficient tools are NetMHC 4.0 (http://www.cbs.dtu.dk/services/NetMHC/) and NetMHCIIpan 3.2 (www.cbs.dtu.dk/services/NetMHCIIpan-3.2) for class I and class II, respectively. NetMHCIIpan 3.2 is an update of previous versions, constructed using an extended data set of quantitative MHC–peptide binding affinity data obtained from the Immune Epitope Database and covering HLA‐DR, HLA‐DQ, HLA‐DP and H‐2 mouse molecules.26 NetMHCIIpan contains a single universal network based on artificial neural networks trained on > 100 000 quantitative peptide binding affinity measurements from the Immune Epitope Database,27 and it can predict peptide binding affinities for all MHC molecules of known protein sequence.

Despite the evident power of these tools, the use of real MHC ligands and new algorithms can complement the use of binding assay data. For example, it was described that NetMHCIIpan 3.1 assigned, for most of the peptides eluted from HLA‐DRB1*15:01, a higher theoretical affinity to HLA‐DRB5*01:01 than to the molecule from which they were eluted. A new algorithm based on the anchor motifs deduced from the peptide repertoires was developed, giving a core assignation similar to that obtained with NetMHCIIpan and a better allele assignation.19 Therefore, data from new algorithms based on MHC peptidomes can be complementary to those provided by bioinformatics tools based on binding assay data.

Here, we describe new software, called PRBAM (Peptide Repertoire‐Based Anchor Motif), to define the anchor motif of MHC‐I and HLA‐DR molecules based on the MS‐sequenced peptide repertoires associated with them. The program is based on the statistical analysis of the peptide lists obtained by MS after the purification of the peptide pools bound to MHC‐I or HLA‐DR molecules. The program comprises two modules, one for MHC‐I and the other for HLA‐DR. For MHC‐I, the analysis generates the anchor motifs of 8‐, 9‐, 10‐, 11‐, 12‐ and 13‐mer. To find the HLA‐DR core, the program processes the data using consecutive cycles of refinement based on the relative abundance of the residues present in each core position found in the previous cycle. Hence, each cycle refines the previous one, generating a final frequency distribution of each amino acid in each anchor position. The program generates different informative screens: (i) a graphical representation of the abundance of each amino acid in each core position relative to its abundance in the proteome; (ii) the same information in table format; (iii) in the case of HLA‐DR, a list with the selected core for each peptide and the corresponding score.

Materials and methods

PRBAM is a web application written in PHP 5.3 on the backend and in HTML and JavaScript on the frontend. The application core was based on codeigniter v2.1.4 (http://www.codeigniter.com), which is a web framework that uses the MVC (Model‐View‐Controller) design pattern to organize the code. codeigniter was created by EllisLab (http://www.ellislab.com) and it is now a project of the British Columbia Institute of Technology (http://www.bcit.ca/cas/computing). From a backend dependencies perspective, report generation is the most complete part of PRBAM, because it needs three basic components to create the reports. The PHP libraries used for this task are the following: (i) phpexcel (http://www.codeplex.com/PHPExcel) for Excel spreadsheet creation; (ii) jpgraph (http://jpgraph.net) for graph plotting; and (iii) FPDF (http://www.fpdf.org) for PDF file creation. On the frontend side, the entire application used jquery v1.8.3 (http://jquery.com) and bootstrap v3.1.1 (http://getbootstrap.com) as an HTML, CSS and JavaScript framework. For chart plotting, flot (http://www.flotcharts.org) was used, which is a JavaScript plotting library for jquery.

The program algorithm is described in the Results section.

Statistical analysis

The statistical analysis was made as previously described.28 Briefly, the frequency of each residue at each peptide position (fobs) was compared with the frequency of the same amino acid in the database (fexp) under the null hypothesis that fobs ≤ fexp. Preliminary P‐values for each residue and position were calculated assuming a binomial distribution with P = fexp. Definitive P‐values were obtained by subjecting preliminary P‐values to multiple testing correction as follows:

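$$P = 1 - (1 - P')^{20k}$$

Here P′ denotes the preliminary P‐value, and the exponent 20k is the number of residue–position combinations tested (20 amino acids at each of k positions).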

For MHC‐I, k is the number of residues of each peptide in the set tested (i.e. k = 9 in the 9‐mer set). For HLA‐DR, k always has a value of 9, as it is the number of residues considered in the peptide binding core. P‐values below 0·05 were considered statistically significant.
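As a rough illustration of the test described above, the following Python sketch computes a one‐sided binomial P‐value for one residue at one position and then applies the multiple‐testing correction. It assumes that the test is upper‐tailed (enrichment over the proteome) and that the correction has the form P = 1 − (1 − P′)^(20k); the function names and the example counts are hypothetical and are not taken from PRBAM.

```python
from math import comb


def binomial_upper_tail(n, x, p):
    """Return P(X >= x) for X ~ Binomial(n, p).

    n: number of peptides in the set
    x: number of peptides carrying the residue at the tested position
    p: expected frequency of the residue (fexp)
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x, n + 1))


def corrected_p_value(p_prelim, k, n_residues=20):
    """Apply the correction P = 1 - (1 - P')^(20k) (assumed form)."""
    return 1 - (1 - p_prelim) ** (n_residues * k)


# Hypothetical example: 60 of 200 nine-mers carry a residue whose
# expected frequency in the proteome is 0.07.
p_prelim = binomial_upper_tail(200, 60, 0.07)
p_final = corrected_p_value(p_prelim, k=9)
significant = p_final < 0.05
```

Because the exponent grows with peptide length, the same preliminary P‐value is penalized more strongly in the 13‐mer set than in the 8‐mer set.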

Results

Description of the program

PRBAM is a new bioinformatics tool available as a web application (www.prbam.org) comprising two modules: MHC‐I and HLA‐DR. The interface is simple and easy to use. The user selects the module (MHC‐I or HLA‐DR) that corresponds to the MHC molecule from which the peptide pool was eluted, uploads a peptide list in txt or fasta format and clicks the SUBMIT button. A diagram of the program is shown in Fig. 1.

Figure 1.


Graphical representation of the PRBAM software. In the ‘Home page’, the user must choose between the MHC‐I and HLA‐DR modules, load a file in txt or fasta format containing the list with the peptide ligands of the corresponding MHC‐I or HLA‐DR molecule, and press the Submit button. For MHC‐I molecules, PRBAM calculates the Deviation from the Mean in the Proteome (DMP) values for each residue in each position of peptides from 8 to 13 residues, showing the data in graphical and table format with the corresponding score. The HLA‐DR module calculates the DMP values and also shows them as a picture and a table. Furthermore, the program generates a list of the binding cores selected for each peptide and the score of the selected binding core. Finally, in both modules, the program generates a list containing the non‐analyzed peptides.

  1. MHC‐I MODULE: This module identifies the anchor motif of a specific MHC‐I allotype. The lists can contain MHC‐I peptide ligands of any length, although only peptides with a length ranging from 8 to 13 residues will be analyzed. The program classifies them into a maximum of seven lists: six lists grouping the peptides with a specific length ranging from 8 to 13 residues and an additional one with the non‐analyzed peptides (shorter than 8‐mer or longer than 13‐mer). For each list of peptides ranging from 8 to 13 residues, the program generates a table with the relative abundance of each amino acid in each peptide position. This list is used to define the ratio between the experimental abundance of each amino acid residue in each position with the corresponding abundance in the human proteome. This ratio is named Deviation from the Mean in the Proteome (DMP). The data of the human proteome were those described previously.28 There are two different output results for each peptide length analysis: (i) charts showing amino acid frequencies in DMP; (ii) frequency table in DMP with statistically significant results highlighted (see Materials and Methods). Also, a non‐analyzed peptide list is shown. This methodology was previously used to define the anchor motifs of several HLA‐B27 subtypes29 and HLA‐B*40:01.28

  2. HLA‐DR MODULE: In contrast to MHC‐I ligands, a peptide bound to a specific HLA‐DR molecule usually contains several possible cores that could theoretically fit the corresponding binding groove. However, for most of the peptides only one 9‐residue core is anchored to the binding site. To systematize and define the binding motifs of the HLA‐DR molecules, a methodology based on the knowledge of how peptides interact with HLA‐DR molecules was used. The following premises were considered: (i) all of the HLA‐DR molecules bind a hydrophobic residue at the P1 position; (ii) the main peptide anchor residues of their ligands are located in the P1, P4, P6 and P9 positions of the anchor core; (iii) in addition to these positions, other ones can modulate the binding of the peptide and additional anchor positions can exist in some molecules; (iv) a sufficiently high number of HLA‐DR ligands should allow the identification of the binding pattern of the peptides to a specific HLA‐DR molecule; (v) consecutive refinement cycles should make the anchor motif clearer and more exact. Hence, the algorithm to assign the anchor motif for HLA‐DR requires several steps (Fig. 2):

    1. The first step of the analysis is the generation of a list containing all the possible anchor cores for each of the peptides in the list. To do that, the program selects all the possible 9‐mer from the peptide sequence. The only restriction is the presence at P1 of one of the following hydrophobic residues: Tyr, Phe, Trp, Leu, Ile, Val, Met, Ala.

    2. The program generates a matrix with the frequency distribution of each residue in each position. This matrix is corrected with the abundance of each amino acid in the human proteome. This correction generates a table indicating the DMP of each residue in each core position (Matrix 1).

    3. PRBAM generates Filter 1, which is the sum of the DMP values of the P4, P6 and P9 residues of each putative core (Core List 1). These positions were chosen as they are the reported anchor positions for most of the HLA‐DR molecules. The P1 position was not included in this analysis because it is already imposed as a requisite during the selection of the cores. The positions P2, P3, P5, P7 and P8 were not included in this step because they are less important than the selected positions in the interaction with the HLA‐DR molecule and, at this initial step, they could introduce noise into the analysis. The core with the highest score for each peptide is assigned as the putative binding core.

    4. The list of the core sequences selected in the previous step (Core List 2) is the source to get a new frequency distribution table and, from this, a DMP table called Matrix 2 is created.

    5. A new filter is obtained using the core list generated in step 1 (Core List 1) and the Matrix 2. The score of the Filter 2 is obtained as follows: as P1 position was not biased among the selected amino acids, and positions P2, P3, P5, P7 and P8 may affect the peptide binding and be selected in some way, P1 to P9 residues were chosen to get the new core score. Hence, the score is calculated as the sum of the relative frequencies of P1–P9 positions. The program generates a new core list (Core List 3).

    6. Steps 4 and 5 are repeated while the frequency matrix changes, with a limit of 100 cycles. Each consecutive cycle refines the results obtained in the previous cycle. The 100‐cycle limit prevents the program from entering an endless loop in which the results keep alternating between two consecutive cycles.

    7. The program uses the cores obtained in the last cycle to generate the final frequency distribution and the corresponding DMP values, which is used to select the anchor core of each peptide ligand and to define the binding motif of the corresponding HLA‐DR molecule.

    8. Once the matrix with the 9‐mer cores is obtained, the statistical analysis is made as in the MHC‐I module.

    9. The data are shown as in the MHC‐I module. In addition, the selected core for each peptide is also indicated. There are three different output results: (i) charts showing amino acid frequencies in DMP; (ii) frequency table in DMP with statistically significant results highlighted; (iii) peptide list with each core and its corresponding final score. A non‐analyzed peptide list is also shown.

    Figure 2.


    Algorithm used by the HLA‐DR module of PRBAM. All 9‐mer containing Tyr, Phe, Trp, Leu, Ile, Val, Met or Ala in the P1 position are listed (Core list 1). The frequency distribution and DMP values are calculated for each amino acid in each core position and shown as Matrix 1. The sums of the P4, P6 and P9 DMP values are calculated, creating Filter 1, which allows the selection of the core with the highest score for each peptide. These highest‐scoring cores compose Core list 2 and are used to generate Matrix 2. Then, Core list 1 is scored against Matrix 2 to generate Filter 2, the sum of the DMP values of the P1 to P9 positions. Filter 2 allows the selection of the cores with the highest score, which generates Core list 3 and the corresponding Matrix 3. This is repeated until no change is obtained in the Core list.
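The steps above can be sketched in Python as follows. This is a minimal illustration written from the description in the text, not the PRBAM source code (which is written in PHP). It assumes that DMP is the plain ratio between observed and background frequency, that ties between cores are broken by taking the first candidate, and that refinement stops when the selected cores no longer change; all function names are hypothetical, and the background frequencies must be supplied by the caller.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
P1_ALLOWED = set("YFWLIVMA")  # hydrophobic residues accepted at P1 (step 1)
CORE_LEN = 9


def candidate_cores(peptide):
    """Step 1: every 9-mer window whose first residue is allowed at P1."""
    return [peptide[i:i + CORE_LEN]
            for i in range(len(peptide) - CORE_LEN + 1)
            if peptide[i] in P1_ALLOWED]


def dmp_matrix(cores, background):
    """Observed / background frequency per residue and core position (assumed DMP)."""
    matrix = []
    for pos in range(CORE_LEN):
        counts = Counter(core[pos] for core in cores)
        matrix.append({aa: (counts[aa] / len(cores)) / background[aa]
                       for aa in AMINO_ACIDS})
    return matrix


def best_core(cands, matrix, positions):
    """Candidate with the highest summed DMP over the given 0-based positions."""
    return max(cands, key=lambda c: sum(matrix[p].get(c[p], 0.0) for p in positions))


def refine(peptides, background, max_cycles=100):
    """Return (selected core per peptide, final matrix, number of cycles run)."""
    cands = {p: candidate_cores(p) for p in peptides}
    cands = {p: cs for p, cs in cands.items() if cs}  # others are not analyzed
    if not cands:
        raise ValueError("no peptide contains a candidate core")
    core_list_1 = [c for cs in cands.values() for c in cs]
    matrix = dmp_matrix(core_list_1, background)                 # Matrix 1
    selected = {p: best_core(cs, matrix, (3, 5, 8))              # Filter 1: P4, P6, P9
                for p, cs in cands.items()}
    cycle = 0
    for cycle in range(1, max_cycles + 1):
        matrix = dmp_matrix(list(selected.values()), background)  # Matrix 2, 3, ...
        new = {p: best_core(cs, matrix, range(CORE_LEN))          # Filter 2: P1 to P9
               for p, cs in cands.items()}
        if new == selected:
            break
        selected = new
    return selected, matrix, cycle


# Toy run with a uniform placeholder background (not real proteome frequencies).
uniform = {aa: 0.05 for aa in AMINO_ACIDS}
toy = ["GSFKDLEPTQNGK", "AEFRRLGGPSSNKE", "KKYTELLAPDDNQ", "GGGGGGGGGGGG"]
cores, final_matrix, cycles = refine(toy, uniform)
```

In the toy run, the all‐glycine peptide has no candidate core and would end up in the non‐analyzed list. Real use would replace the uniform background with the amino acid frequencies of the human proteome.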

Data obtained in both modules can be exported as a compressed .zip file. For MHC‐I the generated report contains the following data: (i) a .pdf file with the input list, charts of all amino acid frequencies in DMP grouped according to length and the non‐analyzed peptide list; (ii) an .xlsx file with a sheet for each peptide length containing the DMP values; (iii) all generated charts as .png files; (iv) input data. For HLA‐DR the generated report includes the following data: (i) a .pdf file containing the peptide list with their corresponding cores and scores, charts of all amino acid frequencies in DMP and the non‐analyzed peptide list; (ii) an .xlsx file containing a table with the DMP values; (iii) all generated charts as .png files; and (iv) input data.

Program functionality test

The correct functioning of the program was tested with previously published data. For the MHC‐I module we used the list of HLA‐B*40:01 natural ligands previously published by Marcilla et al.28 The program generated exactly the same results as previously described (data not shown), as it simply systematizes the analysis made previously.28, 29

To test the correct anchor motif assignation using the HLA‐DR module, we first considered a list of peptides eluted previously in our laboratory from HLA‐DRB1*10:01 (DR10).30 DR10 anchors aromatic residues (principally Tyr and Phe) in the P1 core position and, at lower frequencies, other hydrophobic residues, mainly Ile and Leu. In P4, the most abundant residues are Leu and Ile. The preferred amino acids in P6 are Pro, Ala and Val. Finally, Asn, Ser and Asp are the major anchor residues in P9. In addition, P8 seems to be another position with restrictions, having a preference for basic residues and Tyr.30 The program identified the same anchor motif with identical anchor residues, with slight differences in the DMP values (Fig. 3). To confirm the reliability of the program, a second independent peptide pool31 eluted from HLA‐DR10 was analyzed with PRBAM. The result was similar to that obtained with the first data set (Fig. 3), demonstrating the proper functioning of the program.

Figure 3.


Test of PRBAM functionality. A first data set reported previously30 was used as proof‐of‐concept. The analysis showed a result (gray bars) similar to the anchor motif obtained previously (black bars). A second data set31 was used to test the program reliability. The anchor motif identified the same anchor residues (white bars). The final DMP values of each amino acid residue for each core position are shown.

To perform additional analysis, five peptide lists containing HLA‐DR peptide ligands were chosen: first, a data set combining the previous lists of HLA‐DR10 ligands, and four peptide pools obtained from HLA‐DRB1*01:01 (DR1),31 HLA‐DRB1*08:01 (DR8),32 DRB1*15:01 (DR2b)19 and DRB5*01:01 (DR2a).19 Different methodologies had been used to define the binding motif of these HLA‐DR molecules in these reports. As said above, PRBAM has a limit of 100 cycles of refinement. We studied whether this number of cycles is enough to reach equilibrium. The analysis of the lists containing the peptide ligands of DR1, DR8, DR10, DR2b and DR2a generated changes in the residues, and consequently in the cores assigned, during the first cycles of the analysis. The final result was obtained in cycle 6 for DR1, cycle 7 for DR8, cycle 5 for DR10, cycle 6 for DR2b and cycle 5 for DR2a (data not shown).

Comparison with NetMHCIIpan

Probably, the most used bioinformatics tool for the analysis of MHC‐II immunopeptidomes is NetMHCIIpan. This is a powerful program that predicts peptide binding to HLA‐DR, ‐DQ and ‐DP molecules, identifying the anchor core. It is based on binding assays of an extremely large number of peptides to different MHC‐II molecules and neural networks.

To evaluate the assignation of the binding cores by PRBAM, we compared the selected cores with those assigned by NetMHCIIpan from the immunopeptidomes of DR1, DR8, DR10, DR2b and DR2a. As shown in the Supplementary material (Fig. S1), the anchor motifs obtained using NetMHCIIpan and PRBAM were similar for all HLA‐DR allotypes considered. The degree of similarity of the cores chosen by both programs was evaluated. The cores were identical for 80·5% of the peptides for DR1, 58·3% for DR8, 82·3% for DR10, 76·7% for DR2b and 65·1% for DR2a (see Supplementary material, Table S1). It is important to note that most of the peptides with different cores between PRBAM and NetMHCIIpan are peptides with low affinity. An exception is DR8, the allotype with the lowest similarity for the chosen cores, where some cores differ among the peptides with relatively high affinity.
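A percentage of identical cores such as those reported above can be computed with a few lines of Python. The sketch below assumes that each program's output has already been parsed into a dictionary mapping every peptide to its selected 9‐mer core; the function name and the toy data are hypothetical and do not come from either program.

```python
def core_concordance(cores_a, cores_b):
    """Percentage of shared peptides for which both tools chose the same core."""
    shared = cores_a.keys() & cores_b.keys()
    if not shared:
        raise ValueError("the two results have no peptides in common")
    identical = sum(1 for pep in shared if cores_a[pep] == cores_b[pep])
    return 100.0 * identical / len(shared)


# Toy data: three of four peptides receive the same core from both tools.
tool_a = {"pep1": "FKDLEPTQN", "pep2": "LGGPSSNKE", "pep3": "YTELLAPDD", "pep4": "VAAAAAAAA"}
tool_b = {"pep1": "FKDLEPTQN", "pep2": "LGGPSSNKE", "pep3": "LLAPDDNQA", "pep4": "VAAAAAAAA"}
agreement = core_concordance(tool_a, tool_b)
```

Restricting the comparison to peptides present in both outputs avoids counting peptides that one of the tools left unanalyzed as disagreements.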

A receiver operating characteristic (ROC) curve analysis was carried out with the data obtained with the two peptidomes of DR10. Both peptide lists were analyzed with NetMHCIIpan and PRBAM (for PRBAM, a matrix generated from the combined list of both DR10 data sets was used). The affinity values (arbitrary score for PRBAM and estimated IC50 for NetMHCIIpan) were compared in ROC curves. With small differences, the curves were near the diagonal, indicating that, as expected, the differences in affinity are low when two different data sets from the same HLA‐DR molecule are analyzed with NetMHCIIpan (Fig. 4a) and PRBAM (Fig. 4b).

Figure 4.


Receiver operating characteristic (ROC) curves of the affinity data obtained with two different peptidomes from DR10. (a) IC50 data for each peptide of the two DR10‐derived peptidomes obtained with NetMHCIIpan are represented in a ROC curve. (b) The score data obtained for each peptide of the two DR10‐derived peptidomes with the PRBAM matrix are represented in a ROC curve. The PRBAM matrix data were obtained with a peptide list comprising the combination of both DR10 peptidomes. The area under the curve is indicated in both panels.

On the other hand, it can be assumed that the peptides eluted from a specific HLA‐DR molecule have more affinity for it than for any other HLA‐DR allotype. To evaluate whether this is detected by both programs, the average affinity of all peptides present in each peptidome was calculated. In all cases, PRBAM calculated a higher affinity for the HLA‐DR molecule from which the peptide pool was obtained than for any other allotype (Fig. 5, left panels). However, with NetMHCIIpan, DR1 and DR10 were the molecules with the highest average affinity independently of the origin of the peptidomes (Fig. 5, right panels).

Figure 5.


Average affinity of the peptidomes associated with different HLA‐DR molecules. The average affinities of the peptidomes eluted from DR1, DR8, DR10, DR2b and DR2a were calculated with PRBAM (left panel) and NetMHCIIpan (right panel). The PRBAM matrix for a specific HLA‐DR molecule was obtained using the corresponding peptide pool eluted from this molecule. Then, this matrix was used to identify the binding score for each peptide of the other peptide pools. For NetMHCIIpan, each peptide pool was analyzed for each HLA‐DR allotype. Then, the average affinities were calculated for each peptide pool and HLA‐DR molecule with both programs.
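The cross‐scoring described in this legend can be sketched as follows. The sketch assumes that the score of a peptide against a matrix is the highest sum of P1 to P9 values among its candidate cores (9‐mers starting with Tyr, Phe, Trp, Leu, Ile, Val, Met or Ala), that the matrix is a list of nine residue‐to‐value dictionaries, and that peptides without a candidate core are left out of the average. These choices and all names are illustrative assumptions, not the documented PRBAM behaviour.

```python
P1_ALLOWED = set("YFWLIVMA")
CORE_LEN = 9


def peptide_score(peptide, matrix):
    """Best summed matrix value over all candidate cores, or None if there are none."""
    scores = [
        sum(matrix[pos].get(aa, 0.0)
            for pos, aa in enumerate(peptide[i:i + CORE_LEN]))
        for i in range(len(peptide) - CORE_LEN + 1)
        if peptide[i] in P1_ALLOWED
    ]
    return max(scores) if scores else None


def mean_score(peptides, matrix):
    """Average score of a peptide pool against one allotype matrix."""
    scores = [s for s in (peptide_score(p, matrix) for p in peptides) if s is not None]
    return sum(scores) / len(scores) if scores else None


# Toy matrix: Phe is favoured at P1 and Asp at P2 to P9 (made-up values).
toy_matrix = [{"F": 2.0}] + [{"D": 1.0}] * 8
pool = ["GGFDDDDDDDDGG", "GGFDDDDAAAAGG", "GGGG"]
average = mean_score(pool, toy_matrix)
```

Repeating `mean_score` for every pool and every allotype matrix gives the grid of averages that a figure of this kind summarizes.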

In order to evaluate the potential of both programs to distinguish the affinities between different allotypes, ROC curve analysis was performed for the different data sets obtained from the five alleles. Hence, each peptidome derived from each HLA‐DR molecule was analyzed with PRBAM and NetMHCIIpan, and the score (PRBAM) or IC50 (NetMHCIIpan) values were defined for each peptide–DR complex and represented as a ROC curve. As shown in the Supplementary material (Fig. S2), PRBAM was able to discriminate between the affinity for the molecule from which the peptides were eluted and the affinity for the rest of the molecules. This was similar for DR1 and DR10 with NetMHCIIpan (see Supplementary material, Fig. S2a and c). However, in the case of NetMHCIIpan for DR8, DR2b and DR2a, ROC curves representing the different theoretical affinities were close to the diagonal in many cases, indicating that this program could not show significant differences between the affinity of the peptides eluted from these molecules for them and for other allotypes (see Supplementary material, Fig. S2b, d and e).
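For readers who want to reproduce this kind of comparison, the area under a ROC curve can be computed directly from two lists of values without plotting. The sketch below uses the rank‐based (Mann–Whitney) formulation, in which the area equals the probability that a value from the first group exceeds a value from the second, with ties counted as one half. The grouping of scores and the function name are assumptions made for illustration; the original analysis may have used different software.

```python
def roc_auc(own_scores, other_scores):
    """Area under the ROC curve for separating two groups of scores.

    own_scores: scores of the peptides against the molecule they were eluted from
    other_scores: scores of the same peptides against another allotype
    Higher values must mean stronger predicted binding.
    """
    if not own_scores or not other_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for s in own_scores:
        for t in other_scores:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(own_scores) * len(other_scores))


# IC50 values run in the opposite direction (lower means stronger binding),
# so they are negated before being compared.
ic50_own = [10.0, 20.0]
ic50_other = [500.0, 1000.0]
auc_ic50 = roc_auc([-v for v in ic50_own], [-v for v in ic50_other])
```

An area close to 0.5 corresponds to a curve near the diagonal, that is, no discrimination between the two groups, whereas an area close to 1 indicates clear separation.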

Discussion

The identification of binding cores for MHC molecules is of great importance for the prediction and identification of new T‐cell targets in different diseases (e.g. cancer, autoimmune diseases). In the case of MHC class I molecules, it is relatively straightforward and the anchor motifs of many molecules are known. Hence, the identification of putative epitopes, derived from protein candidates, presented to T cells in patients with a specific HLA‐I haplotype is relatively easy. In the case of HLA‐DR, the identification of the anchor motif is much more complex because it is not easy to select the correct peptide register that is interacting with the HLA‐DR molecule. Hence, the binding motifs of many HLA‐DR alleles that are abundant in the population remain unclear, are not well defined or have been deduced from a few peptides in a subjective way.

PRBAM is a new tool developed, using a rational and automatic methodology based on the interactions of peptides with MHC molecules, to decipher the anchor motifs of specific MHC‐I and HLA‐DR molecules from eluted and MS‐sequenced peptides. The program is simple and easy to use. The input is a peptide list in txt or fasta format and the output consists of graphics and tables indicating the variation in the frequency of any amino acid residue in any of the interacting core positions. The abundance of each residue is normalized with the abundance of this specific amino acid in the proteome. The resulting parameter, named DMP, is more consistent than the experimental relative abundance of an amino acid, as it represents the real increase of an amino acid in relation to its expected abundance, which avoids selecting residues as favored when they are simply more abundant in the human proteome. To our knowledge, this is the first program to make this correction, although it has been made manually for some HLA‐I28, 29 and HLA‐DR peptide repertoires.19, 30

PRBAM contains two independent modules: MHC‐I and HLA‐DR. The MHC‐I module analyzes peptides from 8 to 13 residues individually. Longer peptide ligands bulge outside the binding groove, making the correlation of the anchor positions more difficult for peptides with different lengths. Most of the peptides bound to MHC‐I molecules are 9‐mers, although the abundance of 8‐, 10‐, 11‐, 12‐ and 13‐mers can also be relevant. Peptides shorter than 8‐mer or longer than 13‐mer are not considered because their abundance is very low among the HLA‐I ligands.
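One simple way to prepare a peptide list for this kind of length‐restricted analysis is to group the sequences by length and discard those outside the 8 to 13 residue range. This is a sketch under that assumption; the function name and the example sequences are arbitrary and are not part of PRBAM.

```python
from collections import defaultdict

MIN_LEN, MAX_LEN = 8, 13  # length range handled by the MHC-I module, per the text


def group_by_length(peptides):
    """Group peptides by length, keeping only those within MIN_LEN..MAX_LEN."""
    groups = defaultdict(list)
    for pep in peptides:
        pep = pep.strip().upper()
        if MIN_LEN <= len(pep) <= MAX_LEN:
            groups[len(pep)].append(pep)
    return dict(groups)


# Arbitrary example sequences; the 7-mer and the 14-mer are dropped.
peptides = ["SIINFEKL", "GILGFVFTL", "NLVPMVATV", "AAAAAAA", "ELAGIGILTV", "A" * 14]
by_length = group_by_length(peptides)
```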

The automatic analysis of the HLA‐DR repertoires is more complex because the program must choose the binding core from among the different 9‐mers contained in the peptide sequence. To solve this, the algorithm is based on consecutive cycles of analysis, each using the matrix generated in the previous cycle, which allows the motif to be refined in each cycle.

The program selects cores with one of the following amino acids in P1: Tyr, Phe, Trp, Leu, Ile, Val, Met or Ala. These residues have been previously described to interact in the P1 pocket of the studied HLA‐DR molecules.8, 9, 10, 33 The residues forming this pocket are highly conserved. The only polymorphism reported to influence the residues bound in P1 is a dimorphism at position 86 of the DRβ chain. Hence, with some exceptions in a few, very infrequent, alleles, the DRβ86 position contains Gly or Val, which influences the P1 specificity.34, 35, 36 The alleles containing Gly86 have a predominance of Phe1, Trp1 and Tyr1, compared with the alleles containing Val86, which have a preference for smaller hydrophobic residues.37, 38, 39 However, the presence of Gly86 or Val86 does not exclude the binding of peptides with any of the hydrophobic residues at the core P1 position. Hence, peptides with Phe1, Trp1 or Tyr1 can bind to those allotypes containing Val86 and peptides with small hydrophobic residues can bind to the HLA‐DR molecules with Gly86. It is possible that other residues may interact with the HLA‐DR molecule in the core P1 position, but the number of peptides with a non‐hydrophobic residue in P1 must be very low and should not alter the binding motif.
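The P1 restriction described above can be sketched as a sliding 9‐residue window that keeps only windows starting with one of the eight allowed residues. The function name and the example peptide are hypothetical, and this is an illustration of the selection rule rather than PRBAM's actual code.

```python
P1_RESIDUES = set("YFWLIVMA")  # Tyr, Phe, Trp, Leu, Ile, Val, Met, Ala
CORE_LENGTH = 9


def candidate_cores(peptide):
    """Return (offset, core) for every 9-mer window whose P1 residue is allowed."""
    peptide = peptide.strip().upper()
    return [
        (i, peptide[i:i + CORE_LENGTH])
        for i in range(len(peptide) - CORE_LENGTH + 1)
        if peptide[i] in P1_RESIDUES
    ]


# Hypothetical 14-residue peptide: only the windows starting at Tyr and Leu qualify.
cores = candidate_cores("GSDYKLTRESNAGK")
```

Peptides shorter than nine residues simply yield an empty list, and a peptide can yield several candidate cores, which is why a further selection step is needed.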

The first step of the analysis is of great importance. Usually, there are several putative cores for each peptide and only one of them is correct. Therefore, the first frequency distribution is generated from a list of 9‐mers containing many ‘wrong’ cores. To avoid the noise coming from them, at this first step only the sum of the frequencies of positions P4‐P6‐P9 is considered. These positions are the usual anchors for HLA‐DR molecules. Hence, the use of these limited positions should introduce less error in this first step than if all core positions were considered. A wrong selection of the core at this first step would probably amplify errors during the next steps. As other positions can also modulate the binding of the peptide, all peptide core positions are considered during the subsequent refinement cycles. The program was able to identify the anchor motifs of DR1, DR8, DR10, DR2b and DR2a in a similar way to NetMHCIIpan, probably the reference program, which supports the design of the algorithm.
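A minimal sketch of such a first step is given below. It assumes that the frequency tables are built from all candidate cores of all peptides and that each candidate is scored by the sum of the frequencies of its residues at P4, P6 and P9; this interpretation, the function names and the toy cores are assumptions, and the normalization to proteome abundance is omitted for brevity.

```python
from collections import Counter

ANCHOR_INDEXES = (3, 5, 8)  # P4, P6 and P9 as 0-based indexes into a 9-mer core


def anchor_frequency_tables(candidates_per_peptide):
    """Residue frequencies at P4, P6 and P9 over all candidate cores."""
    all_cores = [core for cands in candidates_per_peptide for core in cands]
    tables = {}
    for idx in ANCHOR_INDEXES:
        counts = Counter(core[idx] for core in all_cores)
        total = sum(counts.values())
        tables[idx] = {aa: n / total for aa, n in counts.items()}
    return tables


def pick_initial_cores(candidates_per_peptide):
    """For each peptide, keep the candidate with the highest P4+P6+P9 score."""
    tables = anchor_frequency_tables(candidates_per_peptide)

    def score(core):
        return sum(tables[idx].get(core[idx], 0.0) for idx in ANCHOR_INDEXES)

    return [max(cands, key=score) for cands in candidates_per_peptide if cands]


# Hypothetical candidate cores: the first peptide has two possible registers.
candidates = [["YAAKAQAAL", "AAKAQAALG"], ["FSSKSQSSL"], ["ITTKTQTTL"]]
initial = pick_initial_cores(candidates)
```

In the toy data, Lys at P4, Gln at P6 and Leu at P9 dominate the pooled candidates, so the register that places those residues at the anchor positions is chosen for the first peptide.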

The rationale of the program assumes that the anchor motif can be refined with consecutive cycles of analysis. We have used a maximum of 100 cycles. This seems to be enough because, in the peptide lists used, no change in the residue usage was obtained beyond cycle 7. The limit of 100 cycles exists because a change can alternate between consecutive cycles, making it impossible to arrive at the final assignment of a residue (for example, if in one cycle the program chooses a Tyr, in the next cycle a Phe, then a Tyr again, then Phe, Tyr, etc.).
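The refinement loop with a cycle cap can be sketched as follows. This is an illustration under stated assumptions, not PRBAM's implementation: scoring is a plain sum of per‐position frequencies over all nine core positions, the proteome normalization is left out, and the function names and toy cores are hypothetical.

```python
from collections import Counter

MAX_CYCLES = 100  # upper bound stated in the text
CORE_LENGTH = 9


def build_matrix(cores):
    """Per-position residue frequencies over the currently selected cores."""
    matrix = []
    for pos in range(CORE_LENGTH):
        counts = Counter(core[pos] for core in cores)
        total = sum(counts.values())
        matrix.append({aa: n / total for aa, n in counts.items()})
    return matrix


def score(core, matrix):
    """Sum of the matrix frequencies of the core's residues at all positions."""
    return sum(matrix[pos].get(core[pos], 0.0) for pos in range(CORE_LENGTH))


def refine(candidates_per_peptide, initial_selection):
    """Re-select cores until nothing changes or MAX_CYCLES is reached."""
    selection = list(initial_selection)
    for cycle in range(1, MAX_CYCLES + 1):
        matrix = build_matrix(selection)
        new_selection = [
            max(cands, key=lambda c: score(c, matrix))
            for cands in candidates_per_peptide
        ]
        if new_selection == selection:
            return selection, cycle  # converged
        selection = new_selection
    return selection, MAX_CYCLES  # cap reached, e.g. because of an oscillation


# Hypothetical data: the first peptide starts with a deliberately wrong register.
candidates = [
    ["YAAKAQAAL", "AAKAQAALG"],
    ["FSSKSQSSL"],
    ["ITTKTQTTL"],
    ["LEEKEQEEL"],
    ["VDDKDQDDL"],
]
start = ["AAKAQAALG", "FSSKSQSSL", "ITTKTQTTL", "LEEKEQEEL", "VDDKDQDDL"]
final, cycles = refine(candidates, start)
```

Here the wrong register is corrected in the first cycle and the second cycle confirms that nothing changes any more. The cap only matters when two selections keep alternating, as in the Tyr/Phe example above.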

Unlike other programs, PRBAM uses an algorithm based on the knowledge of the features of the HLA‐DR‐peptide interactions (i.e. the presence of hydrophobic residues at P1, or the assignment of P1‐P4‐P6‐P9 as the putative anchor residues). In addition, the program normalizes the data obtained in each step to the relative abundance of the amino acid residues in the human proteome. These features make PRBAM a powerful tool compared with other bioinformatics tools.

The α chains of HLA‐DP and HLA‐DQ are polymorphic and the residue restriction in P1 is more variable than in HLA‐DR. For example, HLA‐DQ7 (HLA‐DQA1*03:01/DQB1*03:01) and HLA‐DQ8 (HLA‐DQA1*03:01/DQB1*03:02) are able to bind acidic residues at P1.40, 41 Hence, although PRBAM can probably be used for many HLA‐II molecules other than HLA‐DR, we prefer to be conservative and consider its use only for MHC‐I and HLA‐DR.

It is important to emphasize the difficulty of defining the anchor motif for many HLA‐DR molecules, even when using homozygous cells, because in some haplotypes different HLA‐DRβ genes are expressed. Some haplotypes, such as those containing the molecules DR1, DR8 or DR10, only express the gene product encoded in the DRB1 locus. On the other hand, some DRB1 alleles are expressed together with DRB3, DRB4 or DRB5 alleles: DR3, DR5 and DR6 are expressed together with DRB3; DR4, DR7 and DR9 with DRB4; and DR15 and DR16 with DRB5. The presence of the products of two genes implies that the peptides identified from these cells after the purification of the peptide–HLA‐DR (pHLA‐DR) complexes using a non‐allele‐specific monoclonal antibody (such as the most commonly used L243 and B8.11.2) come from two different HLA‐DR molecules. Regardless of the algorithm used by any program, HLA‐DR haplotypes expressing two different molecules require new experimental strategies that allow the purification of each HLA‐DR repertoire separately.

The comparison with NetMHCIIpan showed that both programs are similar in the identification of the anchor motifs. However, the binding cores identified for individual peptides differed in some cases. For DR1 and DR10, the coincidence was > 80%, for DR2b > 75% and for DR2a > 65%, whereas for DR8 it was < 60%. Therefore, although the anchor motifs obtained with NetMHCIIpan and PRBAM can be similar, different binding cores are obtained for some individual peptides. In our opinion, PRBAM identifies some of these cores better than NetMHCIIpan. For example, Asp is favored in P9 for the peptides bound to DR8. Many of the peptides eluted from DR8 for which PRBAM and NetMHCIIpan selected different binding cores present an Asp in the core selected by PRBAM and other amino acids in the core selected by NetMHCIIpan. These data strongly suggest that PRBAM represents an advance in the selection of the binding cores for some HLA‐DR allotypes. The average affinity calculated by NetMHCIIpan for DR1 and DR10 was higher than for DR2b and DR2a (Fig. 5). Finally, the peptides eluted from DR8 presented a lower theoretical affinity for DR8 than the other peptidomes for their corresponding allotypes (Fig. 5). In fact, NetMHCIIpan calculated a higher affinity for DR1 and DR10 for all peptidomes, independently of the HLA‐DR molecule from which they were eluted. In the case of PRBAM, the highest average affinity of each peptidome was always for the HLA‐DR molecule from which the peptides were identified. In our opinion, this bias can affect the binding cores selected by NetMHCIIpan, which is excellent for the alleles with a higher theoretical affinity but is probably less efficient for DR8 and other alleles. In a previous work19 the binding motifs of DR2a and DR2b, both present in the HLA‐DR15 haplotype, were identified. When the theoretical binding affinity of the peptides eluted from DR2a was calculated with NetMHCIIpan, 89% of the peptides presented a higher theoretical affinity for DR2a than for DR2b. However, the same analysis for the DR2b‐eluted peptides showed that only 43% of the peptides had a higher theoretical affinity for DR2b than for DR2a. Hence, NetMHCIIpan overestimates the binding affinity for some alleles and underestimates it for others. In the present manuscript, this can be seen for DR1 and DR10, which present the highest affinity for all the analyzed peptide pools, regardless of the HLA‐DR molecule from which they were eluted (Fig. 5). In addition, many peptides eluted from DR8 had a low theoretical binding affinity when calculated with NetMHCIIpan. The average binding affinity value is higher than 3422 μM, which is normally considered as non‐binding. This can also be seen in the ROC curves: those obtained for DR1 and DR10 discriminate very well from the other peptidomes, but in the case of DR8 all the curves are close to the diagonal, indicating that NetMHCIIpan cannot discriminate between the affinity of the peptides bound to DR8 and their affinity for other HLA‐DR molecules (see Supplementary material, Fig. S2).
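The percentage of coincidence between the cores selected by two programs can be computed as in the following sketch. The function name and the peptide‐to‐core mappings are hypothetical and only illustrate the calculation; they are not results from PRBAM or NetMHCIIpan.

```python
def core_agreement(cores_a, cores_b):
    """Percentage of peptides for which two tools selected the same binding core.

    cores_a, cores_b: dicts mapping peptide sequence -> selected 9-mer core.
    Only peptides present in both dicts are compared.
    """
    shared = cores_a.keys() & cores_b.keys()
    if not shared:
        raise ValueError("no peptides in common")
    same = sum(1 for pep in shared if cores_a[pep] == cores_b[pep])
    return 100.0 * same / len(shared)


# Hypothetical selections from two tools for four peptides.
tool_a = {
    "GSDYKLTRESNAGK": "YKLTRESNA",
    "YAAKAQAALG": "YAAKAQAAL",
    "FSSKSQSSLG": "FSSKSQSSL",
    "ITTKTQTTLG": "ITTKTQTTL",
}
tool_b = dict(tool_a)
tool_b["GSDYKLTRESNAGK"] = "LTRESNAGK"  # the two tools disagree on this register
agreement = core_agreement(tool_a, tool_b)
```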

In conclusion, PRBAM is an easy‐to‐use online program that covers an essential requirement for the analysis of the large number of MHC peptide ligands identified by MS: obtaining the binding motif of specific MHC molecules. Unlike other statistical approaches, for HLA‐DR the program uses a rationale based on the features of the peptide–MHC interaction. Finally, the use of this software may provide a systematic interpretation of peptide lists obtained by MS and can be complementary to other programs based on experimental binding affinities, such as NetMHCIIpan.

Author contributions

IA conceived and designed the project. AMF, ES and IA performed the data analysis. JHA programmed the software.

Disclosures

The authors declare that they have no conflicts of interest.

Supporting information

Figure S1. Comparison of the peptide binding motifs obtained with PRBAM and NetMHCIIpan.

Figure S2. Receiver operating characteristics curves of the affinity data obtained with the peptidomes eluted from DR1, DR8, DR10, DR2b and DR2a.

Table S1. Peptidomes eluted from DR1, DR8, DR10, DR2b and DR2a.

Acknowledgements

This work was supported by Grant SAF2015‐66399‐R from the Spanish Ministry of Economy and Competitiveness.

References

  • 1. Trowsdale J, Knight JC. Major histocompatibility complex genomics and human disease. Annu Rev Genomics Hum Genet 2013; 14:301–23.
  • 2. Guo HC, Jardetzky TS, Garrett TP, Lane WS, Strominger JL, Wiley DC. Different length peptides bind to HLA‐Aw68 similarly at their ends but bulge out in the middle. Nature 1992; 360:364–6.
  • 3. Speir JA, Stevens J, Joly E, Butcher GW, Wilson IA. Two different, highly exposed, bulged structures for an unusually long peptide bound to rat MHC class I RT1‐Aa. Immunity 2001; 14:81–92.
  • 4. Tynan FE, Borg NA, Miles JJ, Beddoe T, El‐Hassen D, Silins SL et al. High resolution structures of highly bulged viral epitopes bound to major histocompatibility complex class I. Implications for T‐cell receptor engagement and T‐cell immunodominance. J Biol Chem 2005; 280:23900–9.
  • 5. Mohan JF, Unanue ER. Unconventional recognition of peptides by T cells and the implications for autoimmunity. Nat Rev Immunol 2012; 12:721–8.
  • 6. Hennecke J, Wiley DC. Structure of a complex of the human α/β T cell receptor (TCR) HA1.7, influenza hemagglutinin peptide, and major histocompatibility complex class II molecule, HLA‐DR4 (DRA*0101 and DRB1*0401): insight into TCR cross‐restriction and alloreactivity. J Exp Med 2002; 195:571–81.
  • 7. Stern LJ, Brown JH, Jardetzky TS, Gorga JC, Urban RG, Strominger JL et al. Crystal structure of the human class II MHC protein HLA‐DR1 complexed with an influenza virus peptide. Nature 1994; 368:215–21.
  • 8. Southwood S, Sidney J, Kondo A, del Guercio MF, Appella E, Hoffman S et al. Several common HLA‐DR types share largely overlapping peptide binding repertoires. J Immunol 1998; 160:3363–73.
  • 9. Geluk A, van Meijgaarden KE, Southwood S, Oseroff C, Drijfhout JW, de Vries RR et al. HLA‐DR3 molecules can bind peptides carrying two alternative specific submotifs. J Immunol 1994; 152:5742–8.
  • 10. Sette A, Sidney J, Oseroff C, del Guercio MF, Southwood S, Arrhenius T et al. HLA DR4w4‐binding motifs illustrate the biochemical basis of degeneracy and specificity in peptide‐DR interactions. J Immunol 1993; 151:3163–70.
  • 11. James EA, Moustakas AK, Berger D, Huston L, Papadopoulos GK, Kwok WW. Definition of the peptide binding motif within DRB1*1401 restricted epitopes by peptide competition and structural modeling. Mol Immunol 2008; 45:2651–9.
  • 12. Hammer J, Takacs B, Sinigaglia F. Identification of a motif for HLA‐DR1 binding peptides using M13 display libraries. J Exp Med 1992; 176:1007–13.
  • 13. Davenport MP, Smith KJ, Barouch D, Reid SW, Bodnar WM, Willis AC et al. HLA class I binding motifs derived from random peptide libraries differ at the COOH terminus from those of eluted peptides. J Exp Med 1997; 185:367–71.
  • 14. Rojo S, Garcia F, Villadangos JA, Lopez de Castro JA. Changes in the repertoire of peptides bound to HLA‐B27 subtypes and to site‐specific mutants inside and outside pocket B. J Exp Med 1993; 177:613–20.
  • 15. Alvarez I, Marti M, Vazquez J, Camafeita E, Ogueta S, Lopez de Castro JA. The Cys‐67 residue of HLA‐B27 influences cell surface stability, peptide specificity, and T‐cell antigen presentation. J Biol Chem 2001; 276:48740–7.
  • 16. Marti M, Alvarez I, Montserrat V, Lopez de Castro JA. Large sharing of T‐cell epitopes and natural ligands between HLA‐B27 subtypes (B*2702 and B*2705) associated with spondyloarthritis. Tissue Antigens 2001; 58:351–62.
  • 17. Ramos M, Alvarez I, Garcia‐del‐Portillo F, Lopez de Castro JA. Minimal alterations in the HLA‐B27‐bound peptide repertoire induced upon infection of lymphoid cells with Salmonella typhimurium. Arthritis Rheum 2001; 44:1677–88.
  • 18. Mohme M, Hotz C, Stevanovic S, Binder T, Lee JH, Okoniewski M et al. HLA‐DR15‐derived self‐peptides are involved in increased autologous T cell proliferation in multiple sclerosis. Brain 2013; 136:1783–98.
  • 19. Scholz EM, Marcilla M, Daura X, Arribas‐Layton D, James EA, Alvarez I. Human leukocyte antigen (HLA)‐DRB1*15:01 and HLA‐DRB5*01:01 present complementary peptide repertoires. Front Immunol 2017; 8:984.
  • 20. Singh H, Raghava GP. ProPred: prediction of HLA‐DR binding sites. Bioinformatics 2001; 17:1236–7.
  • 21. Reche PA, Glutting JP, Reinherz EL. Prediction of MHC class I binding peptides using profile motifs. Hum Immunol 2002; 63:701–9.
  • 22. Reche PA, Glutting JP, Zhang H, Reinherz EL. Enhancement to the RANKPEP resource for the prediction of peptide binding to MHC molecules using profiles. Immunogenetics 2004; 56:405–19.
  • 23. Wan J, Liu W, Xu Q, Ren Y, Flower DR, Li T. SVRMHC prediction server for MHC‐binding peptides. BMC Bioinformatics 2006; 7:463.
  • 24. Sturniolo T, Bono E, Ding J, Raddrizzani L, Tuereci O, Sahin U et al. Generation of tissue‐specific and promiscuous HLA ligand databases using DNA microarrays and virtual HLA class II matrices. Nat Biotechnol 1999; 17:555–61.
  • 25. Zhang L, Chen Y, Wong HS, Zhou S, Mamitsuka H, Zhu S. TEPITOPEpan: extending TEPITOPE for peptide binding prediction covering over 700 HLA‐DR molecules. PLoS ONE 2012; 7:e30483.
  • 26. Jensen KK, Andreatta M, Marcatili P, Buus S, Greenbaum JA, Yan Z et al. Improved methods for predicting peptide binding affinity to MHC class II molecules. Immunology 2018; 154:394–406.
  • 27. Vita R, Overton JA, Greenbaum JA, Ponomarenko J, Clark JD, Cantrell JR et al. The immune epitope database (IEDB) 3.0. Nucleic Acids Res 2015; 43:D405–12.
  • 28. Marcilla M, Alpizar A, Lombardia M, Ramos‐Fernandez A, Ramos M, Albar JP. Increased diversity of the HLA‐B40 ligandome by the presentation of peptides phosphorylated at their main anchor residue. Mol Cell Proteomics 2014; 13:462–74.
  • 29. Lopez de Castro JA, Alvarez I, Marcilla M, Paradela A, Ramos M, Sesma L et al. HLA‐B27: a registry of constitutive peptide ligands. Tissue Antigens 2004; 63:424–45.
  • 30. Alvarez I, Collado J, Daura X, Colome N, Rodriguez‐Garcia M, Gallart T et al. The rheumatoid arthritis‐associated allele HLA‐DR10 (DRB1*1001) shares part of its repertoire with HLA‐DR1 (DRB1*0101) and HLA‐DR4 (DRB*0401). Arthritis Rheum 2008; 58:1630–9.
  • 31. Scholz E, Mestre‐Ferrer A, Daura X, Garcia‐Medel N, Carrascal M, James EA et al. A comparative analysis of the peptide repertoires of HLA‐DR molecules differentially associated with rheumatoid arthritis. Arthritis Rheumatol 2016; 68:2412–21.
  • 32. Muixi L, Gay M, Munoz‐Torres PM, Guitart C, Cedano J, Abian J et al. The peptide‐binding motif of HLA‐DR8 shares important structural features with other type 1 diabetes‐associated alleles. Genes Immun 2011; 12:504–12.
  • 33. Vogt AB, Kropshofer H, Kalbacher H, Kalbus M, Rammensee HG, Coligan JE et al. Ligand motifs of HLA‐DRB5*0101 and DRB1*1501 molecules delineated from self‐peptides. J Immunol 1994; 153:1665–73.
  • 34. Busch R, Hill CM, Hayball JD, Lamb JR, Rothbard JB. Effect of natural polymorphism at residue 86 of the HLA‐DR β chain on peptide binding. J Immunol 1991; 147:1292–8.
  • 35. Demotz S, Barbey C, Corradin G, Amoroso A, Lanzavecchia A. The set of naturally processed peptides displayed by DR molecules is tuned by polymorphism of residue 86. Eur J Immunol 1993; 23:425–32.
  • 36. Newton‐Nash DK, Eckels DD. Differential effect of polymorphism at HLA‐DR1 β‐chain positions 85 and 86 on binding and recognition of DR1‐restricted antigenic peptides. J Immunol 1993; 150:1813–21.
  • 37. Brown JH, Jardetzky TS, Gorga JC, Stern LJ, Urban RG, Strominger JL et al. Three‐dimensional structure of the human class II histocompatibility antigen HLA‐DR1. Nature 1993; 364:33–9.
  • 38. Dessen A, Lawrence CM, Cupo S, Zaller DM, Wiley DC. X‐ray crystal structure of HLA‐DR4 (DRA*0101, DRB1*0401) complexed with a peptide from human collagen II. Immunity 1997; 7:473–81.
  • 39. Jardetzky TS, Brown JH, Gorga JC, Stern LJ, Urban RG, Strominger JL et al. Crystallographic analysis of endogenous peptides associated with HLA‐DR1 suggests a common, polyproline II‐like conformation for bound peptides. Proc Natl Acad Sci U S A 1996; 93:734–8.
  • 40. Godkin A, Friede T, Davenport M, Stevanovic S, Willis A, Jewell D et al. Use of eluted peptide sequence data to identify the binding characteristics of peptides to the insulin‐dependent diabetes susceptibility allele HLA‐DQ8 (DQ 3.2). Int Immunol 1997; 9:905–11.
  • 41. Sidney J, Oseroff C, del Guercio MF, Southwood S, Krieger JI, Ishioka GY et al. Definition of a DQ3.1‐specific binding motif. J Immunol 1994; 152:4516–25.


Articles from Immunology are provided here courtesy of British Society for Immunology
