Summary
Several mechanisms exist to avoid or suppress inflammatory T‐cell immune responses that could harm the host by targeting self‐antigens or commensal microbes. We hypothesized that these mechanisms could become evident when comparing the immunogenicity of a peptide from a pathogen or allergen with the conservation of its sequence in the human proteome or the healthy human microbiome. Indeed, performing such comparisons on large sets of validated T‐cell epitopes, we found that epitopes whose similarity to self‐antigens exceeded a certain threshold showed lower immunogenicity, presumably as a result of negative selection of T cells capable of recognizing such peptides. Moreover, we found a reduced level of immune recognition for epitopes conserved in the commensal microbiome, presumably as a result of peripheral tolerance. These findings indicate that the existence (and potentially the polarization) of T‐cell responses to a given epitope is influenced by, and to some extent predictable from, its similarity to self‐antigens and commensal antigens.
Keywords: bioinformatics, epitopes, T‐cell recognition
Abbreviations
- HMP: Human Microbiome Project
- IFN‐γ: interferon‐γ
- PBMC: peripheral blood mononuclear cell
- SFC: spot‐forming cell
Introduction
Several methods are available that can accurately predict the binding of peptides to MHC class I and class II molecules.1, 2, 3, 4, 5 Binding to an MHC molecule is an essential, though not sufficient, criterion for a peptide to be recognized by T cells as an immune target. Other factors play a role, such as the ability of an MHC class I binding peptide to be processed from its source protein,6, 7, 8, 9, 10, 11 and the amino acid composition of the peptide, which has been linked to immunogenicity, presumably because some residues are more visible to the T‐cell receptor.12, 13 However, significant factors influencing T‐cell immunogenicity beyond those mentioned above remain unknown. This is particularly evident for MHC class II restricted epitopes, where binding predictions correlate well with measured binding affinities,5 but the prediction of immunogenic peptides is far from perfect.14 We have recently demonstrated that selecting the top 20% of candidate peptides using a combination of HLA class II binding predictions captures 50% of the immune response.15 Although this is very useful in practice, it also demonstrates that mechanisms beyond MHC binding affinity shape immune recognition patterns.14
One effect that is expected to influence the immunogenicity of a given peptide is the suppression of immune responses that could be harmful to the host. T cells reacting to peptides conserved in the human proteome are expected to be deleted by negative selection during T‐cell maturation. In addition, it has been postulated that inflammatory T cells reactive to peptides found in commensal microorganisms will be suppressed by regulatory T cells – a mechanism called peripheral tolerance. However, the extent to which these mechanisms imprint on the T‐cell immune repertoire and thereby impact the immune recognition of peptides from pathogens or allergens has not been systematically analysed or quantified.
In this study, we examined whether there is evidence for tolerization in immune recognition patterns by correlating the immune responses to peptides from bacterial pathogens and allergens with the sequence conservation of these peptides in the human proteome and in proteins identified by the Human Microbiome Project (HMP). We find evidence for both.
Materials and methods
Peptide immunogenicity data set assembly
Peptides from three independent studies were used. The first data set consisted of 15‐mer peptides from Mycobacterium tuberculosis antigens tested for recognition in interferon‐γ (IFN‐γ) ELISPOT assays using peripheral blood mononuclear cells (PBMCs) from individuals latently infected with M. tuberculosis.16, 17 The second data set consisted of overlapping 16‐mer peptides spanning antigens included in the Bordetella pertussis vaccine, tested for recognition in IFN‐γ ELISPOT assays using PBMCs from previously vaccinated individuals (Bancroft et al., manuscript under review). The third data set consisted of 15‐mer peptides derived from cockroach antigens, tested for recognition in interleukin‐5 (IL‐5) ELISPOT assays using PBMCs from allergic individuals.18, 19 All assays were performed in triplicate and included media‐stimulated cells as a background control. Individual experiments were considered positive if the number of spot‐forming cells (SFC) above background was > 20 per million input cells, and the number of cytokine‐producing cells after peptide stimulation was significantly above background (Student's t‐test P < 0·05) with a stimulation index > 2·0. This combination of positivity criteria is commonly used in our laboratory, including in the studies cited above. To classify a peptide as positive across the donor cohort, the SFC from individual experiments that met the positivity criteria were summed, and both the total SFC and the number of positive experiments were used as positivity cut‐offs, as specified in the Results section.
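The per‐experiment positivity rule can be sketched as follows. This is a minimal illustration, not the laboratory's analysis code: it assumes replicate counts already expressed as SFC per million input cells, a pooled‐variance (Student's) two‐sample t‐test judged against tabulated two‐sided 5% critical values (only 2 to 10 degrees of freedom are tabulated; triplicate versus triplicate gives 4), and hypothetical function names.

```python
import math
from statistics import mean, variance

# Two-sided 5% critical values of Student's t distribution, keyed by degrees of freedom.
T_CRIT_05 = {2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
             7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def student_t(x, y):
    """Pooled-variance two-sample t statistic (x minus y) and its degrees of freedom."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    se = math.sqrt(pooled * (1 / nx + 1 / ny))
    diff = mean(x) - mean(y)
    if se == 0:
        # Identical replicates: treat any positive difference as significant.
        return (math.inf if diff > 0 else 0.0), df
    return diff / se, df


def experiment_positive(peptide_sfc, media_sfc, min_net=20.0, min_si=2.0):
    """Apply the three per-experiment criteria to replicate SFC per million input cells."""
    background = mean(media_sfc)
    net = mean(peptide_sfc) - background
    si = mean(peptide_sfc) / background if background > 0 else math.inf
    t, df = student_t(peptide_sfc, media_sfc)
    # P < 0.05 (two-sided), with the peptide mean above background.
    significant = t > T_CRIT_05[df]
    return net > min_net and si > min_si and significant
```

For example, `experiment_positive([60, 70, 65], [10, 12, 8])` passes all three criteria, whereas a highly variable triplicate such as `[10, 60, 110]` fails the t‐test despite a large mean difference.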
Human proteome and microbiome sequence data set assembly
Protein sequences from the human proteome were downloaded from UniProt (www.uniprot.org)20 using the query: keyword: ‘Complete proteome’ AND organism: ‘Homo sapiens (Human) [9606]’. The sequences were downloaded using the ‘Download’ option, choosing ‘Download all’ and Format: fasta (canonical & isoform). Protein sequences from the human gut microbiome were retrieved from the HMP.21, 22 The data were downloaded from the HMP Data Analysis and Coordination Center (www.hmpdacc.org/HMRGD), where annotated reference genomes could be found. The complete set of annotated reference genomes was downloaded as protein sequences in fasta format, by choosing ‘Download all’ in protein multifasta (PEP) format with the body site specified as ‘gastrointestinal tract’.
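Both downloads are FASTA files. A minimal, dependency‐free reader such as the hypothetical `read_fasta` below (a sketch, not part of the original pipeline) yields (header, sequence) pairs that can be passed to a similarity search; for large files, an established parser such as Biopython's `SeqIO` may be preferable.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []  # header text without the leading '>'
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)
```

Passing an open file handle, e.g. `read_fasta(open(path))`, streams records without loading the whole file into memory.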
Quantifying peptide similarity to protein sequences
For a given peptide with sequence a of length N, we define its similarity score to a set of target proteins as the highest match score between a and any amino acid stretch b of the same length N in those proteins, where the match score is given by the formula

$$s(a, b) = \frac{bl(a, b)}{\sqrt{bl(a, a)\, bl(b, b)}}$$

in which bl(a, b) is the sum of the BLOSUM62 matrix23 values for substituting the residues of peptide a with the corresponding residues of stretch b.
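A brute‐force sketch of this score follows, assuming the normalized form s(a, b) = bl(a, b)/√(bl(a, a)·bl(b, b)), which is consistent with identical matches scoring 1·0 regardless of length and composition, and scanning every equal‐length window of every target protein. The function names are hypothetical, and to keep the example short only a six‐residue excerpt of BLOSUM62 is embedded; a full matrix (e.g. loaded with Biopython's `substitution_matrices.load("BLOSUM62")`) would be needed for real sequences.

```python
import math

# Six-residue excerpt of BLOSUM62 (upper triangle); illustration only.
_EXCERPT = {
    ("A", "A"): 4, ("A", "I"): -1, ("A", "L"): -1, ("A", "V"): 0, ("A", "K"): -1, ("A", "R"): -1,
    ("I", "I"): 4, ("I", "L"): 2, ("I", "V"): 3, ("I", "K"): -3, ("I", "R"): -3,
    ("L", "L"): 4, ("L", "V"): 1, ("L", "K"): -2, ("L", "R"): -2,
    ("V", "V"): 4, ("V", "K"): -2, ("V", "R"): -3,
    ("K", "K"): 5, ("K", "R"): 2,
    ("R", "R"): 5,
}
# Make the lookup symmetric: B(x, y) == B(y, x).
BLOSUM62_EXCERPT = {**_EXCERPT, **{(y, x): s for (x, y), s in _EXCERPT.items()}}


def bl(a, b, matrix):
    """Sum of substitution scores aligning residue i of a with residue i of b."""
    return sum(matrix[(x, y)] for x, y in zip(a, b))


def match_score(a, b, matrix):
    """Normalized score bl(a,b) / sqrt(bl(a,a) * bl(b,b)); equals 1.0 when a == b."""
    return bl(a, b, matrix) / math.sqrt(bl(a, a, matrix) * bl(b, b, matrix))


def similarity(peptide, proteins, matrix):
    """Highest match score of peptide against any equal-length stretch in proteins."""
    n = len(peptide)
    best = -math.inf
    for protein in proteins:
        for start in range(len(protein) - n + 1):
            best = max(best, match_score(peptide, protein[start:start + n], matrix))
    return best
```

For instance, `similarity("AILVKR", ["KKAIIVKRKK"], BLOSUM62_EXCERPT)` finds the window `AIIVKR`, one conservative L→I exchange away, and returns 24/26 ≈ 0·923. Scanning a whole proteome this way is slow, so an indexed or vectorized search would likely be needed in practice.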
Results
Assembly of a peptide immunogenicity data set
Immunogenicity can be assessed by different assays, and differs substantially between antigenic systems because of differences in the route of exposure to the antigens. We therefore only compared the immunogenicity of peptides from the same antigenic source tested in the same assay system. We used data sets from three studies representing different modes of exposure: infection with M. tuberculosis, vaccination with B. pertussis and inhalation of cockroach allergens. In each of these data sets, a large number of peptides (> 500) had been tested in a consistent fashion by ELISPOT assays in a large number of donors (> 30). For the cockroach data set, peptides were considered positive if they elicited significant responses in at least two donors (t‐test P < 0·05, stimulation index > 2, SFC > 20) and if the total number of SFC per million summed over all donors in the cohort was > 100. For the M. tuberculosis and B. pertussis data sets, reactivities were generally higher, and the cut‐off for positivity was set to three reacting donors and a total SFC > 200. Peptides were considered negative if they did not elicit a significant response in any single donor. Peptides with intermediate reactivities were discarded. These selection criteria take into account inherent differences in assays and immunization procedures and ensure that the positive set captures 70% or more of the total reactivity. Table 1 lists the number of positive, negative and intermediate peptides as well as the number of donors for each data set. Table S1 (see Supplementary material) lists the peptide sequences and their immunogenicity classification.
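The cohort‐level rules can be summarized as a small decision function. This sketch assumes that each donor contributes one experiment, that "three reacting donors" means at least three, and that the summed SFC includes only experiments meeting the individual positivity criteria (as described in Materials and methods); the function and dictionary names are hypothetical.

```python
# Cohort-level cut-offs from the text: (minimum positive donors, minimum summed SFC).
CUTOFFS = {
    "cockroach": (2, 100.0),
    "M. tuberculosis": (3, 200.0),
    "B. pertussis": (3, 200.0),
}


def classify_peptide(data_set, positive_sfc):
    """Classify a peptide from the SFC per million of each donor experiment that met
    the individual positivity criteria (one value per positive donor)."""
    min_donors, min_total = CUTOFFS[data_set]
    if not positive_sfc:
        return "negative"  # no significant response in any donor
    if len(positive_sfc) >= min_donors and sum(positive_sfc) > min_total:
        return "positive"
    return "intermediate"  # discarded from the analysis
```

Under these assumptions, a cockroach peptide recognized by two donors with 60 and 50 SFC per million is positive, whereas the same two responses to a B. pertussis peptide would be intermediate because three reacting donors are required.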
Table 1.
Peptide immunogenicity data sets
Data set | No. of positive peptides | No. of negative peptides | No. of intermediate peptides | No. of donors tested |
---|---|---|---|---|
Mycobacterium tuberculosis | 79 | 523 | 148 | 61 |
Bordetella pertussis | 142 | 300 | 206 | 31 |
Cockroach | 59 | 437 | 170 | 90 |
Quantifying peptide similarity
To compare the bacterial and allergen‐derived peptides tested for immunogenicity with the human proteome and human microbiome, we needed a quantitative similarity score. A simple approach is to count the number of differing amino acids between two sequences, but this neglects that some amino acid exchanges alter the properties of a peptide significantly more than others. To account for this, we used the BLOSUM‐based similarity score described in the Materials and methods section, which quantifies amino acid similarity based on large‐scale protein alignments. This score gives an identical match a similarity score of 1·0, regardless of the amino acid composition and length of the peptides; the larger the differences between two sequences according to the BLOSUM matrix, the lower the score. Figure 1 illustrates the ranges of similarity scores for peptides with varying numbers of amino acid exchanges. Whereas a single amino acid exchange results in scores ranging from 0·901 to 0·987 (5th to 95th centile range, as displayed in Fig. 1), the score range for multiple exchanges is much broader (from 0·536 to 0·771). These score ranges for a given number of substitutions are provided as a reference for interpreting the BLOSUM scores.
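To make the single‐exchange range concrete, assume the normalized form s(a, b) = bl(a, b)/√(bl(a, a)·bl(b, b)), write S = bl(a, a) for the self‐score of peptide a and B(x, y) for BLOSUM62 entries. Replacing residue x by y at a single position changes only one term in each sum, giving the expression below. The value S = 75 (roughly a 15‐mer with an average diagonal entry of 5) is a hypothetical illustration, not a value from the data.

$$
s(a,b) = \frac{S - B(x,x) + B(x,y)}{\sqrt{S\,\bigl(S - B(x,x) + B(y,y)\bigr)}}
$$

$$
\text{L}\to\text{I}:\ \frac{75 - 4 + 2}{\sqrt{75 \cdot 75}} = \frac{73}{75} \approx 0.973,
\qquad
\text{R}\to\text{A}:\ \frac{75 - 5 - 1}{\sqrt{75 \cdot (75 - 5 + 4)}} = \frac{69}{\sqrt{5550}} \approx 0.926
$$

Both values fall within the single‐exchange range shown in Fig. 1, and the conservative L→I exchange costs less score than the R→A exchange.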
Figure 1.
The relationship between number of amino acid exchanges and the BLOSUM score. Twenty thousand peptides were randomly selected from UniProt proteins and compared with the human gut microbiome in terms of BLOSUM score. The figure shows the average and 5th centile to 95th centile range of the generated scores as a function of the number of amino acid substitutions between the peptides.
Correlating peptide immunogenicity with similarity to the human proteome
To determine whether negative selection of self‐reactive T cells reduces the immunogenicity of peptides that are similar to the human proteome, we calculated the similarity of immunogenic peptides (epitopes) and negative peptides (non‐epitopes) from our three data sets to the human proteome. Figure 2 shows the cumulative distributions of similarity scores for the three data sets. For all three data sets, epitopes had a slightly but significantly lower median similarity to self‐peptides than non‐epitopes, as shown in Table 2 (B. pertussis: 0·485 versus 0·493, P = 0·049; M. tuberculosis: 0·507 versus 0·515, P = 0·037; cockroach: 0·513 versus 0·534, P = 0·008). This indicates that immunogenic MHC class II restricted peptides tend to be less similar to self‐peptides.
Figure 2.
Similarity of epitopes and non‐epitopes to the human proteome. Each panel shows the cumulative distribution of similarity scores for epitopes (blue line) and non‐epitopes (red line). The different panels depict peptides from (a) Bordetella pertussis, (b) Mycobacterium tuberculosis and (c) cockroach.
Table 2.
Median similarity scores of epitopes and non‐epitopes to the human proteome
Data set | Median similarity score, positive peptides (SD) | Median similarity score, negative peptides (SD) |
---|---|---|
Mycobacterium tuberculosis | 0·507 (0·048) | 0·515 (0·048) |
Bordetella pertussis | 0·485 (0·041) | 0·493 (0·047) |
Cockroach | 0·513 (0·094) | 0·534 (0·107) |
Correlating peptide immunogenicity with similarity to the human gut microbiome
Next, we assessed in an analogous fashion whether there was a detectable reduction in immune reactivity for peptides that had similar matches in the human gut microbiome. Figure 3 shows the cumulative distributions of similarity scores for epitopes and non‐epitopes from the three data sets. For both the B. pertussis and cockroach data sets, epitopes had a significantly lower median similarity to the gut microbiome than non‐epitopes, as assessed by a one‐tailed Mann–Whitney U‐test (B. pertussis: 0·558 versus 0·571, P = 0·012; cockroach: 0·580 versus 0·599, P = 0·0002). The M. tuberculosis data set showed the same trend but did not reach significance at P < 0·05 (0·584 versus 0·590; P = 0·11). All median similarity scores are listed in Table 3. Hence, MHC class II restricted epitopes also tend to be less similar to peptides found in the healthy gut microbiome.
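The one‐tailed comparison can be sketched with a pairwise Mann–Whitney U statistic and its large‐sample normal approximation. This is an illustrative standard‐library implementation (no tie or continuity correction) with a hypothetical function name, not the software used for the reported P‐values; `scipy.stats.mannwhitneyu(x, y, alternative="less")` would be the more complete choice.

```python
from statistics import NormalDist


def mann_whitney_less(x, y):
    """One-tailed Mann-Whitney U test of H1: values in x tend to be smaller than in y.

    Returns (u, p): u counts pairs with x_i > y_j (ties count 0.5), and p is the
    lower-tail normal approximation without tie or continuity correction.
    """
    nx, ny = len(x), len(y)
    u = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)
    mean_u = nx * ny / 2
    sd_u = (nx * ny * (nx + ny + 1) / 12) ** 0.5
    z = (u - mean_u) / sd_u
    # A small u (few x values exceeding y values) yields a small lower-tail probability.
    return u, NormalDist().cdf(z)
```

Applied here, `x` would hold the similarity scores of epitopes and `y` those of non‐epitopes. The pairwise loop is quadratic, which is acceptable for a few hundred peptides per group; for very small samples an exact test is more accurate than the normal approximation.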
Figure 3.
Similarity of epitopes and non‐epitopes to proteins encoded by microbes found in the healthy human gut. Each panel shows the cumulative distribution of similarity scores for epitopes (blue line) and non‐epitopes (red line). The different panels depict peptides from (a) Bordetella pertussis, (b) Mycobacterium tuberculosis and (c) cockroach.
Table 3.
Median similarity scores of epitopes and non‐epitopes to the human gut microbiome
Data set | Median similarity score, positive peptides (SD) | Median similarity score, negative peptides (SD) |
---|---|---|
Mycobacterium tuberculosis | 0·584 (0·044) | 0·590 (0·059) |
Bordetella pertussis | 0·558 (0·052) | 0·571 (0·053) |
Cockroach | 0·580 (0·046) | 0·599 (0·073) |
Combining human and microbiome conservation scores
To determine whether the similarity scores of a peptide to the human proteome and to the human gut microbiome can be combined to better predict immunogenicity, we performed a linear regression on the two scores, calculating

$$\text{total score} = \text{offset} - \beta_{\text{microbiome}} \times \text{score}_{\text{microbiome}} - \beta_{\text{human}} \times \text{score}_{\text{human}}$$

where the three model parameters are (i) a constant offset, and (ii) β microbiome and (iii) β human, the weights of the gut microbiome and human proteome similarity scores, respectively. The model parameters were fitted by calculating the total score for a set of peptides and minimizing the squared difference to their immunogenicity, with immunogenic peptides set to 1·0 and non‐immunogenic peptides set to 0·0. Fitted model parameters determined in 20‐fold cross‐validation are listed in Table 4. The microbiome score was assigned nearly double the weight of the human proteome score (0·66 versus 0·39), suggesting that it has higher predictive power in this model. The average cross‐validated distance for the combined model is 0·1458. This distance is significantly lower than that of a model including only the human proteome score [distance = 0·1471; P = 0·025 (one‐sided, paired t‐test)], and shows the same trend, without reaching statistical significance, when compared with a model including only the microbiome score (distance = 0·1459, P = 0·13). These data suggest that both scores provide independent information on the immunogenicity of a peptide, and that the microbiome score has higher predictive value in this simple model.
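A dependency‐free sketch of the fit and its cross‐validation follows. It assumes ordinary least squares solved through the normal equations, a random assignment of peptides to 20 folds, and that the reported "distance" is the mean squared difference between predicted and observed immunogenicity; these choices and all function names are assumptions for illustration, not the authors' code.

```python
import random


def _solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit(rows, targets):
    """Least-squares (offset, beta_microbiome, beta_human) for
    total = offset - beta_microbiome * microbiome - beta_human * human."""
    # Columns are negated so the fitted betas follow the sign convention of the formula.
    design = [(1.0, -mic, -hum) for mic, hum in rows]
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(3)]
    return _solve(xtx, xty)


def predict(params, mic, hum):
    offset, b_mic, b_hum = params
    return offset - b_mic * mic - b_hum * hum


def cv_distance(rows, targets, folds=20, seed=0):
    """Mean squared error of out-of-fold predictions under k-fold cross-validation."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    errors = []
    for k in range(folds):
        held_out = set(order[k::folds])
        train = [i for i in order if i not in held_out]
        params = fit([rows[i] for i in train], [targets[i] for i in train])
        errors += [(predict(params, *rows[i]) - targets[i]) ** 2 for i in held_out]
    return sum(errors) / len(errors)
```

Here `rows` would hold one (microbiome score, human proteome score) pair per peptide and `targets` the 1·0/0·0 immunogenicity labels; single‐score models follow by dropping the corresponding column.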
Table 4.
Linear regression combining microbiome and human proteome scores to predict immunogenicity
Model | Offset (SD) | β microbiome (SD) | β human (SD) | Distance |
---|---|---|---|---|
Both scores | 0·78 (0·02) | 0·66 (0·04) | 0·39 (0·03) | 0·1458 |
Microbiome only | 0·69 (0·02) | 0·85 (0·04) | N/A | 0·1459 |
Human only | 0·50 (0·02) | N/A | 0·6 (0·03) | 0·1471 |
Discussion
Previous studies of MHC class I restricted epitopes have shown evidence for negative selection against peptides that are similar to the human proteome.24 In this study, we have extended those findings to MHC class II restricted epitopes and, in addition, demonstrated for the first time a correlation between a peptide's T‐cell immune reactivity and its conservation in the microbiome. This suggests that tolerance leaves an imprint on the availability of T cells recognizing certain peptide targets, which shapes the development of the immune response upon subsequent exposures.
Our study provides proof of principle that similarity to self and to the microbiome could be included as selection factors in prediction pipelines for MHC class II restricted epitopes and for adoptive T‐cell immunotherapy.25 Given the discrepancy between our ability to predict MHC class II binding and MHC class II T‐cell immunogenicity,14 any such additional factors are highly desirable. However, at the current stage, the detected differences in similarity to either the human proteome or the microbiome are very small. As a result, incorporating the current similarity scores into epitope prediction pipelines would be expected to give only marginal improvements. Further refinements will be necessary, such as better understanding which species in the microbiome have a selective impact on the epitope repertoire, or developing better similarity matrices that quantify which amino acid substitutions are conservative in the context of T‐cell immune recognition. Moreover, the present study was limited to three data sets representing different types of antigen exposure (infection, vaccination and allergen exposure). Future studies will explore in much more detail whether and how these observations hold in other antigenic systems, and how the scores can best be combined with other factors to derive overall immunogenicity predictions.
Disclosures
The authors declare that they have no conflict of interest.
Supporting information
Table S1. Peptide sequences and their immunogenicity classification
Acknowledgements
This work has been funded by National Institutes of Health contract HHSN272201200010C.
References
- 1. Karosiene E, Lundegaard C, Lund O, Nielsen M. NetMHCcons: a consensus method for the major histocompatibility complex class I predictions. Immunogenetics 2012; 64:177–86.
- 2. Kim Y, Ponomarenko J, Zhu Z, Tamang D, Wang P, Greenbaum J et al Immune epitope database analysis resource. Nucleic Acids Res 2012; 40:W525–30.
- 3. Nielsen M, Lundegaard C, Blicher T, Peters B, Sette A, Justesen S et al Quantitative predictions of peptide binding to any HLA‐DR molecule of known sequence: NetMHCIIpan. PLoS Comput Biol 2008; 4:e1000107.
- 4. Peters B, Bui H‐H, Frankild S, Nielsen M, Lundegaard C, Kostem E et al A community resource benchmarking predictions of peptide binding to MHC‐I molecules. PLoS Comput Biol 2006; 2:e65.
- 5. Wang P, Sidney J, Dow C, Mothé B, Sette A, Peters B. A systematic assessment of MHC class II peptide binding predictions and evaluation of a consensus approach. PLoS Comput Biol 2008; 4:e1000048.
- 6. Lankat‐Buttgereit B, Tampe R. The transporter associated with antigen processing: function and implications in human diseases. Physiol Rev 2002; 82:187–204.
- 7. Paz P, Brouwenstijn N, Perry R, Shastri N. Discrete proteolytic intermediates in the MHC class I antigen processing pathway and MHC I‐dependent peptide trimming in the ER. Immunity 1999; 11:241–51.
- 8. Craiu A, Akopian T, Goldberg A, Rock KL. Two distinct proteolytic processes in the generation of a major histocompatibility complex class I‐presented peptide. Proc Natl Acad Sci 1997; 94:10850–5.
- 9. Kloetzel PM. Antigen processing by the proteasome. Nat Rev Mol Cell Biol 2001; 2:179–87.
- 10. Larsen MV, Lundegaard C, Lamberth K, Buus S, Lund O, Nielsen M. Large‐scale validation of methods for cytotoxic T‐lymphocyte epitope prediction. BMC Bioinformatics 2007; 8:424.
- 11. Stranzl T, Larsen MV, Lundegaard C, Nielsen M. NetCTLpan: pan‐specific MHC class I pathway epitope predictions. Immunogenetics 2010; 62:357–68.
- 12. Alexander J, Sidney J, Southwood S, Ruppert J, Oseroff C, Maewal A et al Development of high potency universal DR‐restricted helper epitopes by modification of high affinity DR‐blocking peptides. Immunity 1994; 1:751–61.
- 13. Calis JJA, Maybeno M, Greenbaum JA, Weiskopf D, De Silva AD, Sette A et al Properties of MHC class I presented peptides that enhance immunogenicity. PLoS Comput Biol 2013; 9:e1003266.
- 14. Chaves FA, Lee AH, Nayak JL, Richards KA, Sant AJ. The utility and limitations of current Web‐available algorithms to predict peptides recognized by CD4 T cells in response to pathogen infection. J. Immunol. 2012; 188:4235–48.
- 15. Paul S, Sidney J, Peters B, Sette A. Development and validation of a broad scheme for prediction of HLA class II restricted T cell epitopes Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics. New York, NY, USA: BCB ‘14 ACM Press, 2014: 733–8.
- 16. Carpenter C, Sidney J, Kolla R, Nayak K, Tomiyama H, Tomiyama C et al A side‐by‐side comparison of T cell reactivity to fifty‐nine Mycobacterium tuberculosis antigens in diverse populations from five continents. Tuberculosis 2015; 95:713–21.
- 17. Lindestam Arlehamn CS, Gerasimova A, Mele F, Henderson R, Swann J, Greenbaum JA et al Memory T cells in latent Mycobacterium tuberculosis infection are directed against three antigenic islands and largely contained in a CXCR3‐ CCR6+ Th1 subset. PLoS Pathog 2013; 9:e1003130.
- 18. Oseroff C, Sidney J, Tripple V, Grey H, Wood R, Broide DH et al Analysis of T cell responses to the major allergens from German cockroach: epitope specificity and relationship to IgE production. J. Immunol. 2012; 189:679–88.
- 19. Dillon MBC, Schulten V, Oseroff C, Paul S, Dullanty LM, Frazier A et al Different Bla‐g T cell antigens dominate responses in asthma versus rhinitis subjects. Clin Exp Allergy 2015; 45:1856–67.
- 20. The UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res 2014; 43:D204–12.
- 21. The Human Microbiome Project Consortium. A framework for human microbiome research. Nature 2012; 486:215–21.
- 22. The Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 2012; 486:207–14.
- 23. Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci 1992; 89:10915–9.
- 24. Calis JJA, de Boer RJ, Keşmir C. Degenerate T‐cell recognition of peptides on MHC molecules creates large holes in the T‐cell repertoire. PLoS Comput Biol 2012; 8:e1002412.
- 25. Haase K, Raffegerst S, Schendel DJ, Frishman D. Expitope: a web server for epitope expression. Bioinformatics 2015; 31:1854–6.