New light on the HLA-DR immunopeptidomic landscape

Emilie Egholm Bruun Jensen; Birkir Reynisson; Carolina Barra; Morten Nielsen

doi:10.1093/jleuko/qiae007

J Leukoc Biol. 2024 Jan 12;115(5):913–925.

PMCID: PMC11057780 PMID: 38214568

Abstract

The set of peptides processed and presented by major histocompatibility complex class II molecules defines the immunopeptidome, and its characterization holds keys to understanding essential properties of the immune system. High-throughput mass spectrometry (MS) techniques enable interrogation of the diversity and complexity of the immunopeptidome at an unprecedented scale. Here, we analyzed a large set of MS immunopeptidomics data from 40 donors and 221 samples, covering 30 unique HLA-DR molecules. We identified likely co-immunoprecipitated HLA-DR–irrelevant contaminants using state-of-the-art prediction methods and shed new light on the properties of HLA antigen processing and presentation. The ligandome (HLA binders) was enriched in 15-mer peptides, and the contaminome (nonbinders) in longer peptides. Classification into singletons and nested sets showed that the former were enriched in contaminants. Investigating the location of ligands in their source proteins revealed that only contaminants shared a positional bias. Regarding subcellular localization, nested peptides were found to be predominantly of endolysosomal origin, whereas singletons were distributed equally between cytosolic and endolysosomal origin. In terms of antigen-processing signatures, no significant differences were observed between the cytosolic and endolysosomal ligands. Further, the sensitivity of MS immunopeptidomics was investigated by analyzing overlap and saturation between biological MS replicas, leading to the conclusion that at least 5 replicas are needed to identify 80% of the immunopeptidome. Moreover, the overlap in immunopeptidome between donors was found to be very low in terms of both peptides and source proteins, the latter indicating a critical HLA bias in antigen sampling during HLA antigen presentation. Finally, the complementarity between MS and in silico approaches for comprehensively sampling the immunopeptidome was demonstrated.

Keywords: antigen processing, HLA antigen presentation, immunopeptidome, in silico models, mass spectrometry

Immunoinformatics analyses reveal novel insights into the properties of the HLA-DR immunopeptidome, suggesting that singleton peptides are a major source of HLA-irrelevant contaminants.

1. Introduction

The HLA class II pathway allows peptides originating from autophagy or endocytosed proteins to be presented on the cell surface of professional antigen-presenting cells.^1,2 These pathways consist of a series of complex steps including proteolysis of self and foreign antigens in endolysosomes, antigen loading on to HLA class II in the antigen processing compartment, and translocation of mature peptide-HLA complexes onto the cell surface.² On the surface of the cell, the HLA class II molecules present the loaded peptides for scrutiny by CD4+ T cells. If a CD4+ T cell recognizes the presented peptide, an immune response can be triggered. Understanding the rules of HLA class II antigen processing and presentation is thus of great importance for the basic understanding of cellular immunity and the design of immune-therapeutic interventions.

HLA class II molecules are dimeric molecules formed by an alpha and a beta chain. HLA is both polygenic and polymorphic and is encoded at 3 loci, HLA-DR, HLA-DP, and HLA-DQ, each locus representing a large variety of allelic variants across the human population. Each HLA class II molecule has a potentially unique peptide binding specificity defined predominantly by the residues forming the binding cleft. Unlike major histocompatibility complex class I (MHC-I), the MHC-II binding cleft is open-ended, allowing it to bind peptides of widely varying length. However, the vast majority of HLA class II ligands are 13 to 19 amino acids long, with a preference for 15-mers.³

Given its critical role in antigen presentation, large efforts have been dedicated to characterizing the rules defining the set of peptides presentable by HLA, the so-called immunopeptidome. This includes experimental in vitro binding assays⁴ and liquid chromatography–tandem mass spectrometry (LC-MS/MS).^5–8 Likewise, many in silico models trained on such experimental data types have been proposed for the prediction of peptide-HLA binding and presentation.⁹

Earlier work has analyzed such data with the aim of characterizing properties of MS immunopeptidomes including the milestone work of Ciudad et al.¹⁰ This and previous work demonstrated that the vast majority of MS immunopeptidome ligands were high-affinity HLA binders located in nested sets and belonged to endolysosomal pathway–degraded proteins. Further, the minority of ligands found in cytosolic and nuclear proteins predominantly corresponded to single sequences, did not originate from nested sets, and were located at the C terminus of the parental protein. Finally, subtle differences were observed in the processing signal between ligands of endolysosomal and cytosolic origin. Together, the authors concluded that these observations suggested alternative antigen processing between the 2 classes of ligands.

The work and conclusions of Ciudad et al.¹⁰ are, however, challenged by the very small dataset analyzed: only 1,319 peptides covering a total of 9 HLA-DR molecules. Further, the applied HLA motif deconvolution approach (the process of assigning each ligand to a likely HLA restriction element) was performed very crudely, based only on predicted HLA binding affinity. Also, this approach had limited power in identifying co-immunoprecipitated HLA-irrelevant contaminants. Such contaminants form a well-known byproduct of the LC-MS/MS pipeline.¹¹ Cell lysis conditions in immunopeptidomics are mild; therefore, coprecipitation of other proteins and peptides is expected to a certain extent. The degree of such HLA-irrelevant peptides varies depending on the expertise of the experimental lab running the MS experiments and the false discovery rate (FDR) threshold applied in the spectral annotation step. Note that the FDR threshold for most MS HLA immunopeptidomics studies is set between 1% and 5%, meaning that 1% to 5% of the identified peptides are anticipated to be false discovery hits, and hence HLA-irrelevant.

In this work, we seek to address these challenges and limitations and revisit the findings of Ciudad et al.¹⁰ in the context of a vastly increased MS immunopeptidomics dataset and with the application of current state-of-the-art methods for MHC motif deconvolution, allowing for accurate identification of likely MS contaminants.

2. Methods

2.1. MS-eluted MHC-II peptides

The data used for this study were obtained from an earlier study¹² containing 1,107,591 MS-eluted HLA-II ligands of 13 to 21 amino acids in length, sourced from 40 different donors and 221 MS immunopeptidomics experiments. The data are available at https://services.healthtech.dtu.dk/suppl/immunology/Immunology2020_Attermann. A summary table of these 40 donors, their HLA typing, and the number of peptides is available at https://services.healthtech.dtu.dk/suppl/immunology/JLB_2023_Jensen.

In that study, monocytes were isolated from the peripheral blood mononuclear cell fraction of peripheral blood samples by positive selection and differentiated in vitro into immature dendritic cells (DCs) in the presence of interleukin-4 and granulocyte-macrophage colony-stimulating factor.¹² The immature DCs were matured overnight with lipopolysaccharide. Next, DCs were lysed and HLA-DR molecules were recovered using L243 monoclonal antibody immunoprecipitation. Peptides were eluted from the HLA-DR molecules and sequenced using LC-MS/MS, and the spectra were mapped to human proteins using a 1% FDR. Of the resulting ligands, 5,926 did not map to the current human reference proteome and were excluded from further analysis, resulting in a final dataset of 1,101,665 ligands.

2.2. Motif deconvolution and binding core identification

The likelihood of HLA antigen presentation and the location of the binding core were predicted by NetMHCIIpan-4.1 (https://services.healthtech.dtu.dk/services/NetMHCIIpan-4.1/). To avoid overprediction, all possible 9-mers of each ligand were checked against the binding cores in the test set partitions. If an identical match was found between a 9-mer and a binding core in a test set partition, the network subset corresponding to this partition was used for predictions. This was the case for 764,769 ligands. If no 9-mer of the ligand was observed in the training or test data, the network subset corresponding to a random partition was chosen. This was the case for 258,984 ligands. If the 9-mers of a ligand matched binding cores in several different test set partitions, the ligand was excluded from the dataset to avoid overfitting. A total of 77,912 ligands were excluded for this reason. After this filtering step, the dataset was reduced to 1,023,753 ligands. A high proportion of ligands were observed across multiple datasets obtained for a given donor. This redundancy was removed for some of the analyses, and only unique peptides at the donor level were kept. This reduced the dataset to 349,225 peptides.
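The partition rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `core_to_partition` lookup table, and the number of partitions are hypothetical and chosen only for illustration.

```python
import random

CORE_LEN = 9  # binding cores are 9-mers


def ninemers(peptide, k=CORE_LEN):
    """Return the set of all k-mers contained in a peptide."""
    return {peptide[i:i + k] for i in range(len(peptide) - k + 1)}


def choose_partition(peptide, core_to_partition, n_partitions, rng):
    """Pick the network subset (partition index) for one ligand.

    core_to_partition: hypothetical dict mapping a binding core to the
    index of the test set partition in which it was observed.
    Returns None when the ligand should be excluded.
    """
    hits = {core_to_partition[c] for c in ninemers(peptide)
            if c in core_to_partition}
    if len(hits) == 1:      # match in exactly one partition: use it
        return next(iter(hits))
    if not hits:            # no match anywhere: random partition
        return rng.randrange(n_partitions)
    return None             # matches in several partitions: exclude


# Toy lookup table with made-up cores (assumption, not real data)
toy_table = {"AAAAAAAAA": 0, "CCCCCCCCC": 3}
toy_rng = random.Random(0)
```

With this toy table, a ligand containing only the first core is assigned to partition 0, a ligand containing both cores is dropped, and a ligand with no known core receives a random partition.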

2.3. Nested sets and singletons

Nested sets are defined as peptides whose overlap is identical and of a size of 9 or more amino acids. A singleton is a ligand that is not part of a nested set. This peptide classification was done in a per-donor manner.
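As a rough illustration of this definition, the sketch below groups the peptides of one donor into nested sets whenever they share an identical stretch of at least 9 amino acids, and reports the rest as singletons. It assumes that sharing any identical 9-mer is sufficient evidence of overlap (no mapping to source protein coordinates is performed), which may differ from the procedure actually used; all names and example sequences are hypothetical.

```python
from collections import defaultdict


def group_nested(peptides, min_overlap=9):
    """Split one donor's peptides into nested sets and singletons."""
    peptides = sorted(set(peptides))
    parent = {p: p for p in peptides}

    def find(p):
        # Union-find lookup with path compression
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    first_seen = {}  # k-mer -> first peptide in which it was seen
    for p in peptides:
        for i in range(len(p) - min_overlap + 1):
            kmer = p[i:i + min_overlap]
            if kmer in first_seen:
                parent[find(p)] = find(first_seen[kmer])  # merge groups
            else:
                first_seen[kmer] = p

    groups = defaultdict(list)
    for p in peptides:
        groups[find(p)].append(p)
    nested = [g for g in groups.values() if len(g) > 1]
    singletons = [g[0] for g in groups.values() if len(g) == 1]
    return nested, singletons


# Made-up sequences: the first two share a 13-residue stretch
nested_sets, single = group_nested(
    ["ACDEFGHIKLMNPQR", "DEFGHIKLMNPQRST", "WWWWYYYYVVVVTTT"])
```

Running the function separately on each donor's peptide list mirrors the per-donor classification described above.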

Information about the data and data annotation is available at https://services.healthtech.dtu.dk/suppl/immunology/JLB_2023_Jensen.

2.4. Relative location of peptides in their source proteins

To define the relative location of ligands, ligands were mapped onto their source protein sequence. Here, the signal peptides of the source proteins were predicted using SignalP version 5.0b (https://services.healthtech.dtu.dk/services/SignalP-5.0/) with eukaryote as the organism, and the signal peptides were removed. Only 1 peptide chosen at random per nested set was included per donor, reducing the number of peptides from 349,225 to 122,657. The proteins were compared by calculating the relative location from position 1 to 100. Because some of the proteins had extreme lengths, only proteins with a length between the 25th and 75th percentiles (range: 228–752 amino acids) were included in the analysis (excluding 3,416 source proteins). The pool of peptides was then sampled with replacement 122,657 times (corresponding to the total number of peptides). For each peptide, a random k-mer of equal size was generated within its source protein to generate the background localization profile for the different binding categories (binders: predicted percentile rank score below 5%; weak binders: percentile rank 5%–20%; contaminants: percentile rank over 20%). The background profiles hence differed depending on which ligand dataset was used for the relevant plot (i.e. contaminants had a different background profile than the binders, as the background profile was generated from a different set of source proteins and a different number of sampled random peptides). This was repeated 100 times to generate a localization profile with standard deviation. The log of the relative foreground divided by the relative background of the randomly generated k-mers was used for the relative localization profile fold-change figures.
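The localization profile and its background can be sketched as follows. This is only an illustration under stated assumptions: the peptide midpoint is used as its position, the logarithm is taken base 2, and a small pseudocount avoids division by zero. None of these choices is specified in the text, and all function names are hypothetical.

```python
import math
import random

N_BINS = 100  # relative positions 1..100


def relative_bin(start, length, protein_len, n_bins=N_BINS):
    """Map a peptide (0-based start) to a relative position bin 1..n_bins.

    Assumption: the peptide midpoint defines its location.
    """
    mid = start + length / 2.0
    return min(n_bins, int(mid / protein_len * n_bins) + 1)


def profile(bins, n_bins=N_BINS):
    """Relative frequency of peptides in each bin."""
    counts = [0] * n_bins
    for b in bins:
        counts[b - 1] += 1
    total = sum(counts)
    return [c / total for c in counts]


def background_bins(peptides, rng):
    """Draw one random k-mer of equal length per peptide in its protein.

    peptides: iterable of (start, length, protein_len) tuples.
    """
    out = []
    for _, length, protein_len in peptides:
        rand_start = rng.randint(0, protein_len - length)
        out.append(relative_bin(rand_start, length, protein_len))
    return out


def log_enrichment(foreground, background, pseudo=1e-6):
    """Per-bin log2 ratio of foreground to background frequency."""
    return [math.log2((f + pseudo) / (b + pseudo))
            for f, b in zip(foreground, background)]
```

Repeating `background_bins` 100 times with different random seeds and summarizing the per-bin spread would give a standard deviation comparable to the one described above.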

2.5. Investigation of the contaminant-delivering proteins

To evaluate which proteins deliver contaminants, the proteins were ranked using the relation (#cont / (#binders + 10)) > 2. Here, #binders is defined as the number of singleton peptides with a prediction rank score below 5, #cont is defined as the number of singleton peptides with a prediction rank score above 20, and the value of 10 was selected arbitrarily to limit the effect of comparing small numbers.
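As a worked illustration of this relation, the short Python sketch below applies it to the contaminant and binder counts listed in Table 1. The counts are used purely as example inputs, and the function and variable names are hypothetical.

```python
PSEUDOCOUNT = 10  # arbitrary value from the Methods, dampens small counts
THRESHOLD = 2     # proteins scoring above this are flagged


def contaminant_score(n_cont, n_binders, pseudocount=PSEUDOCOUNT):
    """Compute #cont / (#binders + pseudocount)."""
    return n_cont / (n_binders + pseudocount)


# (contaminants, binders) per protein ID, copied from Table 1
table1_counts = {
    "P20930.3": (89, 5),    # Filaggrin
    "P02768.2": (74, 16),   # Albumin
    "P04264.6": (73, 4),    # Keratin, type II cytoskeletal 1
    "P61247.2": (61, 8),    # 40S ribosomal protein S3a
    "P13645.6": (54, 3),    # Keratin, type I cytoskeletal 10
}

scores = {pid: contaminant_score(c, b)
          for pid, (c, b) in table1_counts.items()}

# Rank by decreasing score and keep proteins above the threshold
flagged = [pid for pid, s in sorted(scores.items(), key=lambda kv: -kv[1])
           if s > THRESHOLD]
```

For filaggrin, for example, the score is 89 / (5 + 10) ≈ 5.9, well above the threshold of 2.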

The identified contaminant-delivering proteins were submitted to CRAPome (https://reprint-apms.org) to check if these were known MS contaminants.

2.6. Assigning subcellular localization

The subcellular location of the source proteins of the HLA presented peptides was grouped into degradation coming from the “cytosolic pathway” and the “endolysosomal pathway” using the Gene Ontology (GO) cellular component ontology. The GO terms for the cytosolic pathways include “nucleus” (GO:0005634), “cytoplasm” (GO:0005737), “mitochondrion” (GO:0005739), and “peroxisome” (GO:0005777). The endolysosomal GO terms include “extracellular region” (GO:0005576), “cell membrane” (GO:0033644), “endoplasmic reticulum” (GO:0005783), “Golgi apparatus” (GO:0005794), “lysosome” (GO:0005764), and “vacuole” (GO:0005773).

The GO terms were obtained from the UniProt Retrieve/ID Mapping tool (https://www.uniprot.org/id-mapping). For each protein, a GO term list was compiled including parent GO terms. The classification of the major subcellular localization (i.e. the most used by each protein) was made by comparing the collection of GO terms to the endolysosomal and cytosolic categories defined above. If the GO term list of a protein contained terms for both endolysosomal and cytosolic cellular location, the protein was assigned “dual.”

As an alternative to GO, the subcellular location of the source proteins was predicted with DeepLoc version 1.0 (https://services.healthtech.dtu.dk/services/DeepLoc-1.0/). DeepLoc returns the likelihood of a protein being located in 10 different categories. One of them is “Plastid,” which is not relevant for human proteins and was removed from the analysis. This left 9 categories, 4 being cytosolic and 5 being endolysosomal. Next, proteins were assigned as cytosolic if the summed prediction score over the cytosolic categories was more than 4 of 9 and were assigned endolysosomal if the summed endolysosomal prediction score was more than 5 of 9. A difference between the prediction scores for the 2 groups was also tested to increase the confidence in the classification.
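A minimal sketch of the difference-based variant of this classification is given below. The category labels are assumed names for the 9 retained DeepLoc classes, grouped in parallel with the GO categories above; the exact grouping used in the study and the output column names of DeepLoc may differ. The default threshold of 0.5 corresponds to the reliability score reported in the Results.

```python
# Assumed labels for the 9 retained categories ("Plastid" is dropped)
CYTOSOLIC = ("Nucleus", "Cytoplasm", "Mitochondrion", "Peroxisome")
ENDOLYSOSOMAL = ("Extracellular", "Cell_membrane", "Endoplasmic_reticulum",
                 "Golgi_apparatus", "Lysosome/Vacuole")


def classify_location(scores, min_diff=0.5):
    """Assign a protein to 'cytosolic', 'endolysosomal', or 'dual'.

    scores: dict mapping category label to DeepLoc likelihood.
    min_diff: minimum absolute difference between the two summed scores
    required for a confident call.
    """
    cyt = sum(scores.get(k, 0.0) for k in CYTOSOLIC)
    endo = sum(scores.get(k, 0.0) for k in ENDOLYSOSOMAL)
    if abs(cyt - endo) < min_diff:
        return "dual"
    return "cytosolic" if cyt > endo else "endolysosomal"


# Made-up likelihoods for three hypothetical proteins
example_calls = [
    classify_location({"Nucleus": 0.7, "Cytoplasm": 0.2, "Extracellular": 0.1}),
    classify_location({"Cytoplasm": 0.4, "Cell_membrane": 0.6}),
    classify_location({"Lysosome/Vacuole": 0.8, "Nucleus": 0.1}),
]
```

Raising `min_diff` trades a larger "dual" group for more confident cytosolic and endolysosomal calls.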

The prediction method of determining the cellular location of a protein using DeepLoc was validated with GO term as the “real” classification in a confusion matrix. Different thresholds for DeepLoc classification were tested and the performance evaluated using Matthews correlation coefficient (MCC). The nonredundant dataset filtered on prediction rank score <5 was used for this analysis. This dataset contained 276,072 MHC-II binders.
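For reference, the MCC for a two-class comparison can be computed from the four cells of the confusion matrix as sketched below. The sketch assumes a binary setting (for instance cytosolic versus endolysosomal, with dual proteins set aside); how the dual class was handled in the reported MCC values is not stated here, and the function name is hypothetical.

```python
import math


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion counts.

    Returns 0.0 when any marginal sum is zero (undefined MCC).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

A value of 1 indicates perfect agreement between the two classifications, 0 indicates agreement no better than chance, and -1 indicates complete disagreement.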

2.7. Differences between endolysosomal and cytosolic peptides

In addition to the relative location, the location of cytosolic and endolysosomal peptides was investigated by defining the N-terminal as the first 30 amino acids and the C-terminal as the last 30 amino acids. The distribution of N-terminal, internal, and C-terminal peptides was compared between the cytosolic and endolysosomal peptides. A strict approach of defining terminal peptides was used as well as an approach that allowed the peptide to be partly located within the first or last 30 amino acids.
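The strict and lenient terminal definitions can be sketched as follows, using 0-based peptide start positions. The function name and the tie-breaking rule for very short proteins are assumptions made for this illustration.

```python
def terminal_class(start, length, protein_len, window=30, strict=True):
    """Classify a peptide as 'N-terminal', 'C-terminal', or 'internal'.

    strict=True : the whole peptide must lie within the first or last
                  `window` residues of the source protein.
    strict=False: partial overlap with the first or last `window`
                  residues is enough.
    """
    end = start + length  # exclusive end, 0-based coordinates
    if strict:
        in_n = end <= window
        in_c = start >= protein_len - window
    else:
        in_n = start < window
        in_c = end > protein_len - window
    # Assumption: for very short proteins the N-terminal call wins
    if in_n:
        return "N-terminal"
    if in_c:
        return "C-terminal"
    return "internal"
```

For a 300-residue protein, a 15-mer starting at position 20 is internal under the strict definition but N-terminal under the lenient one.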

The contributions of cytosolic and endolysosomal peptides to the total internal compartment and to the total terminal compartments of the source protein were computed, again to investigate whether either the cytosolic or the endolysosomal peptides were overrepresented in either compartment.

The last comparison was made by comparing the processing patterns of the context and peptide flanking region (PFR) of the cytosolic and endolysosomal peptides. Only binding cores located at least 3 amino acids away from the termini were included in this analysis. The unique context PFR of the N-terminal of the peptides as well as the unique context PFR of the C-terminal of the peptides were visualized as sequence logos using Seq2Logo.¹³ Context PFR sequences shorter than 6 amino acids, i.e. those located too close to the termini of the protein, were not included. The logo plots were created without sequence weighting.

2.8. Coverage of MS immunopeptidomes

The coverage of the MS immunopeptidomes and the overlap between measured and predicted ligands were investigated by focusing on 2 donors: donor 312 and donor 936. These donors were assayed far more often than the other donors (12 and 10 times, respectively). MHCMotifDecon (https://services.healthtech.dtu.dk/services/MHCMotifDecon-1.0/) with default MHC-II settings was used to deconvolute the MS immunopeptidome data for each donor. The consistency across multiple replicas (i.e. independent MS experiments for the same donor) was evaluated at both the ligand binding core and protein levels by counting the number of samples in which each core or protein was present. This was repeated at increased confidence, achieved by only including proteins containing 2 or more unique binding cores. Additionally, the saturation of unique proteins and binding cores was evaluated by combining an increasing number of samples. Here too, the confidence was increased by only counting proteins containing 2 or more unique binding cores. Because the samples contained different numbers of unique proteins and binding cores, the order of the samples was randomized 100 times for the saturation analyses. Last, the overlap between the 2 donors was investigated at the unique protein and binding core levels.
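The saturation analysis can be sketched as a cumulative count of unique items over randomly ordered samples, averaged over repeated shuffles. The sketch below is illustrative only; the function name, the fixed seed, and the toy input are assumptions.

```python
import random


def saturation_curve(samples, n_orders=100, seed=0):
    """Mean number of unique items after combining 1..n samples.

    samples: list of sets (e.g. unique binding cores per MS replica).
    n_orders: number of random sample orderings to average over.
    """
    rng = random.Random(seed)
    totals = [0] * len(samples)
    for _ in range(n_orders):
        order = samples[:]
        rng.shuffle(order)           # random order of replicas
        seen = set()
        for i, sample in enumerate(order):
            seen |= sample           # add this replica's items
            totals[i] += len(seen)   # cumulative unique count
    return [t / n_orders for t in totals]


# Toy replicas with made-up core labels
toy_replicas = [{"a", "b"}, {"b", "c"}, {"c", "d", "e"}]
toy_curve = saturation_curve(toy_replicas)
```

The last point of the curve always equals the size of the union of all samples; a curve that flattens early indicates that additional replicas contribute few new cores or proteins.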

To investigate the effect of combining MS experiments and in silico predictions to uncover the immunopeptidome, half of the MS experiments for a given donor were chosen at random and used as the gold standard for the MS-detectable immunopeptidome. Next, 5 approaches were used to recover the binding cores of this gold standard: (1) independent MS experiments, (2) independent MS experiments combined with in silico predictions on MS-identified proteins filtered at prediction rank score <5, (3) MS experiments combined with in silico predictions on MS-identified proteins filtered at prediction rank score <1, (4 and 5) MS experiments combined with a random set of cores generated on MS-identified proteins and equal in number to the in silico recovered cores (found at prediction rank scores <1 and <5). The randomly generated cores were ensured to be different from the MS-identified cores and were sampled from the protein in which the predicted core was found.

The cost in decreased specificity that accompanies the increased sensitivity obtained by introducing in silico models to expand the ligand pool was investigated as well. Here, the ability to recover peptides from the gold standard was tested for 3 of the methods: MS, MS combined with in silico predictions at rank score <1, and MS combined with in silico predictions at rank score <5. To quantify this cost, the total number of cores identified from the gold standard was divided by the total number of cores in the gold standard.
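The ratio described in the last sentence can be written as a simple set operation, sketched below with made-up core labels. The function name and the toy sets are hypothetical and only illustrate how adding predicted cores to the MS-identified cores changes the recovered fraction.

```python
def recovered_fraction(found_cores, gold_cores):
    """Fraction of gold standard cores present among the found cores."""
    gold = set(gold_cores)
    return len(gold & set(found_cores)) / len(gold)


# Toy example (labels are placeholders, not real binding cores)
gold = {"core1", "core2", "core3", "core4"}
ms_only = {"core1", "core2"}
predicted = {"core3", "coreX"}  # coreX is not in the gold standard

frac_ms = recovered_fraction(ms_only, gold)
frac_ms_plus_pred = recovered_fraction(ms_only | predicted, gold)
```

In this toy case the recovered fraction rises from 0.5 to 0.75 when the predictions are added, at the price of one proposed core that is absent from the gold standard.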

3. Results

Characterizing the rules and properties of peptides that undergo MHC-II antigen processing and presentation provides information essential for understanding cellular immunity. Here, we performed such a characterization of a large set of MS-identified immunopeptidomes and extracted shared features related to MHC presentation potential, positional localization in the source protein, source protein subcellular localization, and traces of antigen processing. Further, we investigated how the expressed MHC shapes the immunopeptidome in a given individual and quantified potential limitations of MS for the comprehensive characterization of immunopeptidomes.

3.1. MHC presentation potential

First, we investigated if biases in predicted binding potential (i.e. percentile rank values) existed between peptides of variable length and singletons vs nested peptides. For this analysis, the nonredundant dataset containing 349,225 peptides was used. The binding potential of the peptides was predicted with NetMHCIIpan-4.1 as described in the Methods.

The prediction rank score distribution for peptides of different lengths in Fig. 1A shows that peptides of length 15 overall have the lowest prediction rank scores, and that longer (and to a certain degree also shorter) peptides have higher prediction rank scores. We and others have earlier shown that a proportion of the MS immunopeptidome consists of co-immunoprecipitated MHC-irrelevant peptides, and that such “contaminants” are characterized by a low MHC antigen presentation potential (i.e. a high predicted percentile rank score).¹¹ Note that in this work the immunoprecipitation step and HLA motif deconvolution were limited to HLA-DR, and any co-immunoprecipitated peptides presented by HLA-DP or HLA-DQ will thus likely be classified as contaminants. Figure 1B shows the proportion of ligands grouped based on their prediction rank score. Ligands of length 15 have the highest percentage of binders (percentile rank score <5). As the ligands get longer, the proportion of peptides with a percentile rank score above 20, hereafter referred to as contaminants, increases. Further, Fig. 1C shows the length distribution of predicted binders and contaminants, again underlining the very different properties of the 2 peptide subsets, with the contaminants having a close to uniform distribution across the different length categories. In line with earlier work, these results strongly suggest that at least a large proportion of the contaminant peptides are MHC irrelevant.

MS immunopeptidome data are known to share a large proportion of so-called nested ligand subsets.⁷ These are peptides that are either nested or overlapping. In contrast, ligands that do not share overlap with any other ligand are referred to as singletons (for further details on the exact definition of the two, refer to the Methods). Here, we investigated whether nested and singleton peptides differed in HLA binding potential. The results of this analysis are displayed in Fig. 1D and show that MHC-II singleton ligands in general have a higher prediction rank score than MHC-II ligands located in nested sets. This observation is supported by Fig. 1E, in which only about 60% of the singleton ligands have a prediction rank score <5, and more than 20% are classified as contaminants. In contrast, these values for nested peptides are 84% and 5%, respectively. Finally, Fig. 1F displays the length distributions for the different peptide subsets, confirming the expected profile for the predicted binders and, in agreement with the results in Fig. 1C, a very different profile for both nested and singleton contaminants. The data were found to contain 62,694 nested peptide sets, with an average of 5 peptides per set. A total of 4% (n = 2,591) of the nested sets were found to consist entirely of predicted nonbinding peptides. Investigating the length distribution of peptides in these nested sets of nonbinders revealed that they share a profile similar to that of the singleton binders (see Supplementary Material 1). This suggests that these nested sets are, at least in part, HLA ligands likely of non–HLA-DR origin.

3.2. Limited evidence for bias in relative location of ligands in their source proteins

Earlier works have suggested that subsets of HLA class II ligands share a positional bias toward the protein C-terminal.¹⁰ To further investigate this, relative location plots were computed for the nonredundant ligand dataset. To avoid biases in this analysis toward atypically short or long proteins, only peptides from source proteins whose length was found in the 25th to 75th percentile of the protein length distribution were included here (for details, refer to the Methods). The backgrounds of the 3 peptide categories were used for this investigation. Contaminants were, as previously, defined as peptides with a predicted percentile rank score above 20. Binders were defined as peptides with prediction rank scores below 5, and the intermediate category as peptides with percentile rank scores between 5 and 20. The number of peptides in each category was 8,234 (contaminants), 9,390 (intermediate), and 44,250 (binders). Thus, the majority (72%) of the peptides are classified as binders.

To investigate if relative positions in the source protein were significantly depleted or enriched of ligands, backgrounds were computed for each category (for details, refer to the Methods). Next, the enrichment of ligands with respect to this background at relative locations in their source protein for the 3 categories was calculated and displayed in Fig. 2A–C. These results demonstrate that contaminants are enriched at both termini of the source protein. In contrast, no significant positional bias was observed for the binder category, and the intermediate category showed an enrichment only toward the C-terminal end.

Fig. 2. Relative location of MHC-II peptides in their source protein. For all subfigures, the standard deviation is shown in blue. **(A)** Relative location of the 8,234 contaminant peptides with prediction rank score above 20. **(B)** Relative location of the 9,390 MHC peptides with prediction rank score above 5 and below 20. Here, an enrichment is observed at the C-terminal end, suggesting that there is still a contaminant signal in this bin. **(C)** Relative location of the 44,250 MHC-II binders. No signal is observed, meaning that the binders are located all over the protein. **(D)** Relative location of the 1,902 contaminant peptides with prediction rank score above 20 located in nested sets. **(E)** Relative location of the 3,887 peptides with prediction rank score above 5 and below 20 located in nested sets. **(F)** Relative location of the 27,042 binder peptides with prediction rank score below 5 located in nested sets. **(G)** Relative location of the 6,319 contaminant singleton peptides with prediction rank score above 20. **(H)** Relative location of the 5,536 singleton peptides with prediction rank score between 5 and 20. **(I)** Relative location of the 17,185 singleton binder peptides with prediction rank score below 5.

Thus, this analysis showed no evidence for a positional bias in HLA class II ligands, and suggests that earlier observations of such a bias might have been driven by the signal from HLA-irrelevant MS co-immunoprecipitated contaminants.

To further investigate signatures of positional bias, the analysis was repeated after splitting the peptides into singleton and nested subsets. Here, the backgrounds were computed to account for the characteristics defining singleton peptides and peptides located in nested sets (see the Methods). The results of this analysis are shown in Fig. 2D–I.

No positional bias was observed for the peptide binder category for nested or singleton binders. In contrast, both nested and singleton contaminants share a positional bias toward both the N- and C-termini, with the C-terminal bias being more pronounced.

3.3. Investigation of the contaminant-delivering proteins

To investigate potential biases in the proteins delivering contaminant ligands, proteins were ranked based on their proportion of singleton contaminants to singleton binders, as singleton contaminants are assumed to be true contaminants. In total, 21 proteins were found to be contaminant-delivering according to the relation (#cont / (#binders + 10)) > 2, as described in the Methods. Table 1 lists the 5 proteins with the highest contaminant proportion.

Table 1.

Top 5 contaminant-delivering proteins.

Protein ID	No. contaminants	No. binders	Protein name	CRAPome no. of experiments (found/total)	CRAPome averaged SCs	CRAPome maximal SCs
P20930.3	89	5	Filaggrin	54/716	3.2	17
P02768.2	74	16	Albumin	239/716	7.8	89
P04264.6	73	4	Keratin, type II cytoskeletal 1	671/716	58.6	1141
P61247.2	61	8	40S ribosomal protein S3a	434/716	8.9	56
P13645.6	54	3	Keratin, type I cytoskeletal 10	616/716	47	811

The numbers of contaminants and binders refer to the number of unique peptides with predicted rank >20 and <5, respectively, across all samples. Protein names are taken from UniProt. The last 3 columns are from the CRAPome database and state the number of experiments in the database in which the selected gene/protein was detected, and the averaged and maximal SCs for the selected gene/protein. For details, refer to https://reprint-apms.org/?q=tutorial.

SC = spectral count.

All 5 of these proteins are found in the CRAPome database,¹⁴ with high experiment counts and spectral count values, suggesting that they are common nonspecific MS contaminants. Three of these 5 proteins, i.e. filaggrin (P20930.3), keratin type II cytoskeletal 1 (P04264.6), and keratin type I cytoskeletal 10 (P13645.6), are expressed in human skin, and albumin (P02768.2) is abundant in human blood, suggesting that these were probably introduced during sample handling. 40S ribosomal protein S3a (P61247.2) is part of the group of highly abundant RNA-binding proteins and is a known source of HLA MS contaminants. A full list of the 21 contaminant-delivering proteins as well as their CRAPome information is available at https://services.healthtech.dtu.dk/suppl/immunology/JLB_2023_Jensen.

These observations illustrate that MS immunopeptidomics data in general contain a substantial proportion of contaminants, but that simple bioinformatics analyses can be applied to identify these, aiding the interpretation of the data.

3.4. Effects on source protein subcellular location

Earlier work has observed differences in terms of source protein subcellular localization and signals of antigen processing between peptides originating from cytosolic proteins and peptides originating from endolysosomal proteins.¹⁰ To further investigate this, HLA ligands were subdivided based on the major subcellular location of the source proteins. For these analyses, only peptides with prediction rank score <5% (i.e. binders) were considered. This dataset consists of 276,072 binders.

First, the major subcellular location classification (i.e. the most used by the protein) was performed using GO terms (for details, refer to the Methods). This approach, however, resulted in 37% of the peptides being annotated as dual, meaning that the source protein had GO terms associated with both the cytosolic and endolysosomal groups (see Fig. 3A). To resolve this, and further analyze the major protein localization, the DeepLoc prediction method was applied.¹⁵ DeepLoc allows for protein classification into a set of subcellular localization classes, from which proteins can be classified as either cytosolic or endolysosomal, as described in the Methods.

Fig. 3. Ligand cellular location comparisons. **(A)** Confusion matrix comparing the major subcellular localization results using DeepLoc and GO terms with a difference threshold of 0.5. Here, the GO-based classification yielded the following raw counts: 33,177 cytosolic, 97,068 endolysosomal, and 77,804 dual peptides, and the DeepLoc classification yielded 78,511 cytosolic, 121,215 endolysosomal, and 8,323 dual peptides. **(B)** Summary performance table for different DeepLoc reliability score thresholds. MCC = Matthews correlation coefficient.

To test the validity of this approach for classifying peptides into either cytosolic- or endolysosomal-originating groups, the results were first compared with the GO term classifying method. Figure 3A shows the confusion matrix, comparing the major subcellular localization results using DeepLoc and GO terms. We can define a reliability score for the DeepLoc prediction from the absolute difference between the prediction score of the two classes. As seen in Fig. 3B, the concordance between the DeepLoc and GO classifications increased as a function of this reliability score. Focusing on a reliability score of 0.5, the MCC between the DeepLoc and GO classifications was 0.8711. At this threshold, only 8,323 of the 208,049 peptides were from proteins classified as dual (i.e. having a reliability score <0.5). Increasing the reliability score further, the performance was increased only marginally, however at a high cost in terms of dually classified ligands. Refer to Supplementary Material 2 for a table containing the MCCs for the naive classification approach.

Based on these results, a DeepLoc reliability score threshold of 0.5 was used from here on for assigning a cytosolic or endolysosomal major cellular location to the MHC-II binders, allowing classification of 95% of the 276,072 peptides. With this increased classification ability, 159,522 (57.8%) of the binders were classified as coming from endolysosomal proteins, 103,424 (37.5%) were classified as cytosolic, and 4.7% were assigned as dual. The binders were next grouped into nested sets and singletons, as described previously, to further investigate the differences between binders of cytosolic and endolysosomal origin. This analysis (see Table 2) suggested that while nested peptides are predominantly of endolysosomal origin, singletons are a mix of cytosolic and endolysosomal origin.

Table 2.

Number of endolysosomal and cytosolic binders located in either nested sets or classified as singletons.

	Binders in nested sets	Singleton binders	Total
Endolysosomal	143,753 (62)	15,769 (48)	159,522
Cytosolic	86,629 (38)	16,795 (52)	103,424
Total	230,382 (100)	32,564 (100)	—

Values are n (%) or n.
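As a small worked check, the column percentages in Table 2 can be recomputed from the raw counts. The dictionary layout and function name below are illustrative choices only; the counts are those given in the table.

```python
# Raw counts from Table 2 (binders in nested sets vs singleton binders).
TABLE_2 = {
    "Endolysosomal": {"nested": 143_753, "singleton": 15_769},
    "Cytosolic": {"nested": 86_629, "singleton": 16_795},
}


def column_percentages(table):
    """Return each cell as a whole-number percentage of its column total."""
    totals = {}
    for counts in table.values():
        for column, n in counts.items():
            totals[column] = totals.get(column, 0) + n
    return {
        row: {col: round(100 * n / totals[col]) for col, n in counts.items()}
        for row, counts in table.items()
    }


percentages = column_percentages(TABLE_2)
```

With these counts, the nested column splits 62/38 and the singleton column 48/52 between endolysosomal and cytosolic origin, matching the values in parentheses.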

Further, no significant difference in the patterns of source protein positional location was observed between the cytosolic and endolysosomal peptides (see Fig. 4A, B). Splitting the data into nested and singleton peptides (Fig. 4C–F), a nonsignificant tendency toward enrichment at the C-terminal was observed, while no positional enrichment was observed for the other subsets. Note that the analysis conducted here was performed somewhat differently from what was earlier proposed by Ciudad et al.,¹⁰ in which termini were defined as the first and last 30 amino acids in a given protein sequence. However, the observed results were maintained when repeating the analysis using the termini definition by Ciudad et al. (see Supplementary Material 3).¹⁰

Fig. 4. — Relative location of endolysosomal and cytosolic peptides in their source protein. Only peptides from source protein between length 288 and 752 were included for a more fair comparison. For all subfigures, the standard deviation is shown in blue. **(A)** Relative location of the 23,104 endolysosomal peptides. **(B)** Relative location of the 18,909 cytosolic peptides. **(C)** Relative location of the 14,925 endolysosomal peptides in nested sets. **(D)** Relative location of the 10,839 cytosolic peptides in nested sets. **(E)** Relative location of the 8,179 endolysosomal singletons. **(F)** Relative location of the 8,070 cytosolic singletons. No significant depletion or enrichment, defined as a fold change below or above 2, is observed for any groups.

The results observed here are thus in contrast to the earlier findings by Ciudad et al.,¹⁰ in which only around 20% of the ligands were identified to be of cytosolic origin, and cytosolic peptides were found to be enriched especially at the C-terminal. In line with our previous observations, these results further suggest that the earlier findings proposing a differential positional bias between peptides of endolysosomal and cytosolic origin result from a failure to properly handle MS contaminants. No such positional bias was observed when these contaminants were excluded from the analysis.

3.5. Antigen processing signatures

Next, we investigated the peptide subsets with cytosolic and endolysosomal origin for potential differences in processing patterns. Here, the analysis was conducted for the N- and C-terminal separately by extracting the 3 N/C-terminal amino acids of the peptide together with the corresponding 3 source protein amino acids, limiting the dataset to instances with 3 or more peptide flanking residues, as suggested previously.³ This resulted in 4 subsets of unique 6-mer peptides. These different peptide sets were next analyzed using Seq2Logo¹³ with default settings (see Fig. 5).
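The context extraction described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name and arguments are hypothetical, the peptide is located in the source protein by its first exact match, the peptide flanking residues (PFRs) are taken as the peptide residues outside the binding core, and each 6-mer is written in protein order (3 source protein residues plus 3 peptide residues at the N-terminal; 3 peptide residues plus 3 source protein residues at the C-terminal).

```python
def terminal_contexts(protein, peptide, core, n=3):
    """Return (N-terminal context, C-terminal context) as 2n-mers, or None.

    A context is only reported when the peptide has at least n flanking
    residues (PFRs) on that side of the binding core and the source
    protein has at least n residues beyond the peptide terminus.
    """
    start = protein.find(peptide)      # first exact match (assumption)
    core_start = peptide.find(core)
    if start < 0 or core_start < 0:
        return None, None
    end = start + len(peptide)
    n_pfr = core_start
    c_pfr = len(peptide) - (core_start + len(core))

    n_ctx = c_ctx = None
    if n_pfr >= n and start >= n:
        n_ctx = protein[start - n:start] + peptide[:n]
    if c_pfr >= n and end + n <= len(protein):
        c_ctx = peptide[-n:] + protein[end:end + n]
    return n_ctx, c_ctx
```

Collecting the unique non-`None` contexts separately for cytosolic and endolysosomal peptides would give the 4 subsets of 6-mers that are passed to the logo tool.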

Fig. 5. — Processing patterns of cytosolic and endolysosomal C- and N-termini. **(A)** Cytosolic N-terminal context logo based on 15,705 unique sequences. **(B)** Cytosolic C-terminal context logoplot based on 11,030 unique sequences. **(C)** Endolysosomal N-terminal context logoplot based on 17,820 unique sequences. **(D)** Endolysosomal C-terminal context logo based on 12,646 unique sequences.

Here, a proline signal is clearly visible for both cytosolic and endolysosomal context PFR sequences at position N + 2 and C-2. The same relates to the negatively charged aspartic acid and glutamic acid signal at positions N + 1, N + 2, C-1, and C-2. In summary, no large differences were thus observed between the cytosolic and endolysosomal processing patterns.

These results thus expand on the earlier results of Ciudad et al.,¹⁰ in which the N + 2/C-2 proline signal was observed to be significantly increased compared with background only in endolysosomal ligands. Also, Ciudad et al.¹⁰ observed the enriched lysine signal at position C-1 only in the context of the cytosolic peptides but not for endolysosomal peptides, in alignment with our results. However, in contrast to our results, Ciudad et al.¹⁰ observed an enrichment of aspartic acid at N-1 compared with the neighboring positions. This signal is absent from our results, in which the enriched aspartic acid signal was found at N + 1 and N + 2 for both cytosolic and endolysosomal peptides (see Fig. 5).

3.6. Overlap between measured and predicted MS immunopeptidomics

Several earlier studies have suggested that MS immunopeptidomics to some degree suffers from low sensitivity, and that immunoinformatics prediction methods suffer from low specificity (a high rate of false-positive predictions).^12,16 Here, we benefited from the very large dataset available for this study, which includes MS immunopeptidomics datasets generated repeatedly for the same donor. Such data allow us to investigate how the MS immunopeptidomics coverage and the overlap between measured and predicted immunopeptidomics depend on the number of MS replicas. Here, replica should be interpreted loosely, to mean multiple independent MS experiments conducted on the same donor.

Two donors were chosen for this analysis, as they were sampled 10 and 12 times (about twice as many times as the other donors). Donor 936 expressed alleles DRB1*01:03, DRB1*04:01, and DRB4*01:01 and donor 312 the alleles DRB1*03:01, DRB1*10:01, and DRB3*01:01. The MHCMotifDecon method¹¹ was used to deconvolute and visualize the motifs contained in the combined MS immunopeptidomics data for each of the 2 donors (see Fig. 6).

Fig. 6. — Motif deconvolution of the complete MS immunopeptidome data for donor 936 and donor 312. The figure was generated using the MHCMotifDecon server with default class II settings.

Focusing first on the protein level, Fig. 7A, B displays the number of unique sampled proteins as a function of the number of MS samples included.

Here, a sampled protein is a protein with at least 1 mapped MS ligand. The curves in Fig. 7 approach saturation as the number of analyzed samples is increased. For both donors, this saturation occurs at around 1,600 to 1,750 proteins, thus clearly demonstrating that the HLA pathway samples only a very small proportion of the total proteome (∼20,000 proteins). Further, the figure shows the limited sensitivity of the individual MS experiments, with each dataset covering only between 28% and 70% of the total sampled protein space. The figure also demonstrates that this issue can in part be resolved by including MS replicas. With 5 such replicas, the sensitivity is in most cases well above 80%, and with 10 replicas the protein sampling curves in both cases tend to saturate.
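A saturation analysis of this kind can be sketched as below. This is an illustrative reconstruction only: the function name is hypothetical, each MS sample is assumed to be represented as a set of identifiers (source proteins or binding cores), and averaging over random sample orderings is an assumption about how such a curve might be smoothed, not a description of how Fig. 7 was produced.

```python
import random


def saturation_curve(samples, n_permutations=100, seed=0):
    """Mean number of unique items after including 1..N samples.

    `samples` is a list of sets; the order of inclusion is shuffled
    `n_permutations` times and the cumulative counts are averaged.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(samples)
    for _ in range(n_permutations):
        order = samples[:]
        rng.shuffle(order)
        seen = set()
        for i, sample in enumerate(order):
            seen |= sample
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]


# Toy example with three samples of hypothetical protein identifiers.
toy_samples = [{"A", "B"}, {"B", "C"}, {"C", "D"}]
curve = saturation_curve(toy_samples)
# Sensitivity after k samples, relative to the full set of samples.
sensitivity = [value / curve[-1] for value in curve]
```

The sensitivity after k replicas is then read off as `sensitivity[k - 1]`, and a high-confidence variant would first drop proteins supported by fewer than the chosen number of unique binding cores.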

As a means to limit the analysis to high-confidence proteins of the immunopeptidome, one can impose a threshold on the number of unique binding cores a protein should contain to be included. Imposing such a threshold and including only peptides and source proteins sampled by at least 2 unique binding cores (across the complete set of samples) in the analysis decreased the total number of unique sampled proteins for both donors to about half. On such datasets, the saturation was achieved already after 4 to 5 samples, with 80% sensitivity after only 2 to 3 samples (see Supplementary Material 4A, B).

Moving into the space of identified HLA ligands, Fig. 7C, D shows how the number of unique binding cores varies as more samples are included. Also, here we approach a saturation as the number of samples is increased with the total number of unique binding cores saturating around 3,000 for donor 312 and 2,500 for donor 936. These figures further underline the limited sensitivity of individual MS experiments, with each single MS run only identifying ∼30% to 50% of the total set of unique binding cores, and a sensitivity that in many cases remains as low as 80% even after 5 replicas.

As for the protein saturation plots, thresholds defined from the number of binding cores were applied to select high-confidence proteins. Imposing such thresholds substantially decreased the number of unique binding cores for both donors, as seen in Supplementary Material 4C, D. The overall sensitivity of the individual MS runs nevertheless remained low (50%–60%), even when using a threshold of 5 unique cores per protein. As expected, both donors reached saturation faster when the confidence threshold was increased.

Next, we investigated the overlap between donor 936 and donor 312 in terms of the total sampled protein space and the set of uniquely identified binding cores. Also, here the sampled space was limited to high-confidence proteins as described previously. The result of this analysis is shown in Fig. 8.

Fig. 8. — Overlap between donor 312 and donor 936. **(A)** Protein overlap between donors: donor 936 shares 51.39% of their proteins with donor 312. Donor 312 shares 45.70% of their proteins with donor 936. **(B)** Protein overlap between donors, with only proteins with at least 2 unique binding cores included. Donor 936 shares 46.88% of their proteins with donor 312. Donor 312 shares 39.43% of their proteins with donor 936. **(C)** Unique binding core overlap between donors. Donor 936 shares 14.41% of their binding cores with donor 312. Donor 312 shares 11.02% of their binding cores with donor 936. **(D)** Unique binding core overlap between donors. Only binding cores originating from protein having at least 2 unique binding cores included. Donor 936 shares 14.21% of their binding cores with donor 312. Donor 312 shares 11.10% of their binding cores with donor 936.

Figure 8A, B displays the overlap on the protein level and demonstrates a close to constant relative overlap of only 50% in the sampled protein space between the 2 donors independent of the binding core threshold. This low overlap, combined with the observed protein sampling saturation shown in Fig. 7, suggests, in line with earlier work,^17,18 that the protein sampling to a very high degree is dictated by the donor HLA types, rather than by properties of the donor cell proteome. Figure 8C, D displays the same analysis on the level of unique binding cores and shows that they share about 15% of their binding cores. This underlines that the binding cores of a donor are unique to the donor alleles as illustrated from the divergent motifs in Fig. 6.
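The directional overlap percentages reported in Fig. 8 correspond to a simple set calculation, sketched here with toy data. The function name and the identifiers are hypothetical; the inputs stand for the sets of source proteins (or unique binding cores) of the 2 donors.

```python
def directional_overlap(items_a, items_b):
    """Return (% of A shared with B, % of B shared with A)."""
    shared = items_a & items_b
    pct_a = 100 * len(shared) / len(items_a) if items_a else 0.0
    pct_b = 100 * len(shared) / len(items_b) if items_b else 0.0
    return pct_a, pct_b


# Toy example: 2 of 4 items of donor X and 2 of 5 items of donor Y are shared.
donor_x = {"P1", "P2", "P3", "P4"}
donor_y = {"P3", "P4", "P5", "P6", "P7"}
pct_x, pct_y = directional_overlap(donor_x, donor_y)
```

Because the shared set is the same in both directions, the two percentages differ only through the sizes of the two donors' sets, which is why each panel of Fig. 8 reports two values.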

3.7. Combining experimental and in silico methods uncovers MHC-II ligands at low cost

As seen previously, individual MS samples only partially recover the complete MS-detectable immunopeptidome. We previously proposed that in silico prediction models could be applied to resolve this undersampling issue.¹² The current data can be used to further quantify this. From the results displayed in Fig. 7, we have argued that at least 5 individual MS replicas are needed in order to approach full coverage of the MS-detectable immunopeptidome. From the MS data from donors 312 and 936, we can now define an experiment to assess the complementarity of MS and in silico immunopeptidome discovery. Here, we define a gold standard set of MS-detectable binding cores by including for each donor 50% of the MS samples, chosen at random, and selecting the unique set of binding cores from the set of MS peptides with a predicted rank score <5% to at least 1 of the donor HLAs (to exclude HLA-DR–unspecific contaminants).

Next, we investigate to what degree these gold standard datasets are recovered by either (1) independent MS samples generated from the same donor or (2) in silico predictions. Here, the in silico predictions are limited to the set of antigen source proteins identified in the corresponding set of MS experiments. That is, if 2 independent MS experiments are considered, we take the unique set of binding cores from the combined set of MS peptides with a predicted rank score <5% to at least 1 of the donor HLAs and report the proportion of the gold standard data recovered. Likewise, we select the unique set of source proteins identified in the 2 MS experiments (source proteins containing 1 or more of the identified binding cores), perform in silico predictions using NetMHCIIpan-4.1 for the donor HLAs, select the unique set of predicted binding cores, and report the recall of the gold standard binding core set.

Because the in silico model in this experiment by construction samples a superset of the binding cores obtained from the MS samples, the analysis is complemented by a random sampling approach. Here, unique binding cores are sampled at random from the source protein sequences and combined with the MS-identified core(s), in a number equal to the number of unique predicted binding cores. The result of this analysis is reported in Fig. 9A, B for each of the 2 donors.

By only relying on MS, fewer cores from the gold standard are recovered (Fig. 9A, B) than by MS combined with in silico prediction. However, there is a higher specificity (i.e. a higher proportion of the cores generated are found in the gold standard) when using only MS (Fig. 9C, D). This demonstrates that MHC-II ligand discovery aided by in silico prediction will increase sensitivity, though at the cost of specificity.
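The two quantities compared in this experiment can be written as a short sketch. The function name and the toy core identifiers are hypothetical; "sensitivity" is taken as the proportion of gold standard cores recovered (recall) and "specificity", as used above, as the proportion of generated cores found in the gold standard.

```python
def recovery_metrics(candidate_cores, gold_cores):
    """Return (sensitivity, specificity) of a candidate core set.

    sensitivity: fraction of gold standard cores that were recovered.
    specificity: fraction of candidate cores found in the gold standard
                 (the definition used in the text above).
    """
    hits = candidate_cores & gold_cores
    sensitivity = len(hits) / len(gold_cores) if gold_cores else 0.0
    specificity = len(hits) / len(candidate_cores) if candidate_cores else 0.0
    return sensitivity, specificity


# Toy example: the in silico set is a superset of the MS set by construction.
gold = {"c1", "c2", "c3", "c4"}
ms_only = {"c1", "c2"}
ms_plus_in_silico = ms_only | {"c3", "x1", "x2", "x3"}

ms_metrics = recovery_metrics(ms_only, gold)
combined_metrics = recovery_metrics(ms_plus_in_silico, gold)
```

In the toy data the combined set recovers more of the gold standard but a smaller share of its cores are in the gold standard, mirroring the sensitivity and specificity trade-off described above.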

Two important conclusions can be drawn from this analysis. First, and in line with our earlier observations, MS experiments have a limited sensitivity, and even including 5 replicas recovers only between 70% and 80% of the gold standard binding core dataset. Second, in silico models applied to the subset of MS-identified antigens can increase the sensitivity to between 85% and 93%. This increase is not observed when including an equally sized random increase of the sampled set of binding cores. The increased sensitivity, however, comes at a cost in specificity, as the in silico model (prediction rank score 5) predicts ∼39,000 unique binding cores in the MS-sampled antigens in contrast to the ∼2,000 MS identified antigens (see Supplementary Material 5). This large binding core volume can, however, be decreased by imposing a more stringent prediction score threshold. The results of limiting the predictions to 1% rank are shown as orange data points in Fig. 9A, B, and demonstrate a maintained increase in sensitivity compared with MS alone, allowing recovery of between 79% and 88% at the reduced cost of predicting a total of only 10,000 unique cores (see Supplementary Material 5). In summary, these results strongly suggest that immunopeptidomes are optimally sampled by combining MS- and in silico–driven approaches.

4. Discussion

Here, we have analyzed a large volume of MS immunopeptidome data with the purpose of identifying properties associated with antigen processing and MHC presentation, antigen source protein origin and subcellular localization, and the relative location in the source protein. In addition, we wanted to explore potential MS sampling biases and investigate how prediction of HLA presentation can fill the sensitivity gap of MS by learning the rules of specific HLA-allele presentation. The overall conclusions are that (1) the vast majority of identified MS peptides are associated with a strong predicted HLA antigen presentation potential; (2) by excluding co-eluted HLA-irrelevant MS contaminants, we observed, in contrast to what has been proposed earlier, that ligands do not display significant biases in source protein positional location; (3) comparing nested and singleton peptide sets revealed an overall higher predicted antigen presentation potential in the former, and a higher proportion of MS contaminants in the latter; (4) nested peptides were found to be predominantly of endolysosomal origin, with singletons of a mixed cytosolic and endolysosomal origin, which may suggest different pathways for MHC presentation; and finally, (5) clear signals of antigen processing were observed in the context of both endolysosomal- and cytosolic-presented peptides.

These results are somewhat in disagreement with earlier findings, in which strong source protein positional biases between nested and singleton ligands have been proposed.¹⁰ The likely main reason for these differences originates from the inclusion of MS contaminants in the earlier studies. Here, we show that such contaminants do not share the expected binding motifs, display a clear positional bias with respect to the source protein location, and likewise are highly enriched in singletons compared with nested peptide sets.

As mentioned, a proportion of the MS identified peptides were found to be likely HLA-irrelevant MS contaminants. This was the case for both singletons and peptides in nested sets. We are aware that some biases could be introduced in these results from the fact that contaminants were identified as predicted nonbinders and that a proportion of peptides defined as contaminants might be true HLA ligands. However, investigating these contaminants in more detail revealed that they shared a very different length distribution compared with the set of predicted binders, thus further supporting the notion that these peptides are HLA irrelevant. Nonetheless, a minor proportion of the nested sets were found to consist entirely of such contaminants. This was surprising to us because nested peptide sets are a hallmark of HLA class II ligands. However, the length distribution of the peptides in these nested sets was found to be very different to the general distribution of contaminants, and resembled more that of HLA binders. This suggests that these peptides are indeed HLA ligands, though from a non–HLA-DR origin, i.e. likely HLA-DQ or DP.

In the context of different pathways for endolysosomal and cytosolic antigen-presented peptides, it is interesting to mention a previous work by Dengjel et al.,¹ in which stress-induced autophagy was suggested as the main source of cytosolic peptides, showing that upon stress, cells downregulate the expression of specific cathepsins exhibiting higher amounts of undigested peptides. Such mechanisms might be the source of the observed difference in processing signals between peptides of endolysosomal and cytosolic origin.

In the second part of the article, we investigated the issue of the potential low sensitivity of MS immunopeptidomics and low specificity of immunoinformatics prediction methods. Here, we demonstrated that overlap between individual repeated MS experiments performed on samples originating from the same donor is low, and is around 50% when it comes to the set of both identified proteins and HLA binding cores. Further, we illustrated how a comprehensive coverage of the immunopeptidome could only be approximated if including 5 or more MS replicas in the analysis. Both of these observations support the notion of a limited sensitivity of current MS immunopeptidomics.

Comparing the overlap between comprehensive MS immunopeptidomics data from 2 donors with functionally divergent HLA-DR molecules confirmed a very strong HLA bias in both the identified source proteins and HLA binding cores, with an overlap of 50% on the source protein level and 15% in terms of binding cores. In particular, the low overlap in source protein sampling is striking and suggests a currently unappreciated role of HLA class II in defining the protein subset sampled in HLA antigen presentation.

Finally, we investigated the complementarity between MS and in silico approaches for comprehensively sampling the immunopeptidome and demonstrated how a synergistic approach, applying MS immunopeptidomics to antigen discovery and in silico prediction to HLA ligand identification, results in a superior coverage of the immunopeptidome compared with either of the 2 approaches alone.

It is important to emphasize that the current work is limited to data generated from a single laboratory and to MS immunopeptidome data limited to HLA-DR immunoprecipitation on monocyte-derived DCs. This particular experimental homogeneity imposes certain biases in the analyzed data and hence potentially weakens the generality of conclusions drawn. However, the homogeneity of the data has reduced potential batch effects that otherwise would stem from comparing data generated at different laboratories using different experimental platforms and protocols. This has allowed us to extract more robust conclusions.

In conclusion, by analyzing a large MS immunopeptidomics dataset (comprising 349,225 unique peptide-allele pairings), and using the recently developed MHCMotifDecon tool, which allowed us to assign the most likely HLA-DR restriction allele to each peptide and identify HLA-DR–irrelevant MS contaminants, our analyses have revealed novel insights into the rules of HLA antigen presentation and have demonstrated that immunopeptidome sampling is best performed by combining MS- and in silico–driven approaches.

Supplementary Material

qiae007_Supplementary_Data

qiae007_supplementary_data.pdf (547.8KB, pdf)

Acknowledgments

The research reported in this publication was supported by the National Institute of Allergy and Infectious Diseases under award number 75N93019C00001.

Contributor Information

Emilie Egholm Bruun Jensen, Department of Health Technology, Building 204, Technical University of Denmark, DK-2800 Lyngby, Denmark.

Birkir Reynisson, Department of Health Technology, Building 204, Technical University of Denmark, DK-2800 Lyngby, Denmark.

Carolina Barra, Department of Health Technology, Building 204, Technical University of Denmark, DK-2800 Lyngby, Denmark.

Morten Nielsen, Department of Health Technology, Building 204, Technical University of Denmark, DK-2800 Lyngby, Denmark; Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, B 1650 HMP, Buenos Aires, Argentina.

Authorship

E.E.B.J.: Data curation, formal analyses, visualization and investigation; B.R.: Data curation; C.B.: Data curation, investigation and conceptualization; M.N.: Investigation and conceptualization. All authors: Writing the manuscript.

Supplementary material

Supplementary materials are available at Journal of Leukocyte Biology online.

References

1. Dengjel J, Schoor O, Fischer R, Reich M, Kraus M, Müller M, Kreymborg K, Altenberend F, Brandenburg J, Kalbacher H, et al. Autophagy promotes MHC class II presentation of peptides from intracellular source proteins. Proc Natl Acad Sci U S A. 2005:102(22):7922–7927. doi:10.1073/pnas.0501190102
2. Jurewicz MM, Stern LJ. Class II MHC antigen processing in immune tolerance and inflammation. Immunogenetics. 2019:71(3):171–187. doi:10.1007/s00251-018-1095-x
3. Barra C, Alvarez B, Paul S, Sette A, Peters B, Andreatta M, Buus S, Nielsen M. Footprints of antigen processing boost MHC class II natural ligand predictions. Genome Med. 2018:10(1):84. doi:10.1186/s13073-018-0594-6
4. Justesen S, Harndahl M, Lamberth K, Nielsen L-LB, Buus S. Functional recombinant MHC class II molecules and high-throughput peptide-binding assays. Immunome Res. 2009:5(1):2. doi:10.1186/1745-7580-5-2
5. Rudensky AY, Preston-Hurlburt P, Hong S-C, Barlow A, Janeway CA. Sequence analysis of peptides bound to MHC class II molecules. Nature. 1991:353(6345):622–627. doi:10.1038/353622a0
6. Chicz RM, Urban RG, Lane WS, Gorga JC, Stern LJ, Vignali DAA, Strominger JL. Predominant naturally processed peptides bound to HLA-DR1 are derived from MHC-related molecules and are heterogeneous in size. Nature. 1992:358(6389):764–768. doi:10.1038/358764a0
7. Suri A, Lovitch SB, Unanue ER. The wide diversity and complexity of peptides bound to class II MHC molecules. Curr Opin Immunol. 2006:18(1):70–77. doi:10.1016/j.coi.2005.11.002
8. Purcell AW, Ramarathinam SH, Ternette N. Mass spectrometry-based identification of MHC-bound peptides for immunopeptidomics. Nat Protoc. 2019:14(6):1687–1707. doi:10.1038/s41596-019-0133-y
9. Nielsen M, Andreatta M, Peters B, Buus S. Immunoinformatics: predicting peptide–MHC binding. Annu Rev Biomed Data Sci. 2020:3(1):191–215. doi:10.1146/annurev-biodatasci-021920-100259
10. Ciudad MT, Sorvillo N, van Alphen FP, Catalán D, Meijer AB, Voorberg J, Jaraquemada D. Analysis of the HLA-DR peptidome from human dendritic cells reveals high affinity repertoires and nonconventional pathways of peptide generation. J Leukoc Biol. 2017:101(1):15–27. doi:10.1189/jlb.6HI0216-069R
11. Kaabinejadian S, Barra C, Alvarez B, Yari H, Hildebrand WH, Nielsen M. Accurate MHC motif deconvolution of immunopeptidomics data reveals a significant contribution of DRB3, 4 and 5 to the total DR immunopeptidome. Front Immunol. 2022:13:835454. doi:10.3389/fimmu.2022.835454
12. Attermann AS, Barra C, Reynisson B, Schultz HS, Leurs U, Lamberth K, Nielsen M. Improved prediction of HLA antigen presentation hotspots: applications for immunogenicity risk assessment of therapeutic proteins. Immunology. 2021:162(2):208–219. doi:10.1111/imm.13274
13. Thomsen MCF, Nielsen M. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion. Nucleic Acids Res. 2012:40(W1):W281–W287. doi:10.1093/nar/gks469
14. Mellacheruvu D, Wright Z, Couzens AL, Lambert J-P, St-Denis NA, Li T, Miteva YV, Hauri S, Sardiu ME, Low TY, et al. The CRAPome: a contaminant repository for affinity purification-mass spectrometry data. Nat Methods. 2013:10(8):730–736. doi:10.1038/nmeth.2557
15. Almagro Armenteros JJ, Sønderby CK, Sønderby SK, Nielsen H, Winther O. DeepLoc: prediction of protein subcellular localization using deep learning. Bioinformatics. 2017:33(24):4049. doi:10.1093/bioinformatics/btx431
16. Abelin JG, Harjanto D, Malloy M, Suri P, Colson T, Goulding SP, Creech AL, Serrano LR, Nasir G, Nasrullah Y, et al. Defining HLA-II ligand processing and binding rules with mass spectrometry enhances cancer epitope prediction. Immunity. 2019:51(4):766–779.e17. doi:10.1016/j.immuni.2019.08.012
17. Fisch A, Reynisson B, Benedictus L, Nicastri A, Vasoya D, Morrison I, Buus S, Ferreira BR, Kinney Ferreira de Miranda Santos I, Ternette N, et al. Integral use of immunopeptidomics and immunoinformatics for the characterization of antigen presentation and rational identification of BoLA-DR-presented peptides and epitopes. J Immunol. 2021:206(10):2489–2497. doi:10.4049/jimmunol.2001409
18. Karnaukhov V, Paes W, Woodhouse IB, Partridge T, Nicastri A, Brackenridge S, Scherbinin D, Chudakov DM, Zvyagin IV, Ternette N, et al. HLA binding of self-peptides is biased towards proteins with specific molecular functions. bioRxiv. 2021.02.16.431395. 2021. doi:10.1101/2021.02.16.431395

