PLOS One. 2025 Sep 25;20(9):e0332950. doi: 10.1371/journal.pone.0332950

Framework for analyzing MAE-derived immunopeptidomes from cell lines with shared HLA haplotypes

Queenie W T Chan 1, Teesha C Baker 2, Chia-Wei Kuan 3, Lucy Song 1, Hongbing Yu 4, Leonard J Foster 1,*
Editor: Shahina Akter 5
PMCID: PMC12463239  PMID: 40996974

Abstract

Background

The goal in vaccinology is to identify candidate antigens for clinical trials that will elicit an immune response for a significant portion of the target population. Unfortunately, promising data generated at the preclinical level often cannot be replicated in larger sample sizes. The goal of this project was to develop a methodology for processing MAE-generated data to identify MHC epitopes, minimize non-specific contaminants, find binding motifs, and utilize genetic connections among donors to determine which peptides were presented by specific MHC alleles.

Results

Our approach demonstrated that mild acid elution of peptides from seven consanguineous B-lymphocyte lines accurately reflects the HLA genotypes within family members, highlighting the specificity of MAE. Additionally, the data successfully reproduced known MHC binding motifs and partially deconvoluted the originating HLA alleles of the epitopes.

Conclusions

These findings suggest that our approach could be applied to numerous cell lines globally to evaluate a wide array of HLA haplotypes. This may help to reveal candidate vaccine antigens that induce immune protection for a wider population.

Introduction

Vaccines are the best preventative measure against infectious diseases, having saved millions of lives at relatively little cost. They are also the only conceivable way of completely eradicating a disease [1], benefiting the health of both current and future generations. Despite advancements such as subunit and mRNA vaccines, there are still many diseases without effective vaccines, suggesting that more systematic approaches must be taken. Currently, “reverse immunology” is the conventional method for discovering subunit vaccines: hundreds or thousands of potential antigens are synthesized and tested individually for their ability to elicit an immune response in animal models. Not only is this a time-consuming process, but there is little guarantee that the results will translate to humans. Even existing vaccines do not elicit protective immunity in all recipients, with some at efficacies of 50% or lower [2]. One major reason for this is the unparalleled genetic diversity of MHC proteins among the human population [3]. This might be overcome by testing vaccine candidates in more individuals in the early phase of testing, especially selecting participants from the vaccine’s target demographic, but that would be financially unworkable. Nevertheless, this is at least, in principle, a crucial aspect of evidence-guided, rational vaccine design. A more practical approach would be to instead use many human-derived cells from this population to identify immunodominant antigens prior to clinical testing. With the rapidly improving techniques surrounding immunopeptidomics, i.e., the mass spectrometric analysis of MHC-bound peptides, this can be a real possibility.

Traditionally, antigen discovery by immunopeptidomics involves using antibodies that selectively precipitate MHC proteins from lysed cells, then eluting presented peptides for identification by mass spectrometry (MS). This method is widely used, but it requires significant time, effort, and expertise, making it impractical to apply to many cell types. Immunoprecipitation also tends to select only for strongly bound peptides, which may not be the most immunogenic in vivo since immunodominance is also highly affected by protein abundance [4]. The procedure also requires multiple immunoprecipitation reactions with different antibodies to cover all MHCs that might be present in a system, and cell lysis prior to the immunoprecipitation means that all MHC-bound peptides are sampled instead of just the ones on the cell surface. Alternatively, the mild acid elution (MAE) technique, where peptides are stripped directly from the cell surface, is growing in popularity [5]. While this technique is easier and quicker to carry out, the variety of antigen-presenting proteins [6] and other proteins on the cell surface and in the extracellular milieu means the results are more complicated to interpret.

Here we present an analytical framework for MAE-based immunopeptidomics that is fast, sensitive, and specific and can be applied to any antigen-presenting cell. This can pave the way to study many cell lines from a vaccine’s target population, analogous to recruiting a large number of trial participants. To illustrate this concept, we tested it on B lymphocyte cell lines from a family of seven – father, mother, and five children – discovering which antigen consensus sequences are most likely to be immunodominant for this group. Although this is a small sample size, the shared MHC alleles among these individuals reflect the genetic variations found within an ethnic group. Using samples collected by MAE, we focused on the narrow length range of MHC I-presented ligands and the tendency of MHC II to present nested (i.e., overlapping or “ragged”) peptides [7]. The goal of this project was to develop a methodology for processing MAE-generated data to identify MHC epitopes while minimizing non-specific contaminants, to find binding motifs, and to test how we can take advantage of the genetic connections among the donors to determine which peptides were presented by specific MHC alleles. Ultimately, this work aims to advance the rational selection of vaccine antigens at the preclinical stage. By incorporating HLA alleles from the target population in cultured cells, researchers can address the diverse HLA polymorphisms early in the process, rather than during costly clinical trials where many vaccines fail to show immune efficacy among participants.

Materials and methods

Cell lines

Seven immortalized B-lymphocyte cell lines from Family 243 (father GM2705, mother GM2707, five children by birth order GM2728, GM3027, GM2713, GM2709, GM2711) were purchased from the Coriell Institute for Medical Research. Partial HLA genotypes for each were available from the Coriell Institute and these were supplemented with additional typing using HLA Fusion 3.2.0.13925 (service provided by the Vancouver General Hospital Immunology Laboratory). Cell lines LBL-721 and MHC-mutant LBL-721–174 were kind gifts from Dr. Paul Sondel (Department of Human Oncology, University of Wisconsin School of Medicine and Public Health). These cells were maintained in RPMI-1640 medium with varying percentages of fetal bovine serum according to the manufacturer’s protocol.

Generation of stable cell line with lentivirus infection

An HLA-A*02 shRNA construct (Clone ID: TRCN0000057238, GE Healthcare) and a non-mammalian-targeting control shRNA (SHC002, Sigma) were packaged into lentivirus particles in 293T/17 cells (ATCC) using the lentiviral packaging mix (Sigma). GM02709 cells, one of two B lymphocyte cell lines that possess homozygous HLA-A*02 alleles, were transduced with these lentivirus particles in the presence of 8 µg/mL polybrene. Stable knockdown cells were selected with puromycin (1 µg/mL) for two weeks.

Mild acid elution

A total of 5 × 10⁷ to 2 × 10⁸ cells were required for each biological replicate, with a minimum viability threshold of 85%. All experiments were done in triplicate. Cells were harvested for elution by centrifugation (Thermo Scientific, Sorvall T1 centrifuge) at 500 x g for 3 min in 50 mL conical tubes. This and all following steps were performed at 4°C to prevent sample degradation and minimize proteolytic cleavage. Note that it is impossible to differentiate between peptide degradation due to sample handling and natural antigen processing by the immunoproteasome or cathepsins [8]. Cell pellets were washed sequentially with 10 mL of room temperature 1X PBS and then twice with 10 mL of cold PBS. To remove residual phosphate salts, cells were then resuspended in 10 mL of cold saline, made with the same sodium chloride and potassium chloride concentrations as 1X PBS but without either of the cations’ phosphate salts, then transferred to a new conical tube. Following the same centrifugation conditions, the supernatant was once again discarded. The elution of presented antigens was achieved using a solution of 2% acetic acid in the same 1X cold saline as above. Cells were pelleted at 1000 x g for 5 minutes. The immunopeptide-containing supernatant was collected in a new tube, then either frozen at −80°C overnight or snap frozen in liquid nitrogen, and lyophilized until dry.

Preparing peptides for MS analysis

Desalting peptides.

Immunopeptide samples were thawed or resuspended in 1 mL of Desalting Buffer A (2% acetonitrile in 0.1% TFA), ensuring that the sample was acidified to below pH 2.5. Following the Stop-And-Go Extraction (STAGE) tip desalting protocol [9], desalting columns were prepared by cutting out small cores (using a 14-gauge flat-tipped syringe needle) from Empore C18 solid phase extraction discs and inserting them into P200 pipette tips. The C18 material was wetted by passing 50 µL of methanol through it, using the pressure generated by the plunger of a 10 mL plastic syringe to push the liquid through. The columns were then conditioned by passing 100 µL of Desalting Buffer A. Peptide samples were applied to the column, then washed twice with 200 µL of Desalting Buffer A. Peptides were eluted with 80 µL of Desalting Buffer B (30% acetonitrile in 0.1%) into a clean 96-well plate or microtubes. Finally, the samples were dried by vacuum centrifugation before proceeding to the next step.

Offline liquid chromatography fractionation.

Desalted and dried samples from the LBL-721 and LBL-721–174 cell lines were resuspended in 23 µL of Fractionation Buffer A (2% acetonitrile in 5 mM NH4HCO2, pH 10) and fractionated through a 36 min gradient on an Agilent ZORBAX Extend 80 Å C18 column. The separation was performed at high pH, using a gradient that ran from 0% Fractionation Buffer B (90% acetonitrile in 5 mM NH4HCO2, pH 10) to 4% B over 30 seconds, then increased to 40% at 20 min, held at 90% for 5 min, and equilibrated for the remaining 10.5 min at 0% B. Each sample was separated into 12 fractions (~120 µL per well) in a 96-well plate then desalted and dried once more.

Sample resuspension.

Prior to MS analysis, samples were reconstituted in 20 µL of MS Buffer A (2% acetonitrile in 0.1% formic acid). A NanoDrop One (ThermoFisher Scientific, A205nm, Scopes mode) was used to spectrophotometrically quantify the amount of peptide in 1.5 µL of the sample. Peptides were diluted to 1 μg/μL for a 2 μL injection in MS analysis.

Online liquid chromatography and mass spectrometry (LC-MS/MS)

LC-MS/MS of peptides from LBL-721 and LBL-721–174 cell lines.

Samples were reconstituted in 20 µL of MS Buffer A (2% acetonitrile in 0.1% formic acid). Purified peptides were analyzed using a timsTOF Pro (Bruker Daltonics) mass spectrometer coupled on-line to an Easy nano LC 1000 HPLC (ThermoFisher Scientific) with a C18 column using a Captive spray nanospray ionization source (Bruker Daltonics). Samples were separated with a gradient run from 5% MS Buffer B (90% acetonitrile in 0.1% formic acid) to 30% B over 45 min, then increased to 100% B over 2 min and held at 100% B for 13 min. The mass spectrometer was set to acquire in data-dependent PASEF mode, fragmenting the 10 most abundant ions (one at a time, at an 18 Hz rate) after each full-range scan from m/z 100 Th to m/z 1700 Th; +1 ions were included by drawing the ion mobility zone of interest to cover them. The nano ESI source was operated at 1900 V capillary voltage, 3 L/min drying gas and 180°C drying temperature. Funnel 1 was set at 300 V, funnel 2 at 200 V, multipole RF at 200 V, deflection delta at 70 V, quadrupole ion energy at 5 eV, low mass at 200 Th, collision cell energy at 10 eV, collision RF at 1500 V, transfer time at 60 µs, and pre-pulse storage at 12 µs. PASEF was on with 10 PASEF scans for charges 0–4, Target intensity 20000 and Intensity threshold 2500.

LC-MS/MS of peptides from Family 243 cell lines.

Samples were separated using an EASY-nLC 1000 system with a C18 column coupled to a Q-Exactive mass spectrometer (Thermo Scientific). Peptides were eluted with a gradient from 100% buffer A (0.5% acetic acid) to 40% buffer B (80% ACN, 0.5% acetic acid) over 142 min at a constant flow of 250 nL/min. The mass spectrometer operated in a data-dependent acquisition mode, with fragmentation of the five most abundant ions per scan and dynamic exclusion of 30 seconds enabled. To maximize peptide identification, fragmentation was allowed for ions of +1 charge (default settings exclude +1), since the peptides were not digested by trypsin and therefore do not always carry a +2 charge in an acidic environment. MS resolution was set to 70000 with an automatic gain control (AGC) target of 3 × 10⁶, maximum fill time of 20 ms and a mass window of 300–2000 m/z. Higher-energy collisional dissociation (normalized collision energy 26 with 20% stepping, done in accordance with previous findings to obtain optimal spectra [10]) was performed with an AGC target of 1 × 10⁶, maximum fill time of 120 ms, mass resolution of 70000, and charge exclusion set to unassigned.

Processing raw data to identify peptides.

All mass spectrometry raw data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository [11] with the dataset identifier PXD058267.

Analysis of MAE data from HLA-A*02 knockdown cells

Raw data from HLA-A*02 knockdown and control cells were searched using MaxQuant (v 1.4.1.2) [12]. Default values were selected with these notable exceptions: Match Between Runs checked, unspecific digestion mode (i.e., no enzyme), no fixed modifications, N-terminal protein acetylation and methionine oxidation as variable modifications, reverse decoy mode, initial search using a human first search database provided by MaxQuant, default contaminant database used, peptide-spectrum match and protein false-discovery rate (FDR) of 0.1. The search was conducted against a protein database containing UniProtKB/TrEMBL human sequences (88844 sequences), cow sequences (2151 proteins, taken from 32006 bovine entries as of October 2013) that were identified in a separate search against cow sequences alone, and 11 likely viral contaminants including the Epstein-Barr virus, adenovirus, and bovine diarrhea virus.

Peptides found in at least 2 of 3 biological replicates were considered identified, producing a confident and reproducible dataset. After normalizing the peptide intensities, we calculated a knockdown/control presentation ratio for all peptides of human origin. Peptides found only in the knockdown cells were assigned an arbitrary but high ratio of 50, and those found only in the negative control were given the reciprocal ratio of 0.02. The average remaining HLA-A*02 protein after knockdown was 21%, as estimated by western blot, so we used this knockdown factor as a cut-off to determine whether a given peptide was affected by the knockdown. Of those that were affected, 87% were less than 12 residues long, suggesting that they are MHC I peptides, consistent with the knockdown of one of the MHC I genes.
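The ratio assignment described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names are hypothetical, intensities are assumed to be already normalized with 0 meaning "not detected", and the direction of the cut-off (a peptide is called affected when its ratio is at or below 0.21) is our reading of the text rather than something it states explicitly.

```python
# Hypothetical sketch of the knockdown/control presentation ratio.
ONLY_KNOCKDOWN_RATIO = 50.0   # arbitrary high ratio stated in the text
ONLY_CONTROL_RATIO = 0.02     # its reciprocal, also stated in the text
RESIDUAL_HLA_A02 = 0.21       # fraction of HLA-A*02 remaining (western blot)


def presentation_ratio(knockdown, control):
    """Return the knockdown/control ratio from normalized intensities.

    A value of 0 means the peptide was not detected in that condition.
    """
    if knockdown > 0 and control > 0:
        return knockdown / control
    if knockdown > 0:
        return ONLY_KNOCKDOWN_RATIO
    if control > 0:
        return ONLY_CONTROL_RATIO
    raise ValueError("peptide not detected in either condition")


def affected_by_knockdown(ratio, cutoff=RESIDUAL_HLA_A02):
    """Assumption: 'affected' means the ratio is at or below the cut-off."""
    return ratio <= cutoff
```

For example, a peptide seen only in the control receives the ratio 0.02 and is flagged as affected, whereas a peptide with a ratio of 0.25 is not.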

Analysis of MAE data from LBL-721 and LBL-721–174 cell lines

Data was searched using Byonic software (Dotmatics) against a database composed of all human annotated and reviewed protein entries (Uniprot, Swiss-Prot). A database of bovine proteins (Uniprot, Swiss-Prot) was added to a standard decoy database of common contaminants. A non-specific in silico digestion was performed with peptide lengths from 7–25 amino acids. Parameters used included precursor mass accuracy of 70 ppm and fragment mass accuracy of 40 ppm. A common variable modification of methionine oxidation was included. A 1% FDR cut off was used at the protein level with the software automatic score cutoff applied at the peptide level. A Byonic peptide score of 2, approximately corresponding to a p-value of 0.01, was used to remove low confidence identifications.

Analysis of cell lysate data from LBL-721 and LBL-721–174 cell lines and MAE data from family 243 cells

Data was searched using FragPipe [13] version 19.1 against a database of Human Uniprot sequences including isoforms (104597 sequences, downloaded on May 16, 2024) with Bovine, contaminant and reverse decoy sequences. Default values were selected with these notable parameters: Match Between Runs checked, non-specific enzyme, no fixed modifications for MAE data or carbamidomethyl modification for cell lysate data, N-terminal protein acetylation and methionine oxidation as variable modifications. Other key parameters include peptide lengths set at 7–25 amino acids, precursor mass accuracy of 15 ppm and fragment mass accuracy of 20 ppm, and a 1% FDR cut-off at the protein level and peptide level, the accepted standard in mass spectrometry. For cell lysate data, a peptide was considered identified if it was found in at least two out of three replicates. Data visualization and statistical analysis for cell lysate data was performed with FragPipe-Analyst [14] to generate a volcano plot.

Organizing peptides into potential epitopes

To remove sequences with low technical confidence, all epitopes (whether derived from single or overlapping peptides) with a total spectral count of 2 or fewer were removed from further analysis. Peptides with 8 or fewer residues were also rejected. Peptides whose sequences overlapped with others were condensed into the longest possible outcome and considered as epitopes. Peptides that did not overlap with any others were also considered as epitopes. Epitopes measuring 9–11 amino acids were sorted into the MHC I group, and 12–20 mers were placed into the MHC II group.
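The condensing step can be illustrated with a short Python sketch. It rests on several assumptions that the text does not spell out: each peptide is represented by a hypothetical tuple of protein identifier, start, end (exclusive), and spectral count; two peptides "overlap" when they share residues of the same protein; and the length filter is applied to peptides before merging while the spectral count filter is applied to the resulting epitopes.

```python
from collections import defaultdict


def condense_into_epitopes(peptides, min_len=9, min_spectral_count=3):
    """Merge overlapping peptides into epitopes.

    peptides: iterable of (protein_id, start, end, spectral_count) tuples,
    with `end` exclusive. This input layout is an assumption of the sketch.
    """
    by_protein = defaultdict(list)
    for protein, start, end, count in peptides:
        if end - start >= min_len:  # reject peptides with 8 or fewer residues
            by_protein[protein].append((start, end, count))

    epitopes = []
    for protein, spans in by_protein.items():
        spans.sort()
        current = None
        for start, end, count in spans:
            if current is not None and start < current["end"]:
                # Overlap: extend the epitope to the longest possible span.
                current["end"] = max(current["end"], end)
                current["spectral_count"] += count
                current["n_peptides"] += 1
            else:
                # No overlap: this peptide starts a new epitope.
                current = {"protein": protein, "start": start, "end": end,
                           "spectral_count": count, "n_peptides": 1}
                epitopes.append(current)

    # Remove epitopes with a total spectral count of 2 or fewer.
    return [e for e in epitopes if e["spectral_count"] >= min_spectral_count]
```

The `n_peptides` field records how many peptides were merged into each epitope, which is convenient for any later filter that depends on the number of overlapping peptides.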

Multiple sequence alignment

Clustal W2 [15] was used for multiple sequence alignment (MSA), selecting the BLOSUM matrix for slow pairwise alignment and the GONNET matrix for multiple alignment, with gap open penalty of 100 and gap extension penalty of 10 for both cases.
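For readers who wish to reproduce these settings from the command line, the sketch below assembles a ClustalW2 argument list in Python. It is an illustration under stated assumptions: the option names follow the ClustalW2 command-line help as we understand it and should be checked against `clustalw2 -help` for the installed version, the file names are hypothetical, and the text does not say whether the alignments were run locally or through a web interface.

```python
def clustalw2_command(fasta_in, aln_out, executable="clustalw2"):
    """Build an argument list mirroring the alignment settings in the text.

    Slow (full dynamic programming) pairwise alignment is the ClustalW2
    default, so no quick-tree option is passed.
    """
    return [
        executable,
        f"-INFILE={fasta_in}",
        "-ALIGN",
        "-TYPE=PROTEIN",
        "-PWMATRIX=BLOSUM",   # pairwise alignment matrix
        "-PWGAPOPEN=100",     # pairwise gap open penalty
        "-PWGAPEXT=10",       # pairwise gap extension penalty
        "-MATRIX=GONNET",     # multiple alignment matrix
        "-GAPOPEN=100",       # multiple alignment gap open penalty
        "-GAPEXT=10",         # multiple alignment gap extension penalty
        f"-OUTFILE={aln_out}",
    ]
```

The returned list can be passed to `subprocess.run` once the executable is confirmed to be on the system path.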

Amino acid positional score and positional threshold score

Using the text outputs provided by the WebLogo tool [16], we extracted the following parameters: the weight (W), the entropy (E), and its lower limit (L) for each position in every sequence logo, together with the number of occurrences of amino acid X (Ax) and the total number of amino acids at position n (Nn). An Amino Acid Positional Score (S) and a Positional Threshold Score (R) were then calculated using:

\[ S = \frac{A_x}{N_n} \cdot E \cdot W \tag{1} \]
\[ R = E - L \tag{2} \]

Alignment positions where no residues exceed the threshold are given a null value, indicated by a dash in the output. Positions with significant residue(s) are written out in standard regular expression format in bold font and shaded in grayscale according to their Amino Acid Positional Score: scores are capped at 1.0, which is defined as 100% black, and all other values are shown in decreasing darkness, with all values below 20% displayed as 20% black. Residues that surpass the Positional Threshold Score only when grouped by their physicochemical properties are displayed in the same format, but with a smaller, non-bold font. Summarized views of the consensus sequences are made by displaying only the residues with the top 4 Amino Acid Positional Scores.
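The conversion of a sequence logo into a one-line consensus can be sketched as follows. This is a simplified illustration with hypothetical function names: the per-position counts, entropy, weight, and threshold R are assumed to have been read from the WebLogo text output already, the threshold is passed in rather than computed, and the grayscale shading and physicochemical grouping are omitted.

```python
def positional_scores(counts, entropy, weight):
    """Amino Acid Positional Score S = (Ax / Nn) * E * W for one position.

    counts: dict mapping residue -> number of occurrences at this position.
    """
    total = sum(counts.values())
    return {aa: (n / total) * entropy * weight for aa, n in counts.items()}


def consensus_token(counts, entropy, weight, threshold, top_n=4):
    """Return '-', a single residue, or a regex character class."""
    scores = positional_scores(counts, entropy, weight)
    hits = [aa for aa, s in scores.items() if s > threshold]
    hits.sort(key=lambda aa: (-scores[aa], aa))  # highest score first
    hits = hits[:top_n]
    if not hits:
        return "-"
    return hits[0] if len(hits) == 1 else "[" + "".join(hits) + "]"


def consensus_line(positions, top_n=4):
    """positions: list of (counts, entropy, weight, threshold) per column."""
    return "".join(consensus_token(c, e, w, r, top_n) for c, e, w, r in positions)
```

Because each node collapses to a single string, many nodes can be scanned or filtered with ordinary text tools.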

Results

MAE is capable of isolating MHC I and II epitopes

To confirm the efficacy of MAE for isolating MHC I-presented ligands, we performed an shRNA knockdown of HLA-A*02 (S1 Fig, see S1 File for original blot) on the immortalized B lymphocyte cell line GM2709 [17,18], which is homozygous for this allele. Peptides were placed in the MHC I (9–11 residues) or MHC II (12–20 residues) group. Peptides with overlapping sequences were aligned and condensed, with their total span in length treated as epitopes for further analysis. Using these simple criteria, the final FDR was effectively zero, as there were no hits against the reverse sequence database after applying these multiple constraints. MSA [15] followed by sequence logo [16] analysis of epitopes that were diminished in the HLA-A*02 knockdown compared to the control revealed a consensus sequence that was identical to that known for HLA-A*02 (Fig 1A). Furthermore, 58% of the epitopes that were directly affected by the knockdown were marked as specific binders (strong or weak) by NetMHCpan [19], with another 20% designated as coming from the other MHC I molecules (HLA-B*07:02, HLA-B*35:01, HLA-C*07:02). Assuming that the remaining 22% are cell surface contaminants, this suggests that MAE can effectively produce peptide samples that are specific to MHC I.

Fig 1. Using MAE to isolate MHC I and MHC II ligands.

(A) Sequence logo (top) generated from epitopes that were found in the WT cell but not in the HLA-A*02 knockdown closely matches the HLA-A*02:01 motif from naturally presented ligands (bottom, NetMHCpan). (B) A volcano plot of the log2 fold change of proteins in a whole cell lysate of WT (LBL-721) and mutant (LBL-721-174) cell lines. Proteins associated with the MHC I (red) and MHC II (blue) antigen presentation machinery are highlighted by colored dots. Venn diagram of epitopes in the (C) 9-11 residues long MHC I and (D) 12-20 residues long category found in the WT and mutant cells.

To test the effectiveness of MAE for MHC II-presented peptides, we performed a comparable experiment on the gamma-irradiated B-cell line LBL-721–174 [20,21] and its WT counterpart LBL-721 [22,23]. The mutant was missing 241 MHC I epitopes (71% of the total detected in the 9–11 mer range, Fig 1C), consistent with a comparable (88–97%) reduction in expression of major MHC I proteins (Fig 1B, S2 Fig). As for epitopes in the 12–20 mer (MHC II) group, one would expect to find no epitopes in the mutated cells, since according to next-generation sequencing results these cells lack both alleles of HLA-DRB1, HLA-DPA1, HLA-DPB1, HLA-DQA1, and HLA-DQB1 (S3 Fig). MS data indicate that there was still about 5% expression of these MHC II proteins (Fig 1B, S2 Fig). Even so, we were surprised to detect 218 epitopes in the mutant, accounting for almost 50% of all sequences in the MHC II group (Fig 1D), implying that about half of MAE-derived MHC II peptides are non-specific contaminants. This may be because longer peptides tend to be more hydrophobic and therefore adsorb more easily to any surface [24]. This suggests that while MAE can isolate MHC II epitopes, the method’s specificity must be improved by eliminating contaminating sequences.

Benchmarking epitope sequences derived from peptide data

Applying MAE to MHC mutant or knockdown cells can differentiate true MHC-bound peptides from non-specific contaminants. However, this advantage does not exist when we apply MAE to WT cells. In striving to filter out contaminating peptides, we removed those measuring 8 or fewer amino acids (the MHC binding groove prefers 9-mers [7]). To avoid “overcounting” highly similar peptides with shared sequences (produced by immunoproteasomes or antigen processing cathepsins [8], or as a result of sample handling), we coalesced them into epitopes. This avoids the illusion that a sample contains many specific ligands when in fact many of them are just cleavage products of a larger ligand. They were classified as MHC I and II according to length as before, but MHC II epitopes must be composed of three or more overlapping peptides since this is one of their hallmark traits [25]. This approach, which was tested on seven different human-derived B-lymphocyte cell lines that have been immortalized but are otherwise unmodified, is explained in detail in Methods and summarized in Fig 2A.
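The classification rule in this paragraph can be written as a small Python function. This is a sketch with a hypothetical function name; it assumes that the epitope length and the number of overlapping peptides that formed the epitope are already known, and it returns `None` for sequences that fall outside both groups.

```python
def classify_epitope(length, n_peptides):
    """Assign an epitope to the MHC I or MHC II group, or reject it.

    length: epitope length in residues.
    n_peptides: number of overlapping peptides condensed into the epitope.
    """
    if 9 <= length <= 11:
        return "MHC I"
    if 12 <= length <= 20 and n_peptides >= 3:
        # MHC II epitopes must be built from three or more nested peptides.
        return "MHC II"
    return None
```

Under this rule a 15-residue epitope supported by only two overlapping peptides is rejected, while the same span supported by three is retained as MHC II.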

Fig 2. Benchmarking epitope sequences derived from peptide data.

(A) A flowchart describing how MAE-derived peptide data are processed, resulting in sequences that are either rejected or retained for further analysis. (B) These sequences are then benchmarked by using NetMHCpan (for epitopes of 9-11 amino acids in length) or NetMHCIIpan (for epitopes of 12+ amino acids in length) that tag entries either as a specific binder (weak or strong) or a non-binder against any of the queried MHC alleles, thereby allowing each MAE-derived sequence to be categorized as a true positive (TP), false negative (FN), false positive (FP), or true negative (TN).

To evaluate whether these data filters were effective at producing an accurate set of epitopes that are relevant to the HLA alleles of a given cell line, we used NetMHCpan [19] and NetMHCIIpan [25] with default parameters to benchmark our data. Both algorithms are widely used, accommodate diverse alleles, and allow the user to input sequences of various lengths and query them against almost any MHC isotype and allele. Our goal was to produce epitope sets that contain a high percentage of true positives (TP) and minimum false positives (FP) (Fig 2B) which can be calculated as precision (P = TP/(TP + FP)), also known as the positive predictive value (PPV). From the seven cell lines we analyzed, the MHC I epitope set showed an average precision of 93% (min = 87%, max = 96%) while the MHC II set averaged 55% (min = 41%, max = 67%) (see S4 Fig for individual results). This prompted us to question whether there is any evidence indicating that MAE can effectively isolate MHC II-specific epitopes.
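The precision calculation can be made concrete with a short Python sketch. The mapping used here is our reading of the Fig 2B caption and should be treated as an assumption: a retained sequence predicted to bind is a TP, a retained non-binder is an FP, a rejected binder is an FN, and a rejected non-binder is a TN. The function names are hypothetical.

```python
from collections import Counter

# (retained by the filters, predicted binder) -> category
_LABELS = {
    (True, True): "TP",
    (True, False): "FP",
    (False, True): "FN",
    (False, False): "TN",
}


def confusion_counts(records):
    """records: iterable of (retained, predicted_binder) boolean pairs."""
    return Counter(_LABELS[(bool(r), bool(b))] for r, b in records)


def precision(counts):
    """P = TP / (TP + FP); returns NaN when nothing was retained."""
    denominator = counts["TP"] + counts["FP"]
    return counts["TP"] / denominator if denominator else float("nan")
```

For instance, if 9 of 10 retained epitopes are predicted binders, the precision is 0.9, regardless of how many rejected sequences there are.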

MAE derived epitope sequences accurately reflect MHC allelic variations among related individuals

In the experiment described, we independently analyzed seven cell lines and found that some peptides were present in nearly all of them, while others appeared only in specific subsets. This was expected since these lines were derived from a single donor family, sharing HLA alleles (Fig 3A). If MAE can isolate true MHC binders, the relative amounts of their shared ligands should reflect their genetic relatedness, rather than being random. To test this, we first calculated the average spectral count (a common MS-based value that reflects peptide abundance [26]) of each MAE-derived peptide from the seven cell lines, which we termed the Peptide Index (Pi). We then condensed the peptides into epitopes as previously described and assigned each epitope a value called the Epitope Index (Ei), equal to the sum of the Pi values of all its constituent peptides. Epitopes were categorized into MHC I or II groups, and sequences from each group were hierarchically clustered based on their Ei values using centroid linkage mode (Cluster 3.0 [27]). The clustering patterns of all nodes in the cell line dimension for both MHC I and MHC II epitopes were consistent with the HLA genotypic relatedness among the donors, showing slight differences in node distances between the two groups and the placement of the parents (Fig 3B). This strongly suggests that the method is highly specific for isolating presented peptides, with their abundance and sequence directly linked to their MHC alleles.
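The two indices can be illustrated with a brief Python sketch. It assumes, as our interpretation of the text, that Pi is the mean spectral count of a peptide across the replicates of one cell line and that a peptide absent from a cell line contributes nothing to Ei. The data layout and function name are hypothetical, and the clustering step itself is not reproduced here.

```python
from statistics import mean


def epitope_index(epitope_to_peptides, spectral_counts):
    """Compute Ei for every epitope in every cell line.

    epitope_to_peptides: {epitope: [constituent peptide sequences]}
    spectral_counts: {cell_line: {peptide: [count_rep1, count_rep2, ...]}}
    Returns {epitope: {cell_line: Ei}}.
    """
    ei = {}
    for epitope, peptides in epitope_to_peptides.items():
        ei[epitope] = {
            # Pi is the mean spectral count of one peptide in one cell line;
            # Ei is the sum of Pi over the epitope's constituent peptides.
            cell: sum(mean(counts[p]) for p in peptides if p in counts)
            for cell, counts in spectral_counts.items()
        }
    return ei
```

The resulting table of epitopes by cell lines is the kind of matrix that can then be passed to a hierarchical clustering tool.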

Fig 3. HLA genetic relatedness of B-lymphocytes from a seven-member donor family.

(A) HLA alleles of each family member are shown. (B) Highly similar clustergrams were generated from Ei values for both epitopes in the 9-11 mer group (top, MHC I) and 12-20 mer group (bottom, MHC II) that mirror the HLA genetic relatedness of the family members. Non-numbered individuals refer to the parents and numbered individuals are the children based on birth order.

MAE performed on consanguineous antigen presenting cells can be used to define HLA allele-specific consensus sequences

One key goal of immunopeptidomics is to define the consensus binding motif for individual MHC alleles. Identifying peptides presented by a specific MHC allele is impossible due to the absence of allele-specific antibodies. While MAE on a single cell line provides quick insights into immunodominant peptides, it cannot determine which MHC allele presented them. However, combining the analysis of genetically related antigen-presenting cells may correlate specific epitopes and their relative abundance with individual MHC alleles (Fig 4A). Furthermore, clustering epitopes based on their Ei values should sort them into nodes, and their sequence similarity should be revealed by MSA. For MHC I, we limited alignments to a maximum length of 12 residues due to the 9–11 residue range of MHC I epitopes (example in S2 File). For MHC II, we set a maximum alignment length of 30 residues (example in S3 File), since their long lengths make it difficult to produce high quality alignments [28]. We also removed nodes with fewer than 9 or more than 100 MHC II epitopes since they also resulted in poor alignment, ultimately retaining 750 MHC I and 150 MHC II high quality nodes (Fig 4B). On closer inspection, sequences in some nodes displayed Ei patterns that were present in some family members but absent in others, which are traceable to their HLA genes. This allowed us to identify or at least narrow down the HLA allele responsible for presenting specific ligands by comparing the experimentally derived Ei profiles to theoretical profiles for MHC I (Fig 5A) and MHC II (Fig 5B).
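The profile comparison can be sketched in Python as follows. This is a deliberately strict illustration with a hypothetical function name: an allele is kept as a candidate only when the set of cell lines carrying it exactly matches the set of cell lines in which the epitope has Ei > 0. Real data are noisier, so a tolerance for missing detections would likely be needed in practice. The cell lines X, Y, and Z and the allele labels in the usage example are placeholders, not genotypes from this study.

```python
def candidate_alleles(ei_profile, genotypes):
    """Return alleles whose carrier pattern matches the Ei presence pattern.

    ei_profile: {cell_line: Ei value for one epitope}
    genotypes: {cell_line: set of allele names carried by that cell line}
    """
    present = {cell for cell, value in ei_profile.items() if value > 0}
    all_alleles = set().union(*genotypes.values())
    return sorted(
        allele for allele in all_alleles
        if {cell for cell, g in genotypes.items() if allele in g} == present
    )


# Placeholder genotypes for three hypothetical related cell lines.
genotypes = {"X": {"a1", "a2"}, "Y": {"a2", "a3"}, "Z": {"a1", "a3"}}
print(candidate_alleles({"X": 4.0, "Y": 2.5, "Z": 0.0}, genotypes))
```

In this toy case the epitope is seen in X and Y but not Z, and only the placeholder allele `a2` is carried by exactly those two lines, so it is the single candidate. An epitope detected in all three lines matches no allele and returns an empty list.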

Fig 4. Strategy for analyzing consanguineous cell lines.

(A) Analyzing the Ei profiles of epitopes presented by theoretical cell lines X, Y, and Z that have partial overlap of HLA-A, HLA-B, and HLA-C genotypes can be used to trace the origin of a given epitope back to its originating allele. (B) A flow chart of how MAE-derived peptides from related cell lines can be processed together through a series of steps (condensing peptides into epitopes, hierarchical clustering, and multiple sequence alignment (MSA)) to predict MHC binding motifs.

Fig 5. Deriving HLA allele-specific consensus sequences from MAE-derived peptide data.

Theoretical Ei profiles for (A) MHC I and (B) MHC II genes for all seven members of the donor family, with non-numbered individuals referring to the parents and numbered individuals referring to the children based on birth order. Grey cells indicate Ei > 0 while white cells refer to Ei = 0. Shown are examples of (C) MHC I and (D) MHC II nodes containing epitopes with similar Ei profiles across family members; the epitopes in each node were aligned and visualized as sequence logos. Corresponding known HLA motifs from NetMHCpan and NetMHCIIpan derived from a database of naturally bound ligands [29] are shown as evidence of this technique’s ability to reproduce them. Positions 1-9 are predicted to be the region that fits into the MHC binding groove, with typical anchor positions highlighted (shaded circles).

Aligned epitopes within these high-quality nodes were visualized by sequence logo plot [16], and many of them, in particular the MHC I epitopes, showed consensus sequences conforming to the major anchor sites at positions 2 and 9 of the binding core of MHC I-bound peptides, sometimes with additional auxiliary anchors. Our relatively small dataset was able to accurately reproduce every HLA-A and HLA-B consensus sequence that could be possible from these seven individuals (Fig 5C). Nodes that resulted in the weakest consensus sequences tended to contain epitopes that were found across MAE isolates from most or all seven donors in the family (not shown), suggesting that they are contaminants. The MHC II results were more difficult to interpret because proteins such as HLA-DP and HLA-DQ are each encoded by two polymorphic genes whose products can form both cis- and trans-heterodimers. This complexity leads to weak consensus sequences, and MHC II-presented peptides are generally less studied than MHC I-presented peptides, resulting in less reliable data for comparison. Within our data, we were able to find some MHC II nodes with sequences that form a rough consensus at positions 1, 4, 6, and 9 in the core binding region (Fig 5D). The overall lack of clear consensus sequences corroborates what is known about the more variable nature of MHC II-bound peptides.

Visualizing consensus sequences

With the sheer number of nodes in this dataset, it is important to be able to find consensus sequences without needing to visually inspect each resultant sequence logo. Hence, we relied on the sequence logo algorithm to apply compositional bias compensation and Bayesian statistics at the 95% confidence level to discover consensus motifs. This allows logos (Fig 6, item I) to be summarized as various one-line text outputs (Fig 6, items II, III, and IV) that can be easily viewed en masse or manipulated by simple scripts. Only significant amino acids at their relevant positions are shown, and non-significant positions are replaced by dashes. This effectively condenses each sequence logo into one visually intuitive line of text per node. Collapsing consensus sequences further by grouping chemically similar amino acids improves sensitivity; this is especially important for MHC II consensus sites, which are known to be more varied than those of MHC I.

Fig 6. Different ways to display a sequence logo.


One example node with epitopes displayed (I) as a sequence logo, or as single-line outputs: (II) showing the relative Amino Acid Positional Score (S) in grayscale (white = 0%, black = 100% or S > 1) for amino acids that exceed threshold R (bold) or that pass the threshold only when “grouped” by their physicochemical properties (not bold); (III) showing only the amino acids that pass the threshold and have the four highest S values; and (IV) as in (III), but with non-significant positions contained within the consensus represented by dashes.
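
The condensation of a sequence logo into one line of text can be sketched in Python as follows. This is only an illustration under stated assumptions, not the in-house script deposited with the dataset: the function name `one_line_consensus`, the input format (one dictionary of significant residues and their scores per alignment position) and the example scores are all hypothetical. It mimics the style of item IV, where non-significant positions inside the consensus become dashes.

```python
def one_line_consensus(significant, top_n=4):
    """Condense per-position significant residues into one line of text.

    `significant` holds one dict per alignment position, mapping an amino
    acid to its score for residues that passed the significance threshold.
    An empty dict means no residue was significant at that position.
    (Hypothetical input format, chosen for illustration.)
    """
    parts = []
    for residues in significant:
        if not residues:
            parts.append("-")  # non-significant position
            continue
        # Keep at most `top_n` residues, highest score first (ties: alphabetical).
        ranked = sorted(residues, key=lambda aa: (-residues[aa], aa))[:top_n]
        parts.append(ranked[0] if len(ranked) == 1 else "[" + "".join(ranked) + "]")
    # Drop dashes outside the consensus; dashes inside it are kept.
    return "".join(parts).strip("-")


# Toy 9-position node with anchors at positions 2 and 9 (made-up scores).
example_node = [
    {}, {"L": 0.8, "M": 0.3}, {}, {}, {}, {}, {}, {}, {"V": 0.9},
]
print(one_line_consensus(example_node))  # prints [LM]------V
```

Because each node becomes a short string, thousands of nodes can be listed, sorted or searched with ordinary text tools instead of being inspected logo by logo.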

Evaluating the impact of epitope clustering and MSA on MHC I and II ligand precision

We wanted to test whether combining analyses of related cell lines, along with hierarchical clustering and subsequent MSA, could increase precision for both MHC I and MHC II ligands. These additional steps served as extra data filtering measures, removing poorly aligned epitopes that likely did not originate from the binding groove of an MHC molecule. To investigate this, we compiled epitopes from the nodes that were retained after filtering and used NetMHCpan and NetMHCIIpan to benchmark them as before (detailed results in S4 File). For MHC I epitopes, precision generally exceeded 90% for well-deconvoluted sequences, sometimes even surpassing the precision achieved when analyzing cell lines individually. Interestingly, queries involving the less-studied HLA-C showed poorer performance. For instance, 38 epitopes traced back to HLA-C*07:01 achieved only 42% precision. This issue was even more pronounced for MHC II epitopes traced solely to HLA-DQ and/or HLA-DP alleles, which consistently yielded precision of 20% or less. Conversely, epitopes deconvoluted to five or fewer alleles and searched against at least one of the well-studied HLA-DRB1 molecules demonstrated precision values ranging from 42% to 87%, with an average of 60%. This represents some improvement compared to the average precision of 55% from cells analyzed independently, without hierarchical clustering and MSA. These findings suggest that the additional data processing steps have positively contributed to identifying MHC II-specific ligands.
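
As a rough illustration of this kind of benchmarking, the sketch below computes precision for one set of epitopes. It assumes that precision means the fraction of retained epitopes that the predictor also scores as a binder to at least one of the alleles the epitopes were traced to, and that "binder" means a percentile rank at or below a cutoff. The function name, the input layout, the 2% cutoff and the peptide labels and rank values are all hypothetical; the criteria actually used are those described in the Methods.

```python
def benchmark_precision(ranks, candidate_alleles, rank_threshold=2.0):
    """Fraction of epitopes predicted to bind at least one candidate allele.

    `ranks` maps each epitope to a dict of {allele: percentile rank}, as
    might be parsed from predictor output (hypothetical layout). A lower
    rank means a stronger predicted binder; `rank_threshold` is an assumed
    cutoff, not a value taken from the study.
    """
    if not ranks:
        raise ValueError("no epitopes to benchmark")
    supported = 0
    for per_allele in ranks.values():
        # Best (lowest) rank among the alleles this node was traced to.
        best = min(
            (per_allele[a] for a in candidate_alleles if a in per_allele),
            default=float("inf"),
        )
        if best <= rank_threshold:
            supported += 1
    return supported / len(ranks)


# Made-up ranks for four placeholder epitopes.
toy_ranks = {
    "PEPTIDE_1": {"HLA-A*02:01": 0.4, "HLA-B*07:02": 35.0},
    "PEPTIDE_2": {"HLA-A*02:01": 12.0, "HLA-B*07:02": 1.5},
    "PEPTIDE_3": {"HLA-A*02:01": 48.0, "HLA-B*07:02": 60.0},
    "PEPTIDE_4": {"HLA-A*02:01": 1.9, "HLA-B*07:02": 22.0},
}
print(benchmark_precision(toy_ranks, ["HLA-A*02:01"]))  # prints 0.5
```

Querying more candidate alleles can only raise this number (0.75 for both toy alleles), which is one reason well-deconvoluted nodes, traced to few alleles, give the more informative benchmark.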

Discussion

Here we have shown that MAE used on antigen-presenting cells with connected genetic lineage, followed by a series of data processing steps that cluster epitopes by their abundance and then align their sequences, can help define allele-specific consensus sequences, particularly for MHC I proteins. Furthermore, this approach enables the deconvolution of these MAE-isolated peptides to their respective MHC molecules without the use of mono-allelic or knockdown cells with reduced complexity in the HLA genes they express.

Historically, the majority of immunopeptidomic data has been generated with immunoenrichment methods. The use of MAE has only slowly increased in the past decade, as there have long been doubts about its specificity [5,30] because of the coelution of non-MHC-bound peptides. One paper [30] suggested that only 40% of an MAE isolate consists of MHC I peptides while the rest are contaminants, yet our data from the LBL-721–174 mutant cell line indicate that the number is closer to 70%. It has also been said that the acidic environment of MAE is not enough to elute MHC II peptides [5,31]. However, if this were true, MHC II epitope clustering by Ei values would not have been able to replicate the HLA genetic relatedness within a family of cell lines. This provides some support for the use of MAE for isolating immunologically relevant ligands from antigen-presenting cells, even though its specificity towards MHC II epitopes is clearly inferior to that for MHC I epitopes. This problem is not a new one [32]: identifying a 9-mer core region within the longer peptides of MHC II epitopes by comparing against known ligands is prone to alignment errors that lower the accuracy of prediction algorithms [28]. This also highlights a potential downside of relying on tools such as NetMHCpan and NetMHCIIpan for benchmarking epitopes’ binding specificity, since their accuracy is highly dependent on the quality of the training data taken from epitope databases [33]. When we tested these tools using hundreds of randomly generated soluble peptides [34] of 9–11 (S5 File) or 12–20 residues (S6 File) against the HLA haplotypes of our seven cell lines, NetMHCpan falsely tagged an average of 13% of these negative control sequences as a specific ligand of at least one of the queried MHC I alleles, while NetMHCIIpan erroneously labeled 15% as specific binders for at least one relevant searched allele or allele pair.
Clearly, using informatics tools to predict ligands or benchmark experimental data comes with an innate possibility for error. One can reasonably argue that the immunoprecipitation method, which yields a purer sample of MHC ligands [35], is a better way to validate MAE-derived epitopes. Nevertheless, the ultimate validation is the use of biological assays that test for CD8 T cell activation by measuring cell proliferation or cytokine release. These tests are critical to linking any immunopeptidomics-based ligand discovery methodology with vaccine design. Epitopes that we identified here in this manuscript have not been validated by functional tests for their immunogenicity, and as such, our approach can only be regarded as a ligand and binding motif prediction tool at this time.

One of the most surprising aspects of the work presented here is that our relatively small dataset was able to recapitulate virtually all the known binding motifs for the relevant MHC I alleles, motifs that have been built up from a decade or more of dozens of labs collecting immunopeptidomic data [33]. Results were less clear for MHC II, yet the lack of clear consensus is not necessarily a failing of the methodology; it reflects the ability of the MHC II binding groove to accept a variety of secondary structures, which contributes to the immune system’s sensitivity for a range of foreign antigens. This propensity for promiscuous binding implies that there may be no real way to limit contamination in MHC II MAE isolates. However, since the quality of tools to isolate and analyze MHC II-presented peptides continues to lag far behind those for MHC I, any attempt at improvement should be taken as a step in the right direction. Furthermore, the work here is only demonstrated through a single family of seven donor cell lines, and the highly overlapping HLA genotypes among its members limit our methodology’s ability to deconvolute MAE-derived epitopes back to their originating MHC. A better approach would be to use cell lines that share some alleles, but fewer than parents and children do. Investigating cell lines sampled from a defined ethnic group should fit the purpose, since they tend to express some alleles in common while maintaining a degree of genetic diversity. To illustrate how effective this can be, using our approach on merely 10 more cell lines to supplement the existing seven-donor family data would theoretically enable the epitopes to be traced back to an additional 13 MHC II alleles, up from just one when investigating exclusively the seven consanguineous cell lines (Fig 7, details in S7 File).

Fig 7. Effects of adding more cell lines to the epitope clustering approach.


This graph demonstrates that each cell line (available from the International Histocompatibility Working Group) added to the epitope clustering approach increases, by approximately one, the number of alleles to which a ligand can be traced back as its MHC of origin.
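
The reason additional cell lines help can be illustrated with a small Python sketch. It uses a simplified criterion that is our assumption rather than the exact procedure behind Fig 7 and S7 File: an allele is counted as deconvolutable when the set of cell lines carrying it is shared with no other allele, so that an epitope detected in exactly those cell lines points to one allele. The function name and the genotypes are hypothetical toy data, not those of the donor family.

```python
from collections import defaultdict


def deconvolutable_alleles(genotypes):
    """Return alleles whose carrier pattern across cell lines is unique.

    `genotypes` maps a cell line name to the set of HLA alleles it carries.
    Alleles inherited together on one haplotype share the same carriers
    within a family and therefore cannot be told apart by this criterion.
    """
    carriers = defaultdict(set)
    for line, alleles in genotypes.items():
        for allele in alleles:
            carriers[allele].add(line)
    # Group alleles by the exact set of cell lines that carry them.
    by_profile = defaultdict(list)
    for allele, lines in carriers.items():
        by_profile[frozenset(lines)].append(allele)
    return sorted(a for group in by_profile.values() if len(group) == 1 for a in group)


# Toy family: each child inherits one haplotype (an A/B pair) per parent.
family = {
    "father": {"A*01:01", "B*08:01", "A*02:01", "B*07:02"},
    "mother": {"A*03:01", "B*35:01", "A*24:02", "B*44:02"},
    "child1": {"A*01:01", "B*08:01", "A*03:01", "B*35:01"},
    "child2": {"A*02:01", "B*07:02", "A*24:02", "B*44:02"},
}
print(deconvolutable_alleles(family))  # prints []

# One unrelated line that shares single alleles, not whole haplotypes.
extended = dict(family, unrelated={"A*01:01", "B*07:02", "A*11:01", "B*15:01"})
print(deconvolutable_alleles(extended))
# prints ['A*01:01', 'A*02:01', 'B*07:02', 'B*08:01']
```

In the toy family every allele travels with its haplotype partner, so none can be resolved; a single partially matched cell line breaks two of those pairs apart, which is the effect that sampling from a defined population is meant to exploit.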

Expanding our research to a much larger population by sourcing hundreds of cells from the 1000 Genomes Project [36] is not merely an aspiration but is already underway at our laboratory. MAE is so much faster than traditional MHC immunoenrichment that it becomes vital to an undertaking of this size. While antibody-based ligand isolates will always be higher in purity [39], another benefit of MAE is that ligands are eluted in their native environment (i.e., embedded on the plasma membrane) and that the isolated peptides reflect their natural abundance on the cell surface. HLA proteins are not expressed in equal amounts (as shown in S2 Table and elsewhere [37]). So, while it is unfortunate that we did not detect any consensus sequences associated with HLA-C (expressed at low levels, with roles in pregnancy [38]) and HLA-DP (poorly defined and implicated in some autoimmune diseases [39]), MAE-derived data perhaps better represent the immunodominant and highly abundant epitopes that are consequential for vaccine development. Admittedly, non-specific peptide contamination will always be a major disadvantage of MAE [30,35,40]; especially for MHC II ligands, stringent filtering with epitope clustering has made only small improvements in this regard.

This study introduces a novel data analysis strategy for MAE-derived immunopeptidomes, demonstrating some ability to differentiate true MHC-bound peptides from contaminants, particularly for MHC I ligands. We were able to successfully recapitulate many documented HLA binding consensus sequences, so this approach can potentially refine existing ones or even predict new motifs. By applying this methodology to individuals with shared genotypes, along with epitope clustering and sequence alignment, we were able to partially deconvolute epitope origins without relying on HLA knockdowns. Taken together, insights can be gained by applying this approach to hundreds of cell lines from the global population, facilitated by the method’s speed and ease. Ultimately, this work paves the way for developing more immunogenically inclusive vaccines at the preclinical stage, prior to involving human subjects in costly clinical trials.

Supporting information

S1 Fig. Western blot for protein HLA-A2.

shRNA knockdown of HLA-A2 in the cell line GM2709 was performed and compared against a control knockdown. Blot against calnexin served as a loading control.

(TIF)

pone.0332950.s001.tif (53.8KB, tif)
S1 File. Original blot of S1 Fig.

(PDF)

pone.0332950.s002.pdf (67.3KB, pdf)
S2 Fig. Log2 fold change values of the six HLA alleles in a whole cell lysate of the WT and MHC-mutant cell lines.

(TIF)

pone.0332950.s003.tif (577.5KB, tif)
S3 Fig. Haplotype data for the WT (LBL-721) and MHC-mutant (LBL-721–174) cell lines.

Shaded alleles indicate the absence of detection of that allele in the mutant cells by NGS (N/D = not detected).

(TIF)

pone.0332950.s004.tif (171.8KB, tif)
S4 Fig. Benchmarking MHC epitopes from individual cell analyses with NetMHCpan or NetMHCIIpan.

(TIF)

pone.0332950.s005.tif (482.3KB, tif)
S2 File. Example of multiple sequence alignment of an epitope from the MHC I group, aligned with no gap openings to a maximum allowable length of 12 residues.

(TXT)

pone.0332950.s006.txt (222B, txt)
S3 File. Example of multiple sequence alignment of an epitope from the MHC II group and aligned with no gap openings to a maximum allowable length of 30 residues.

(TXT)

pone.0332950.s007.txt (286B, txt)
S5 Fig. Benchmarking MHC epitopes from combined cell analysis with NetMHCpan or NetMHCIIpan.

(TIF)

S5 File. List of randomly generated 9–11 amino acid peptides used for benchmarking NetMHCpan.

(TXT)

pone.0332950.s009.txt (3.5KB, txt)
S6 File. List of randomly generated 12–20 amino acid peptides used for benchmarking NetMHCIIpan.

(TXT)

pone.0332950.s010.txt (7.9KB, txt)
S6 Fig. Cell lines added to the epitope clustering approach, in the order shown in this table, improve allele deconvolution as shown in Fig 7.

(TIF)

pone.0332950.s011.tif (448.7KB, tif)

Data Availability

All mass spectrometry raw data, search files, and in-house scripts have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD058267 (https://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD058267).

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1. Arita I, Francis D. Is it time to destroy the smallpox virus? Science. 2014;345(6200):1010. doi: 10.1126/science.345.6200.1010-a
  • 2. Anwar E, Goldberg E, Fraser A, Acosta CJ, Paul M, Leibovici L. Vaccines for preventing typhoid fever. Cochrane Database Syst Rev. 2014;(1):CD001261. doi: 10.1002/14651858.CD001261.pub3
  • 3. Kulski JK, Shiina T, Dijkstra JM. Genomic Diversity of the Major Histocompatibility Complex in Health and Disease. Cells. 2019;8(10):1270. doi: 10.3390/cells8101270
  • 4. Kim A, Sadegh-Nasseri S. Determinants of immunodominance for CD4 T cells. Curr Opin Immunol. 2015;34:9–15. doi: 10.1016/j.coi.2014.12.005
  • 5. Sturm T, Sautter B, Wörner TP, Stevanović S, Rammensee H-G, Planz O, et al. Mild Acid Elution and MHC Immunoaffinity Chromatography Reveal Similar Albeit Not Identical Profiles of the HLA Class I Immunopeptidome. J Proteome Res. 2021;20(1):289–304. doi: 10.1021/acs.jproteome.0c00386
  • 6. Marsh S, Parham P, Barber L. The HLA factsbook. 1999. Available: https://books.google.com/books?hl=en&lr=&id=ZmZYk2FQiuUC&oi=fnd&pg=PP1&ots=aANtwQPYpJ&sig=QcLQCcmZMxFWytwOX1O7K8rPOYU
  • 7. Nielsen M, Andreatta M. NetMHCpan-3.0; improved prediction of binding to MHC class I molecules integrating information from multiple receptor and peptide length datasets. Genome Med. 2016;8(1):33. doi: 10.1186/s13073-016-0288-x
  • 8. Amengual-Rigo P, Guallar V. NetCleave: an open-source algorithm for predicting C-terminal antigen processing for MHC-I and MHC-II. Sci Rep. 2021;11(1):13126. doi: 10.1038/s41598-021-92632-y
  • 9. Rappsilber J, Ishihama Y, Mann M. Stop and go extraction tips for matrix-assisted laser desorption/ionization, nanoelectrospray, and LC/MS sample pretreatment in proteomics. Anal Chem. 2003;75(3):663–70. doi: 10.1021/ac026117i
  • 10. Diedrich JK, Pinto AFM, Yates JR 3rd. Energy dependence of HCD on peptide fragmentation: stepped collisional energy finds the sweet spot. J Am Soc Mass Spectrom. 2013;24(11):1690–9. doi: 10.1007/s13361-013-0709-7
  • 11. Perez-Riverol Y, Bai J, Bandla C, García-Seisdedos D, Hewapathirana S, Kamatchinathan S, et al. The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences. Nucleic Acids Res. 2022;50(D1):D543–52. doi: 10.1093/nar/gkab1038
  • 12. Cox J, Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol. 2008;26(12):1367–72. doi: 10.1038/nbt.1511
  • 13. Yu F, Teo GC, Kong AT, Fröhlich K, Li GX, Demichev V, et al. Analysis of DIA proteomics data using MSFragger-DIA and FragPipe computational platform. Nat Commun. 2023;14(1):4154. doi: 10.1038/s41467-023-39869-5
  • 14. Hsiao Y, Zhang H, Li GX, Deng Y, Yu F, Valipour Kahrood H. Analysis and visualization of quantitative proteomics data using FragPipe-Analyst. J Proteome Res. 2024. doi: 10.1021/ACS.JPROTEOME.4C00294/SUPPL_FILE/PR4C00294_SI_003.XLSX
  • 15. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, et al. Clustal W and Clustal X version 2.0. Bioinformatics. 2007;23(21):2947–8. doi: 10.1093/bioinformatics/btm404
  • 16. Crooks GE, Hon G, Chandonia J-M, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004;14(6):1188–90. doi: 10.1101/gr.849004
  • 17. Buck D, Greene AE, Coriell LL, Mulivor RA. Juvenile onset diabetes. Cytogenet Cell Genet. 1980;28(3):213–6. doi: 10.1159/000131533
  • 18. Cellosaurus cell line GM02709 (CVCL_CY16). [cited 8 Oct 2024]. Available: https://www.cellosaurus.org/CVCL_CY16
  • 19. Jurtz V, Paul S, Andreatta M, Marcatili P, Peters B, Nielsen M. NetMHCpan-4.0: Improved Peptide-MHC Class I Interaction Predictions Integrating Eluted Ligand and Peptide Binding Affinity Data. J Immunol. 2017;199(9):3360–8. doi: 10.4049/jimmunol.1700893
  • 20. DeMars R, Chang CC, Shaw S, Reitnauer PJ, Sondel PM. Homozygous deletions that simultaneously eliminate expressions of class I and class II antigens of EBV-transformed B-lymphoblastoid cells. I. Reduced proliferative responses of autologous and allogeneic T cells to mutant cells that have decreased expression of class II antigens. Hum Immunol. 1984;11(2):77–97. doi: 10.1016/0198-8859(84)90047-8
  • 21. Cellosaurus cell line LCL 721.174 (CVCL_6260). [cited 8 Oct 2024]. Available: https://www.cellosaurus.org/CVCL_6260
  • 22. Kavathas P, Bach FH, DeMars R. Gamma ray-induced loss of expression of HLA and glyoxalase I alleles in lymphoblastoid cells. Proc Natl Acad Sci U S A. 1980;77(7):4251–5. doi: 10.1073/pnas.77.7.4251
  • 23. Cellosaurus cell line LCL 721 (CVCL_2102). [cited 8 Oct 2024]. Available: https://www.cellosaurus.org/CVCL_2102
  • 24. Keefe AJ, Caldwell KB, Nowinski AK, White AD, Thakkar A, Jiang S. Screening nonspecific interactions of peptides without background interference. Biomaterials. 2013;34(8):1871–7. doi: 10.1016/j.biomaterials.2012.11.014
  • 25. Reynisson B, Barra C, Kaabinejadian S, Hildebrand WH, Peters B, Nielsen M. Improved Prediction of MHC II Antigen Presentation through Integration and Motif Deconvolution of Mass Spectrometry MHC Eluted Ligand Data. J Proteome Res. 2020;19(6):2304–15. doi: 10.1021/acs.jproteome.9b00874
  • 26. Choi H, Fermin D, Nesvizhskii AI. Significance analysis of spectral count data in label-free shotgun proteomics. Mol Cell Proteomics. 2008;7(12):2373–85. doi: 10.1074/mcp.M800203-MCP200
  • 27. de Hoon MJL, Imoto S, Nolan J, Miyano S. Open source clustering software. Bioinformatics. 2004;20(9):1453–4. doi: 10.1093/bioinformatics/bth078
  • 28. Dönnes P, Kohlbacher O. SVMHC: a server for prediction of MHC-binding peptides. Nucleic Acids Res. 2006;34(Web Server issue):W194-7. doi: 10.1093/nar/gkl284
  • 29. Vita R, Overton JA, Greenbaum JA, Ponomarenko J, Clark JD, Cantrell JR, et al. The immune epitope database (IEDB) 3.0. Nucleic Acids Res. 2015;43(Database issue):D405-12. doi: 10.1093/nar/gku938
  • 30. Fortier M-H, Caron E, Hardy M-P, Voisin G, Lemieux S, Perreault C, et al. The MHC class I peptide repertoire is molded by the transcriptome. J Exp Med. 2008;205(3):595–610. doi: 10.1084/jem.20071985
  • 31. Lee JM, Watts TH. On the dissociation and reassociation of MHC class II-foreign peptide complexes. Evidence that brief transit through an acidic compartment is not sufficient for binding site regeneration. The Journal of Immunology. 1990;144(5):1829–34. doi: 10.4049/jimmunol.144.5.1829
  • 32. You R, Qu W, Mamitsuka H, Zhu S. DeepMHCII: a novel binding core-aware deep interaction model for accurate MHC-II peptide binding affinity prediction. Bioinformatics. 2022;38(Suppl 1):i220–8. doi: 10.1093/bioinformatics/btac225
  • 33. Vita R, Mahajan S, Overton JA, Dhanda SK, Martini S, Cantrell JR, et al. The Immune Epitope Database (IEDB): 2018 update. Nucleic Acids Res. 2019;47(D1):D339–43. doi: 10.1093/nar/gky1006
  • 34. SolyPep: a fast generator of soluble peptides. [cited 25 May 2025]. Available: https://bioserv.rpbs.univ-paris-diderot.fr/services/SolyPep/
  • 35. Lanoix J, Durette C, Courcelles M, Cossette É, Comtois-Marotte S, Hardy M-P, et al. Comparison of the MHC I Immunopeptidome Repertoire of B-Cell Lymphoblasts Using Two Isolation Methods. Proteomics. 2018;18(12):e1700251. doi: 10.1002/pmic.201700251
  • 36. 1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, et al. A global reference for human genetic variation. Nature. 2015;526(7571):68–74. doi: 10.1038/nature15393
  • 37. Kubiniok P, Marcu A, Bichmann L, Kuchenbecker L, Schuster H, Hamelin DJ, et al. Understanding the constitutive presentation of MHC class I immunopeptidomes in primary tissues. iScience. 2022;25(2):103768. doi: 10.1016/j.isci.2022.103768
  • 38. Papúchová H, Meissner TB, Li Q, Strominger JL, Tilburgs T. The Dual Role of HLA-C in Tolerance and Immunity at the Maternal-Fetal Interface. Front Immunol. 2019;10:2730. doi: 10.3389/fimmu.2019.02730
  • 39. van Lith M, McEwen-Smith RM, Benham AM. HLA-DP, HLA-DQ, and HLA-DR have different requirements for invariant chain and HLA-DM. J Biol Chem. 2010;285(52):40800–8. doi: 10.1074/jbc.M110.148155
  • 40. Kuznetsov A, Voronina A, Govorun V, Arapidi G. Critical Review of Existing MHC I Immunopeptidome Isolation Methods. Molecules. 2020;25(22):5409. doi: 10.3390/molecules25225409

Decision Letter 0

Shahina Akter

15 Apr 2025

PONE-D-25-08174

Towards a strategic approach to vaccine development in defined populations through mild acid elution-based immunopeptidomics

PLOS ONE

Dear Dr. Foster,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by May 30 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Shahina Akter, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. PLOS ONE now requires that authors provide the original uncropped and unadjusted images underlying all blot or gel results reported in a submission’s figures or Supporting Information files. This policy and the journal’s other requirements for blot/gel reporting and figure preparation are described in detail at https://journals.plos.org/plosone/s/figures#loc-blot-and-gel-reporting-requirements and https://journals.plos.org/plosone/s/figures#loc-preparing-figures-from-image-files. When you submit your revised manuscript, please ensure that your figures adhere fully to these guidelines and provide the original underlying images for all blot or gel data reported in your submission. See the following link for instructions on providing the original image data: https://journals.plos.org/plosone/s/figures#loc-original-images-for-blots-and-gels.   

In your cover letter, please note whether your blot/gel image data are in Supporting Information or posted at a public data repository, provide the repository URL if relevant, and provide specific details as to which raw blot/gel images, if any, are not available. Email us at plosone@plos.org if you have any questions.

3. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

Comment to the authors

The manuscript presents an innovative and valuable approach to population-specific vaccine development using mild acid elution-based immunopeptidomics. The study is well-conceived, and the results are promising, particularly regarding MHC I peptide identification. However, the manuscript would benefit from clearer articulation of objectives, justification of key methodological parameters, and improved statistical validation. The discussion could be strengthened by addressing the limitations of MAE, especially for MHC II ligands, and by comparing the method to existing approaches. Structural improvements are also needed, such as reducing overlap between results and methods sections. With these revisions, the manuscript would be significantly enhanced in clarity, rigor, and impact.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript titled "Towards a strategic approach to vaccine development in defined populations through mild acid elution-based immunopeptidomics" is nicely presented and would be a valuable addition to the literature. I congratulate the authors for conducting such a nice study. However, I have a few comments which, if addressed, would add more interest for the reader.

If the authors would like, the title is informative but slightly long. It could be made more concise while retaining clarity. For example, "Mild Acid Elution-Based Immunopeptidomics for Population-Specific Vaccine Development" might be a more direct option. The abstract provides a strong summary of the study's objectives and importance. However, the methodology could be described more concisely, and key results regarding the reproducibility and validation of the method should be highlighted more explicitly.

The methodology section provides a detailed protocol, but could you please explain the justification for key parameters such as peptide identification thresholds and FDR values? Without this, it is difficult to assess the rigor of data filtering. Given that the sample selection is based on a single family, could you address the potential for genetic bias and discuss how this might affect the broader applicability of the method? Additionally, how do you control for the potential contamination of non-MHC peptides, especially in the case of MHC II ligands? While the LC-MS workflow is well described, could you provide more context on how instrument settings were optimized for peptide recovery, particularly regarding the injection volumes for different peptide concentrations? To improve reproducibility, could you include a clearer justification for analytical choices, incorporate statistical validation, and explicitly acknowledge the limitations of MAE in detecting MHC II peptides?

The results section presents strong evidence for the specificity of MAE in isolating MHC I peptides, particularly through HLA-A*02 knockdown validation, but could you clarify why the specificity for MHC II peptides appears lower? Given that a significant proportion of identified peptides may not be true MHC ligands, what steps were taken to verify their relevance? The introduction of Peptide Index (Pi) and Epitope Index (Ei) is an interesting addition, but could you provide statistical support for these indices, such as variance measures or confidence intervals? The claim that the epitope clustering approach can be applied to large population studies is intriguing, but could you explain how this conclusion is drawn from a dataset of only seven related individuals? Additionally, how do you account for the potential impact of proteolytic cleavage and peptide stability on the observed results, as these factors could introduce confounding variables? To strengthen the results, could you incorporate statistical tests for peptide abundance comparisons, include specificity controls for MHC II peptides, and temper claims about the scalability of the approach?

The discussion effectively highlights the potential applications of MAE-based immunopeptidomics for vaccine design, but could you reconsider the generalizability of the findings? While predicting allele-specific binding motifs is an important advancement, could you clarify whether these epitopes have been validated for their immunogenicity? Without functional validation, how can their relevance for vaccine development be confirmed? Additionally, could you provide a more critical evaluation of MAE’s limitations, including potential contamination, scalability challenges, and the method’s reduced specificity for MHC II ligands? Given that benchmarking against immunoprecipitation-based approaches is missing, could you explain how MAE’s performance compares to existing methods? Would it be possible to include a discussion on how this technique could be externally validated in more genetically diverse populations? A more balanced discussion should address these points to provide a clearer perspective on the method’s strengths and limitations.

Reviewer #2: The article is informative, but the presentation of results appears rushed, lacking depth and clarity. The language needs refinement for accuracy, coherence, and readability, with better word choices and sentence structures. The organization and formatting do not align well with journal standards, requiring improvements in text sequence, size, and adherence to guidelines. Figures should be of higher resolution with clearer interpretations. Substantial revisions are necessary to enhance clarity, structure, and compliance with journal requirements. The authors should address these issues before resubmitting the manuscript.

Reviewer #3: This is a well-written and quite informative manuscript.

However, the authors should note the following.

1. It would be nice to structure the abstract.

2. The objective/aim of the study was not stated in either the abstract or the main article. In the current form, it is only implied, which leaves the reader in suspense.

3. Regarding the data analysis: the statement given in the first sentence of the first paragraph, i.e., "online web application", should be referenced.

4. Regarding the results: the first paragraph describes methods and would fit better in the methods section. Although the results were presented well, most sections included methods, and sometimes discussion, within the results. It would be nice to minimise these so that the results can clearly stand out. Most of the methodology statements should be moved to the methodology/data analysis section, and the discussion statements to the discussion section.

5. The authors discussed their results well. However, they did not draw definite conclusions from their results, nor did they make definite recommendations.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Sarfraz Ahmed

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Decision Letter 1

Shahina Akter

7 Sep 2025

Framework for analyzing MAE-derived immunopeptidomes from cell lines with shared HLA haplotypes

PONE-D-25-08174R1

Dear Dr. Leonard Foster,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information’ link at the top of the page. For questions related to billing, please contact billing support.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Shahina Akter, Ph.D.

Academic Editor

PLOS ONE

Acceptance letter

Shahina Akter

PONE-D-25-08174R1

PLOS ONE

Dear Dr. Foster,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Shahina Akter

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Western blot for protein HLA-A2.

    shRNA knockdown of HLA-A2 in the cell line GM2709 was performed and compared against a control knockdown. Blot against calnexin served as a loading control.

    (TIF)

    pone.0332950.s001.tif (53.8KB, tif)
    S1 File. Original blot of S1 Fig.

    (PDF)

    pone.0332950.s002.pdf (67.3KB, pdf)
    S2 Fig. Log2 fold change values of the six HLA alleles in a whole cell lysate of the WT and MHC-mutant cell lines.

    (TIF)

    pone.0332950.s003.tif (577.5KB, tif)
    S3 Fig. Haplotype data for the WT (LBL-721) and MHC-mutant (LBL-721–174) cell lines.

    Shaded alleles indicate the absence of detection of that allele in the mutant cells by NGS (N/D = not detected).

    (TIF)

    pone.0332950.s004.tif (171.8KB, tif)
    S4 Fig. Benchmarking MHC epitopes from individual cell analyses with NetMHCpan or NetMHCIIpan.

    (TIF)

    pone.0332950.s005.tif (482.3KB, tif)
    S2 File. Example of multiple sequence alignment of an epitope from the MHC I group, aligned with no gap openings and a maximum allowable length of 12 residues.

    (TXT)

    pone.0332950.s006.txt (222B, txt)
    S3 File. Example of multiple sequence alignment of an epitope from the MHC II group, aligned with no gap openings and a maximum allowable length of 30 residues.

    (TXT)

    pone.0332950.s007.txt (286B, txt)
    S5 Fig. Benchmarking MHC epitopes from combined cell analysis with NetMHCpan or NetMHCIIpan.

    (TIF)

    S5 File. List of randomly generated 9–11 amino acid peptides used for benchmarking NetMHCpan.

    (TXT)

    pone.0332950.s009.txt (3.5KB, txt)
    S6 File. List of randomly generated 12–20 amino acid peptides used for benchmarking NetMHCIIpan.

    (TXT)

    pone.0332950.s010.txt (7.9KB, txt)
    S6 Fig. Adding cell lines to the epitope clustering approach, in the order shown in this table, improves allele deconvolution, as shown in Fig 7.

    (TIF)

    pone.0332950.s011.tif (448.7KB, tif)
    Attachment

    Submitted filename: PLOS One rebuttal.docx

    pone.0332950.s013.docx (61.1KB, docx)

    Data Availability Statement

    All mass spectrometry raw data, search files, and in-house scripts have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD058267 (https://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD058267).


    Articles from PLOS One are provided here courtesy of PLOS
