Abstract
Immunopeptidomics has made tremendous contributions to our understanding of antigen processing and presentation by identifying and quantifying antigenic peptides presented on the cell surface by Major Histocompatibility Complex (MHC) molecules. Large and complex immunopeptidomics datasets can now be routinely generated using Liquid Chromatography-Mass Spectrometry techniques. The analysis of these data, which often comprise multiple replicates or conditions, rarely follows a standard data processing pipeline, hindering the reproducibility and depth of immunopeptidomic analyses. Here, we present Immunolyser, an automated pipeline designed to facilitate computational analysis of immunopeptidomic data with minimal initial setup. Immunolyser brings together routine analyses, including peptide length distribution, peptide motif analysis, sequence clustering, peptide-MHC binding affinity prediction, and source protein analysis. Immunolyser provides a user-friendly and interactive interface via its webserver and is freely available for academic purposes at https://immunolyser.erc.monash.edu/. The open-access source code can be downloaded from our GitHub repository: https://github.com/prmunday/Immunolyser. We anticipate that Immunolyser will serve as a prominent computational pipeline for effortless and reproducible analysis of immunopeptidomic data.
Keywords: Antigen processing and presentation, Immunopeptidomics, Peptide analysis
1. Introduction
The human immune system is composed of innate and adaptive immunity, which work together to defend the body against infectious agents. Innate immunity provides a fast and broad response to injury and infection upon the first encounter with a pathogen, whereas adaptive immunity sets in with a lag time but can respond in a highly specific and timely manner to previously encountered pathogens. Antigen processing and presentation is the central process that enables adaptive immunity to develop this specificity. During antigen processing and presentation, proteins are processed into short peptides, which are then loaded onto MHC molecules, known in humans as Human Leukocyte Antigens (HLA). These peptides, 8–25 amino acids long, are presented as complexes with HLA class I and II molecules on the cell surface, where they convey specificity and facilitate ensuing antigen-specific responses [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]. Given the key role of antigens in the adaptive immune response, it is of great importance to identify and understand the repertoire of these peptide antigens (collectively termed the immunopeptidome) presented by HLA molecules under various conditions [11].
Liquid Chromatography-Mass Spectrometry (LC-MS) is the mainstream technique used to quantify and characterise immunopeptidomes [8]. Well-established and thoroughly documented LC-MS protocols have allowed the acquisition of substantial amounts of immunopeptidomics data [8]. To mine these datasets effectively, several fundamental analyses need to be performed, including investigation of the peptide length distribution, peptide motif analysis, clustering of peptides based on shared sequence features, and peptide-HLA binding prediction. Analysis of the peptide length distribution is an important step in validating an immunopeptidomic experiment and reveals the characteristic length preference of certain HLA alleles [8], [12]. Another important analysis is the identification of peptide groups displaying similar sequence characteristics (i.e., sequence motifs). A number of studies have established that HLA-bound peptides show preferences for certain amino acid residues at different positions [11], [13], [14], [15]. Such amino acid preferences reflect allotype-specific peptide-HLA binding [16], [17], [18]. Finally, peptide-HLA binding prediction is an important way to investigate the theoretical binding affinities of a peptide for different HLA alleles, thereby providing crucial evidence for assigning peptides to alleles in a multi-allele sample ahead of further experimental validation. Although numerous computational methods are available to conduct each of the above analyses separately, it remains challenging and time-consuming for those with little programming expertise to integrate, correlate, and collate the results from different tools. Furthermore, the need to conduct multiple analysis steps manually makes the overall results prone to human error and may therefore compromise the consistency and reproducibility of the analysis.
To the best of our knowledge, MhcVizPipe is currently the only computational pipeline that attempts to automate a variety of analyses for immunopeptidomic data [19]. However, current solutions for automating immunopeptidomic analysis can be further improved. For example, biologists face the challenge of manually setting up a pipeline on local computers running different operating systems. A user-friendly and interactive webserver would significantly facilitate data integration and analysis for immunologists.
Here we present Immunolyser, an automated web-based tool that analyses immunopeptidomics data in a seamless pipeline integrating multiple tailored computational tools. Based on the uploaded peptide data, the pipeline generates a comprehensive and interactive report summarising sequence conservation, clustering of similar peptides into groups, predicted binding affinity to various HLA allotypes, and the localisation of peptides within their source proteins. Users only need to provide the peptide sequences, so the tool accepts output from any major peptide search engine. Immunolyser has an interactive and user-friendly online interface and requires no programming expertise. We showcase the usability of Immunolyser by re-analysing the immunopeptidomic data generated by Pandey et al. [13], a study comparing the impact of different ligand enrichment methods used in immunopeptidomics experiments. We anticipate that this pipeline will help researchers perform rapid, standardised, and consistent immunopeptidomic analyses.
2. Materials and Methods
2.1. Data pre-processing
Immunolyser accepts peptide data, comprising peptide sequences and (optionally) their UniProt accessions [20], in comma-separated values (CSV) format exported from any MS search engine, such as PEAKS™ [21], DIA-NN [22], Skyline [23], ProteinPilot™ (SCIEX), or Spectronaut®. Example inputs are provided on the Immunolyser home page (https://immunolyser.erc.monash.edu/; the “Demo and example input” section). Uploaded datasets are pre-screened by Immunolyser to ensure data quality for the ensuing computational analysis. Basic information is recorded, including the peptide sequence and the accession number of the source protein. Most commonly used database search tools identify posttranslational modifications (PTMs) in peptides and mark them automatically in the result file(s). Immunolyser removes PTM annotations enclosed in round or square brackets within the peptide sequence, so that unmodified peptide sequences can be fed into the downstream analyses; duplicates resulting from this step are then removed. If the UniProt accession [20] column is present, Immunolyser also identifies and deletes contaminant peptides derived from source proteins marked as “#CONTAM” or “#DECOY”. Peptides shorter than 5 or longer than 30 amino acids are removed to reduce contamination by non-HLA-binding peptides while retaining as many potentially relevant peptides as possible. Users also have the option to upload a control data file (e.g., a blank run control), allowing Immunolyser to conduct comparative analysis across datasets from different conditions.
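The pre-processing rules above can be sketched in Python as follows. This is a minimal illustration, not Immunolyser's own code: the function names, the `(peptide, accession)` row format and the exact bracket pattern are assumptions. It strips bracketed PTM annotations, drops entries tagged #CONTAM or #DECOY, keeps 5–30-mers and removes the duplicates created by stripping modifications.

```python
import re

# Assumed PTM notation: annotations in round or square brackets inside the
# sequence, e.g. "M(+15.99)" or "C[Carbamidomethyl]".
PTM_PATTERN = re.compile(r"\([^)]*\)|\[[^\]]*\]")
CONTAMINANT_TAGS = ("#CONTAM", "#DECOY")


def strip_ptms(sequence: str) -> str:
    """Remove bracketed PTM annotations, leaving the bare amino acid sequence."""
    return PTM_PATTERN.sub("", sequence)


def preprocess(rows, min_len=5, max_len=30):
    """Return unique unmodified peptides passing contaminant and length filters.

    `rows` is an iterable of (peptide, accession) pairs; accession is None
    when the optional UniProt accession column is absent.
    """
    seen = set()
    kept = []
    for peptide, accession in rows:
        # Skip peptides whose source protein is tagged as contaminant/decoy.
        if accession and any(tag in accession for tag in CONTAMINANT_TAGS):
            continue
        bare = strip_ptms(peptide)
        # Keep 5-30-mers only (bounds inclusive).
        if not min_len <= len(bare) <= max_len:
            continue
        # Collapse duplicates created by removing modifications.
        if bare not in seen:
            seen.add(bare)
            kept.append(bare)
    return kept


rows = [
    ("SIINFEKLM(+15.99)", "P01012"),
    ("SIINFEKLM", "P01012"),
    ("AAAK", "P12345"),
    ("LLFGYPVYV", "#CONTAM_P02768"),
]
print(preprocess(rows))  # ['SIINFEKLM']
```

Because duplicates are checked after PTM removal, a modified and an unmodified form of the same peptide collapse into a single entry, as described above.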
2.2. Peptide length distribution and overlap analysis
Peptides presented by different HLA classes vary in length [8], [24]. The peptide length distribution therefore serves as an initial, critical validation and quality control measure for immunopeptidomic experiments and computational studies [9], [11], [12], [24]. Immunolyser uses Plotly [25] to generate interactive charts of the peptide length distribution (all 5–30-mers) in all samples, and a toggle button lets users switch smoothly between two versions of the plot (peptide numbers and relative frequencies). For samples submitted with more than one replicate, the standard deviation is calculated and shown in both plots; users can view the standard deviation for every n-mer (i.e., peptides n amino acids long) by hovering the mouse over the corresponding bar. In addition, Immunolyser allows flexible and interactive exploration of the data, such as including or excluding specific samples in the plot and visualising a subset of the submitted data. The bars are colour-coded to compare peptide length distributions across multiple datasets, with a legend indicating the sample names and their associated colours. We also implemented an UpSet plot (https://github.com/upsetjs/upsetjs_r/) to allow users to interrogate common and unique peptides (i.e., 8–14-mers for HLA class I peptides and 12–20-mers for HLA class II peptides) among conditions.
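The per-n-mer mean and standard deviation across replicates, and the exclusive overlap sets that each UpSet bar represents, can be computed as in the minimal sketch below. The function names and input layout (one list of peptides per replicate, one set of peptides per condition) are hypothetical; in practice the overlap inputs would first be restricted to 8–14-mers (class I) or 12–20-mers (class II), and the plotting itself is left to the charting libraries.

```python
from collections import Counter
from statistics import mean, stdev


def length_counts(peptides, min_len=5, max_len=30):
    """Number of peptides of each length n (n-mers) from min_len to max_len."""
    counts = Counter(len(p) for p in peptides)
    return {n: counts.get(n, 0) for n in range(min_len, max_len + 1)}


def length_summary(replicates, relative=False):
    """Mean and standard deviation of each n-mer count across replicates.

    With relative=True each replicate is converted to percentages first,
    mirroring the relative-frequency view of the chart.
    """
    per_replicate = []
    for peptides in replicates:
        counts = length_counts(peptides)
        if relative:
            total = sum(counts.values()) or 1
            counts = {n: 100 * c / total for n, c in counts.items()}
        per_replicate.append(counts)
    summary = {}
    for n in per_replicate[0]:
        values = [rep[n] for rep in per_replicate]
        # SD is only reported when more than one replicate was submitted.
        sd = stdev(values) if len(values) > 1 else None
        summary[n] = (mean(values), sd)
    return summary


def exclusive_intersections(conditions):
    """Group peptides by the exact set of conditions containing them,
    i.e. the quantity drawn as one bar in an UpSet plot."""
    groups = {}
    for peptide in set().union(*conditions.values()):
        members = frozenset(
            name for name, peptides in conditions.items() if peptide in peptides
        )
        groups.setdefault(members, set()).add(peptide)
    return groups
```

Computing exclusive intersections, rather than simple pairwise overlaps, means every peptide is counted in exactly one bar, which is what makes the bars of an UpSet plot sum to the total number of unique peptides.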
2.3. Peptide motif analysis
Peptide motif analysis aims to offer insights into position-specific and HLA allotype-specific amino acid preferences in the submitted peptides. Immunolyser visualises the over- and under-represented amino acids at each position within a typical 9-mer peptide core, using Seq2Logo [26] to generate sequence logos for each input file that show the sequence conservation patterns of the peptides. For class II peptides, the 9-mer peptide cores are provided by the peptide-HLA-II binding predictor MixMHC2pred [27], [28]; for class I peptides, unique 9-mer peptides are used to generate the logos. Seq2Logo performs sequence weighting to identify the over- and under-represented amino acids, corrects for low numbers of observations using the data-dependent pseudo count method [29], and generates a two-sided representation of amino acid enrichment and depletion [26]. Sequence logos are generated in high-resolution JPEG format and tagged with the name of the peptide source file. Users can also hover the mouse over a logo to see the number of peptides used to generate it.
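To illustrate what "over- and under-represented" means at each position, the sketch below computes a simple log2 enrichment of each amino acid against a uniform background, with a flat additive pseudocount. It is deliberately simplified and is not Seq2Logo's algorithm: Seq2Logo additionally applies sequence weighting and data-dependent pseudo counts, both omitted here. Positive scores correspond to the upper half of a two-sided logo and negative scores to the lower half.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_log_odds(peptides, pseudocount=1.0):
    """Per-position log2 enrichment of each amino acid vs. a uniform background.

    Positive values mark over-represented residues, negative values
    under-represented ones. Simplified: no sequence weighting, and a flat
    additive pseudocount instead of Seq2Logo's data-dependent pseudo counts.
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length, e.g. 9-mers")
    background = 1 / len(AMINO_ACIDS)
    matrix = []
    for pos in range(length):
        column = [p[pos] for p in peptides]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        matrix.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return matrix


# Three well-known HLA-A*02:01 epitopes; P2 is index 1 (I, L, L)
scores = position_log_odds(["GILGFVFTL", "NLVPMVATV", "YLQPRTFLL"])
print(scores[1]["L"] > 0, scores[1]["W"] < 0)  # True True
```

With only a handful of peptides the pseudocount dominates, which is exactly the low-observation problem that a data-dependent pseudo count is designed to handle more gracefully.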
2.4. Peptide clustering analysis
Immunolyser applies GibbsCluster [30] to perform peptide sequence alignment and clustering, providing a broad and unbiased view of the collective motifs in each sample. GibbsCluster uses the Gibbs sampling algorithm [31] to cluster peptides into groups based on sequence similarity and amino acid hydrophobicity. GibbsCluster then uses the Kullback-Leibler Distance (KLD) [32], a measure of the difference between two probability distributions, to optimise the clustering result by measuring cluster stability. The clustering solution with the highest stability (i.e., the highest KLD score) is selected, and sequence logos are generated for each of its clusters. As in the peptide motif analysis, GibbsCluster uses unique 9-mers for class I peptides and the 9-mer cores identified by MixMHC2pred for class II peptides. The clustering results are displayed in a dedicated viewing pane. By default, sequence logos are shown for the clustering solution with the maximum KLD score, but users can view solutions with a different number of groups by selecting from the drop-down menu provided, or select ‘Clustering with higher KLD’ to return to the most stable solution. A selection bar to the left of the viewing pane lets users choose which files' results are shown.
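The KLD measure and the final selection step can be sketched as below. This is an illustrative simplification: GibbsCluster's reported KLD score aggregates per-position divergences between cluster motifs and background frequencies, whereas `kl_divergence` here shows only the underlying measure for one pair of distributions, and `best_cluster_number` simply takes the per-solution scores as given. Both function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    p and q are dicts mapping the same symbols to probabilities; q must be
    non-zero wherever p is. Terms with p[k] == 0 contribute nothing.
    """
    return sum(pk * math.log2(pk / q[k]) for k, pk in p.items() if pk > 0)


def best_cluster_number(kld_by_k):
    """Return the number of clusters whose solution has the highest KLD score."""
    return max(kld_by_k, key=kld_by_k.get)


# A fully conserved position vs. a two-letter uniform background
print(kl_divergence({"L": 1.0, "V": 0.0}, {"L": 0.5, "V": 0.5}))  # 1.0
# Made-up per-solution scores for 1, 2 and 3 clusters
print(best_cluster_number({1: 2.1, 2: 3.4, 3: 3.0}))  # 2
```

A conserved anchor position diverges strongly from the background and so contributes a large KLD, which is why well-separated motifs yield higher scores than mixed ones.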
2.5. Peptide-HLA binding prediction
Peptide-HLA binding prediction indicates whether peptides are likely to be presented on the cell surface by selected HLA allotypes. Because the prediction performance of different software tools varies significantly, we deployed three tools for peptide-HLA-I binding prediction and integrated their results using a ‘majority-voting’ strategy to improve predictive reliability, a procedure similar to that used in one of our previous studies [33]. Our comparative study demonstrated that MixMHCpred and NetMHCpan achieved the best prediction performance for peptide binding to HLA-I allotypes [9]. In addition, ANTHEM is a relatively new machine learning-based prediction tool that has been shown to generally outperform other available tools [34]. Immunolyser therefore uses MixMHCpred 2.1 [35], ANTHEM [34] and NetMHCpan 4.1 [36] to predict the binding of peptides (i.e., 8–14-mers) to selected HLA class I molecules. For peptide-HLA-II binding prediction, Immunolyser employs MixMHC2pred [27], [28] for 12–20-mers. Users can select HLA-I and HLA-II allotypes and alleles of interest prior to the prediction. For NetMHCpan 4.1, a predicted rank >0.5 and ≤2 indicates a weak binder, while a rank ≤0.5 indicates a strong binder. For MixMHCpred 2.1 and MixMHC2pred, we empirically set a predicted rank score (‘%Rank_bestAllele’ for MixMHCpred and ‘%Rank_best’ for MixMHC2pred) >2 and ≤10 for a weak binder and ≤2 for a strong binder. When the ‘majority-voting’ strategy is applied to class I peptides, a peptide is labelled as a binder if the predictors that vote for it as a binder outnumber those that do not (e.g., 2 out of 3 predict it as a binder); otherwise, including in the case of a tie, the peptide is classified as a non-binder. Note that different predictors have different allele-specific limitations. For example, ANTHEM can only predict 9-mer binders for HLA-B*15:11. Therefore, majority voting is performed using only the predictors that support the specific HLA allele and peptide length, and such majority-voted binders should not be used for statistical purposes. To simplify the process, a peptide predicted as either a ‘strong’ or a ‘weak’ binder is regarded as a binder in the majority-voting process. To display the prediction results, Immunolyser presents a selection menu on the left side of the screen for choosing an allele. This generates an UpSet plot for the specified allele, showing the binders in each sample and the binders shared among samples. By default, the UpSet plot shows the majority-voted peptide-HLA-I binding prediction results combined across all replicate files. In addition, peptide-HLA binding results (in CSV format) are organised in an interactive table with download links. Users can apply more stringent cut-offs for MixMHCpred 2.1 and MixMHC2pred to the downloaded files to obtain higher-confidence binders. Users can also click the bars in the UpSet plot to investigate the peptide motif patterns and download the binders.
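The rank cut-offs and voting rule described above can be expressed as a short Python sketch. The threshold values come from the text; the function names and data layout are hypothetical. ANTHEM's cut-off is not specified here, so its call is assumed to arrive as a ready-made label, and a predictor that does not support a given allele or peptide length is passed as `None` and excluded from the vote.

```python
from typing import Dict, Optional

# Rank cut-offs from the text: (strong_max, weak_max); lower %rank = stronger.
RANK_CUTOFFS = {
    "NetMHCpan": (0.5, 2.0),
    "MixMHCpred": (2.0, 10.0),
    "MixMHC2pred": (2.0, 10.0),
}


def classify(tool: str, rank: float) -> str:
    """Label a predicted %rank as 'strong', 'weak' or 'non-binder'."""
    strong_max, weak_max = RANK_CUTOFFS[tool]
    if rank <= strong_max:
        return "strong"
    if rank <= weak_max:
        return "weak"
    return "non-binder"


def majority_vote(calls: Dict[str, Optional[str]]) -> bool:
    """Consensus binder call for one peptide-allele pair.

    `calls` maps predictor name -> 'strong', 'weak' or 'non-binder', or None
    when that predictor does not support the allele/peptide length. Strong and
    weak both count as binder votes; ties are resolved as non-binder.
    """
    votes = [call for call in calls.values() if call is not None]
    binder_votes = sum(call in ("strong", "weak") for call in votes)
    return binder_votes > len(votes) - binder_votes


calls = {
    "NetMHCpan": classify("NetMHCpan", 1.2),     # weak
    "MixMHCpred": classify("MixMHCpred", 15.0),  # non-binder
    "ANTHEM": "strong",                          # assumed pre-labelled
}
print(majority_vote(calls))  # True (2 of 3 vote binder)
```

Under this rule, a 1-to-1 split between the two predictors that support an allele counts as a tie and is therefore classified as a non-binder.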
2.6. Peptide distribution analysis
Another module, Pepscanner, has been designed and implemented within the Immunolyser framework to analyse the distribution of peptides within their source proteins. The current beta version of the module is accessible under the ‘Pepscanner’ tab. The input file for this module requires only ‘Peptide’ and ‘Accession’ columns. Users can select specific proteins to be plotted from a drop-down list of proteins; when no proteins are selected, Pepscanner plots the heatmap for the ten most frequently occurring proteins in the uploaded file. For each peptide, Pepscanner includes all associated protein accessions in the analysis. We chose this approach because the exact source protein of a given peptide cannot always be determined, owing to homologous and repetitive sequences among proteins, so all candidate proteins reported by the upstream search software (e.g., PEAKS™) are retained. Pepscanner uses the regular expression provided by UniProt to extract the UniProt accessions for each peptide from the submitted files. The position of each peptide is determined by locating it within its source protein sequence, taken from human proteome data extracted from the UniProt database [20]. Once these positions have been determined, protein lengths are normalised and a heatmap is generated showing the relative positions of the peptides within their source proteins. Peptides are displayed as coloured regions along the length of the heatmap, and the colour value of a region indicates the number of peptides found there: a darker colour indicates more peptides in that region. Multiple proteins can be displayed on this plot, with the accession IDs of the displayed proteins shown on the left side of the y-axis.
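The accession extraction, top-ten selection and length-normalised binning described above can be sketched as follows. The accession pattern is the format regular expression that UniProt documents for accession numbers (made non-capturing and word-bounded for use with `re.findall`); the remaining function names, the 100-bin default and the choice to count every occurrence of a repeated peptide are assumptions for illustration, not Pepscanner's actual implementation.

```python
import re
from collections import Counter

# UniProt's documented accession format, made non-capturing and word-bounded
UNIPROT_ACCESSION = re.compile(
    r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
)


def extract_accessions(field: str):
    """All UniProt accessions found in a search-engine 'Accession' field."""
    return UNIPROT_ACCESSION.findall(field)


def top_proteins(accession_lists, n=10):
    """Most frequent proteins, counting each peptide once per candidate protein."""
    counts = Counter(acc for accs in accession_lists for acc in set(accs))
    return counts.most_common(n)


def normalised_coverage(protein_seq: str, peptides, bins: int = 100):
    """Count peptides overlapping each of `bins` equal-width segments of a
    protein, so proteins of different lengths share one 0-100 axis.
    Every occurrence of a peptide (e.g. in repeats) is counted."""
    coverage = [0] * bins
    length = len(protein_seq)
    for peptide in peptides:
        start = protein_seq.find(peptide)
        while start != -1:
            end = start + len(peptide)  # exclusive end position
            first_bin = start * bins // length
            last_bin = (end * bins - 1) // length
            for b in range(first_bin, last_bin + 1):
                coverage[b] += 1
            start = protein_seq.find(peptide, start + 1)
    return coverage


print(extract_accessions("sp|P01012|OVAL_CHICK;tr|A0A024RBG1"))
# ['P01012', 'A0A024RBG1']
print(normalised_coverage("ACDEFGHIKL", ["DEF"], bins=10))
# [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
```

Each row of the heatmap then corresponds to one `coverage` list, and darker cells are simply bins with higher counts.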
A ‘File metadata’ table shows the total number of unique proteins in the uploaded file and lists the top 10 occurring proteins (or the selected proteins, if any). For each protein in the table, Pepscanner provides the total peptide count, gene name and species-related information.
2.7. Showcasing the usability of Immunolyser
To demonstrate the usability and functionality of Immunolyser, we used datasets from a recent immunopeptidomic study by Pandey et al. [13]. That study presents and compares over 40,000 HLA class I peptides from the acute myeloid leukemia (AML) cell line THP-1 (human monocytic leukemia), obtained using LC-MS combined with either Reverse Phase–High-Performance Liquid Chromatography (RP-HPLC) or Molecular Weight Cut-Off (MWCO) filtration. Each method was applied to three biological replicates, resulting in six individual data files. The THP-1 cell line expresses HLA-A*02:01, HLA-B*15:11, and HLA-C*03:03. In addition, the third replicate was derived from THP-1 cells transfected with HLA-B*27:05 [13].
3. Results
3.1. Immunolyser enables convenient analysis of immunopeptidomic data
Immunolyser is built with Flask, an open-source micro-framework written in Python [38], and has been deployed on the Nectar (National eResearch Collaboration Tools and Resources project) Cloud platform, managed by the Monash eResearch Centre. To enable the analysis and visualisation of immunopeptidomics data, Immunolyser comprises an initialiser module and an analytics module (Fig. 1 and Supplementary Figure 1). The initialiser module, shown in Fig. 1A-B and Supplementary Figure 1C-D, pre-screens uploaded datasets and processes the user-provided parameters, thereby ensuring data quality for the computational analysis (‘Materials and Methods’). Following this quality control, the analytics module conducts multiple analyses and predictions, including peptide length distribution and overlap analysis, peptide sequence cluster analysis, allotype-specific peptide-HLA binding prediction, and peptide distribution analysis. To support these analyses, the initialiser module allows users to configure a variety of experimental parameters. For example, users can nominate alleles of interest for peptide-HLA binding prediction, either by selecting from the displayed list of class I and class II HLA alleles or by typing custom alleles into the text box provided (Supplementary Figure 1D). All analysis results and statistical tests are presented in an interactive and downloadable report. In addition, Pepscanner, a beta-version addition to the analytics module, can be used to analyse the localisation of peptides within their source proteins. Pepscanner generates a heatmap showing the alignment of peptides to their source proteins, as well as a table of the top 10 occurring proteins (or the selected proteins, if any) with related information. Detailed guidelines on how to use Immunolyser are available in the ‘Help’ tab.
Fig. 1.
A schematic illustration of the Immunolyser framework. (A) Dataset upload and quality check; (B) the initialiser module handles data pre-processing and parameter configuration; (C) the analytics module assesses submitted peptides for length distribution, peptide clustering, peptide-HLA binding affinity prediction and peptide-to-protein alignment (via Pepscanner); (D) Immunolyser is accessible via a web application with a user-friendly interface.
3.2. Showcasing the usability of Immunolyser using the immunopeptidomic data from human AML cell line THP-1
Here we performed a case study using immunopeptidome data generated from the THP-1 cell line [13] to demonstrate the usability of Immunolyser. Peptide files exported from PEAKS™ were used as input. We set up two samples (i.e., HPLC and MWCO), and for each sample three replicate files were uploaded simultaneously. The initialiser module first pre-processed the data files and removed all peptides shorter than 5 or longer than 30 amino acids. As a result, over 70,000 peptides (5–30-mers) were uploaded and analysed in approximately 34 min, a substantial improvement over running the individual tools separately. Fig. 2A-C shows the peptide length distribution and peptide overlap generated for both conditions. In both samples, 9-mers are the most abundant peptides (65.28% for HPLC and 68.80% for MWCO). Although HLA-bound peptides vary in length, the peptide-binding cleft of HLA class I molecules usually accommodates a core sequence of 9 amino acids [24]. Fig. 2A also shows that the RP-HPLC method identified on average 9415 9-mers, whereas the MWCO method identified 6461 9-mers. These observations align with the findings of the original study and are clearly visualised using Immunolyser.
Fig. 2.
Immunolyser screenshots of analysis results for the HPLC and MWCO filtration datasets from the study of Pandey et al. [13]. Peptide length distribution includes (A) the relative frequency distribution and the frequency distribution of peptides in both datasets. Error bars represent the standard deviation for every n-mer across 3 replicates. The HPLC data are shown in blue and the MWCO data in red. (B) Toggle button to switch between relative frequency and peptide number charts (here, both are shown to illustrate the two chart types). (C) Peptide overlap analysis across conditions using an UpSet plot. (D) Peptide motif analysis, showing the sequence logos generated for all three replicates of peptides from the RP-HPLC and MWCO filtration methods, respectively. (E) Peptide clustering results and the corresponding sequence logos generated by GibbsCluster 2.0 [30] for the RP-HPLC dataset. Results for all three HPLC replicates are shown. Each replicate is represented by a bar graph of KLD scores for clustering attempts using 1 to 5 sub-groups, together with the sequence logos for the sub-groups of the solution with the highest KLD score. On the top right is a selection menu for choosing the samples to be displayed. Users can drag motifs or use the dots and arrows to navigate to the remaining logos. (F) A drop-down menu to re-generate clustering results using a different number of sub-groups (1 to 5) or to reset to the number yielding the maximum KLD score. (G) A panel for users to remove or add clustering results of different samples.
Fig. 2D shows that the amino acid stacks at positions P2 and P9 are markedly taller than those at other positions, illustrating the strong preference for, and enrichment of, certain amino acids at these anchor positions. At position nine, C-terminal hydrophobic amino acids predominate across all replicates and samples, a consequence of antigen processing that is common to the majority of immunopeptides. In both samples, the third replicate differs at the second position (P2): in the first two replicates of both isolation methods, leucine (L), alanine (A) and proline (P) are over-represented at P2, whereas in the third replicate arginine (R) is also prominent at P2. Furthermore, the sequence logos of the first two replicates show that valine (V) is enriched at P2 in the HPLC dataset but depleted at P2 in the MWCO dataset (Fig. 2D). Such changes are clearly visualised by Immunolyser, suggesting that Immunolyser can efficiently demonstrate the consistency and discrepancy between replicates or samples in immunopeptidomic experiments.
Fig. 2E shows the clustering outputs of GibbsCluster for each HPLC replicate. Assuming that the HLA haplotype of the cell line or tissue used in the experiment has been previously determined through HLA typing, users can manually shift and align the sequence logos with the known HLA allotypes. GibbsCluster identified two recurring sub-groups in each replicate: one logo with [A/P] and [Y/L/F] conserved at P2 and P9, respectively, and another with hydrophobic residues conserved at P2 [L/V/I/M] and P9 [L/V/I/A]. The first group maps to the HLA-B*15:11 and HLA-C*03:03 allomorphs, whereas the second group corresponds to peptides presented by HLA-A*02:01, in agreement with the alleles known to be expressed by the THP-1 cell line. Consistent with the sequence consensus motif analysis above (Fig. 2D), cluster analysis found a difference in the third replicate of both samples: GibbsCluster identified and isolated an extra sub-group characterised by arginine (R) at P2. As reported in the original paper [13], the third replicate was derived from an engineered THP-1 cell line that also carries HLA-B*27:05. The extra sub-group visualised by the sequence consensus motif analysis and revealed by GibbsCluster corresponds to HLA-B*27:05. Thus, Immunolyser's multi-step analytical process can successfully deconvolute the core binding motifs of the HLA alleles present in each replicate file and compare samples and replicates. The additional group of peptides introduced by expression of the HLA-B*27:05 allotype in the third experiment was successfully separated from peptides assigned to HLA alleles expressed in the parental THP-1 cell line. Immunolyser can therefore quickly and easily alert users to differences in peptide motifs between replicates and/or samples.
Immunolyser then analysed the predicted peptide-HLA binding affinities and used three prediction tools to categorise the uploaded peptides as strong binders, weak binders or non-binders. To reach a highly confident consensus decision for each peptide, Immunolyser performs majority voting on the results of all predictors and generates an UpSet plot showing the overlap of majority-voted binders between the HPLC and MWCO filtration datasets. Fig. 3 shows this overlap for the HLA-B*15:11 allele; peptides predicted to bind HLA-A*02:01 and HLA-C*03:03 can also be selected for analysis (Fig. 3A). By default, all predicted binders from all replicate files are combined in the UpSet plot (Fig. 3B). Each bar represents a subset, and the number above it shows the total number of unique peptides in that subset; the count can also be seen by hovering the mouse over the bar. The UpSet plot allows users to view the binding motif (Fig. 3D) of various peptide sets and to download the list of binders belonging to any subset (Fig. 3E). As a further option for individualised analysis, a drop-down menu (Fig. 3C) can be used to switch from the majority-voted results to the results of any specific prediction tool, and the UpSet plot is updated accordingly. For example, users can download only the HLA-B*15:11-specific majority-voted binders shared by the HPLC and MWCO filtration datasets by clicking the leftmost bar (i.e., 1575), as highlighted in Fig. 3. Clicking any bar in the UpSet plot also opens a pop-up window with a sequence logo generated from the binders belonging to the selected subset.
In this example, 1297 9-mers out of the 1575 predicted binders were used to generate the sequence logo (Fig. 3F). The binding prediction analysis by Immunolyser showed that more predicted binders were generally detected in the HPLC dataset, which is unsurprising given the overall higher number of peptides detected with this method. For example, the number of HLA-A*02:01 binders found by each prediction tool in the overall dataset ranged from 16,464 to 17,765. Switching between alleles shows that fewer binders were identified for HLA-B*15:11 and HLA-C*03:03 than for HLA-A*02:01. This is partly because ANTHEM can only predict 9-mer HLA-B*15:11 binders and a limited length range of peptides for HLA-C*03:03, and partly because of the relatively high expression of HLA-A in THP-1 cells and the generally low physiological expression of HLA-C [39]. For example, in the HPLC dataset, 10,553 majority-voted binders were predicted for HLA-A*02:01, whereas only 2741 and 4501 binders were identified for HLA-B*15:11 and HLA-C*03:03, respectively. A similar trend was observed in the MWCO filtration dataset. All prediction results can be downloaded from the extended table by clicking the “Download binders” link (Fig. 3G). If one or more control files are submitted to Immunolyser, predicted binders in the sample files that are also found in a control file are marked with “Y” in an additional “control” column of the downloaded binder CSV files, helping users judge the validity of the predicted binders.
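The control flag described above amounts to a set-membership test. A minimal sketch follows; the function name and the `Peptide` column header are hypothetical, and peptides not found in any control file are assumed to be left blank in the “control” column.

```python
import csv
import io


def binders_csv_with_control(binders, control_peptides):
    """Write predicted binders as CSV text with an extra 'control' column:
    'Y' if the peptide also occurs in any control file, blank otherwise."""
    control = set(control_peptides)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Peptide", "control"])
    for peptide in binders:
        writer.writerow([peptide, "Y" if peptide in control else ""])
    return buffer.getvalue()


print(binders_csv_with_control(["SIINFEKL", "GILGFVFTL"], ["GILGFVFTL"]), end="")
# Peptide,control
# SIINFEKL,
# GILGFVFTL,Y
```

Pooling all control peptides into a single set keeps the lookup constant-time per binder, regardless of how many control files are submitted.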
Fig. 3.
The UpSet plot generated using HLA-allotype binding prediction results for the HLA-B*15:11 allele. (A) The selection menu to choose the allele to be viewed. (B) The interactive UpSet plot showing the numbers of predicted binders. (C) The dropdown menu to select between majority-voted binders or any of the three binding prediction tools. (D) The pop-up dialogue box showing the sequence logo generated using the binders belonging to the selected bar (can be any subset or set). (E) A link to download the list of binders belonging to the selected bar. (F) The name of the subset and the number of total 9-mer binders used to generate the sequence logo out of the total binders belonging to the selected bar. (G) Downloadable peptide-MHC binding results generated by prediction tools against each allele selected.
Finally, Immunolyser uses the Pepscanner module (beta version) to provide sequence context by mapping the peptides to their source proteins. Pepscanner is accessed via the ‘Pepscanner’ tab (Fig. 4A), after which the input CSV file is uploaded using the browse button (Fig. 4D). Users can select proteins of interest from the drop-down menu (Fig. 4E). In this case, we did not select any proteins, so the ten most frequently occurring proteins in the uploaded file were used to generate the heatmap (Fig. 4C). The mapped proteins are arranged as horizontal rows in the heatmap, with regions coloured according to the number of peptides identified. Different shades represent the number of peptides present at each location, as indicated by the vertical legend bar on the right side of the heatmap. For usability and ease of comparison, the length of every protein is normalised to 100, as reflected on the x-axis. The metadata table shows the total number of unique proteins found in the uploaded file and lists the top 10 proteins together with the peptide count for each protein, the gene name, and links to download the subset of the uploaded file in which that source protein was found (Fig. 4B).
Fig. 4.
Using Pepscanner to analyse the contextual information of submitted peptides and their locations within protein sequences. (A) The ‘Pepscanner’ tab. (B) Metadata table of the uploaded input file, including information on the 10 most frequently occurring proteins in the file. (C) The generated heatmap showing the distribution of peptides across different proteins. (D) Upload of data in CSV format for alignment to source proteins. (E) Selection of specific proteins of interest to investigate (human proteins only). A demo is provided at the bottom of the Pepscanner page.
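The length-normalised heatmap described above can be approximated by mapping each peptide onto its source protein and binning positions along a common 0 to 100 axis. The sketch below works under stated assumptions: exact substring matching against a protein sequence supplied by the caller, equal-width bins, and one count per peptide occurrence in every bin it overlaps. The function `coverage_profile` is hypothetical, and Pepscanner may match and count peptides differently (for example, when handling modified peptides or multiple source proteins).

```python
from typing import List


def coverage_profile(protein_seq: str, peptides: List[str], n_bins: int = 100) -> List[int]:
    """Count peptide occurrences overlapping each of `n_bins` equal-width bins.

    Residue i of a protein of length L falls into bin i * n_bins // L, so every
    protein is stretched onto the same 0..n_bins axis whatever its length.
    """
    counts = [0] * n_bins
    length = len(protein_seq)
    if length == 0:
        return counts
    for pep in peptides:
        if not pep:
            continue  # an empty string would "match" at every position
        start = protein_seq.find(pep)
        while start != -1:  # visit every exact occurrence, including overlapping ones
            end = start + len(pep)  # exclusive end position
            first_bin = start * n_bins // length
            last_bin = (end - 1) * n_bins // length
            for b in range(first_bin, last_bin + 1):
                counts[b] += 1
            start = protein_seq.find(pep, start + 1)
    return counts


profile = coverage_profile("ACDEFGHIKL", ["CDE", "EFG", "WWW"], n_bins=10)
print(profile)  # [0, 1, 1, 2, 1, 1, 0, 0, 0, 0]
```

Stacking one such profile per protein as rows gives a matrix that can be drawn directly as a heatmap; peptides that are not found (here `WWW`) simply contribute nothing.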
4. Discussion
Traditional analysis of immunopeptidomic data usually involves installing and configuring a variety of third-party computational tools, which requires substantial technical and domain knowledge. Automating immunopeptidome analysis can reduce the potential for analytical error and improve the consistency of reported results. With this goal in mind, we constructed an automated computational pipeline that performs routine analyses of experimental immunopeptidomic data, such as peptide length distribution, peptide motif analysis, peptide clustering analysis, and peptide-HLA binding prediction. The resulting tool, Immunolyser, can be accessed via the Immunolyser webserver or installed locally. Parallel processing is used to reduce the average run time of jobs, and a queue-management framework organises the submitted requests. Each user receives a job ID with which the analysis results can be accessed at any time once the task has finished. As an indication of run time, in the THP-1 case study presented above, the job took approximately 30 min to analyse six files yielding over 70,000 peptides, with most of the processing time consumed by GibbsCluster for peptide clustering. We showed that the results from Immunolyser are consistent with the findings of Pandey et al. [13]. More importantly, our case study demonstrated that, compared with manual analysis, Immunolyser substantially reduces the effort needed to analyse immunopeptidomic data. By integrating these routinely performed steps, it improves efficiency and expedites analysis: users only need to upload their datasets and click the ‘Submit’ button.
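The text does not name the queue-management framework Immunolyser uses, so the following is only a minimal in-process sketch of the general pattern it describes: each submission immediately returns a job ID, a fixed pool of workers processes queued jobs in parallel, and results can be looked up by ID once a job has finished. It uses only Python's standard `queue`, `threading` and `uuid` modules; the class `JobQueue` and its methods are hypothetical. Threads are assumed to be adequate here because, in a pipeline of this kind, the heavy computation would typically run in external programs; a production webserver would more likely rely on a persistent job broker.

```python
import queue
import threading
import uuid
from typing import Any, Callable, Dict, Optional, Tuple


class JobQueue:
    """Hypothetical in-process job queue: every submission gets a job ID and a
    fixed pool of worker threads processes jobs first-in, first-out."""

    def __init__(self, n_workers: int = 2) -> None:
        self._tasks: queue.Queue = queue.Queue()
        self._status: Dict[str, Tuple[str, Optional[Any]]] = {}
        self._lock = threading.Lock()
        for _ in range(n_workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def _set(self, job_id: str, state: str, value: Optional[Any] = None) -> None:
        with self._lock:
            self._status[job_id] = (state, value)

    def _worker(self) -> None:
        while True:
            job_id, fn, args = self._tasks.get()
            self._set(job_id, "running")
            try:
                self._set(job_id, "finished", fn(*args))
            except Exception as exc:  # record the failure instead of killing the worker
                self._set(job_id, "failed", repr(exc))
            finally:
                self._tasks.task_done()

    def submit(self, fn: Callable[..., Any], *args: Any) -> str:
        job_id = uuid.uuid4().hex
        self._set(job_id, "queued")  # registered before a worker can pick it up
        self._tasks.put((job_id, fn, args))
        return job_id

    def status(self, job_id: str) -> Tuple[str, Optional[Any]]:
        with self._lock:
            return self._status.get(job_id, ("unknown", None))

    def wait_all(self) -> None:
        self._tasks.join()  # blocks until every queued job has been processed


jq = JobQueue(n_workers=2)
ids = [jq.submit(len, s) for s in ["SLYNTVATL", "GILGFVFTL", "KLVALGINAV"]]
jq.wait_all()
print([jq.status(i) for i in ids])  # [('finished', 9), ('finished', 9), ('finished', 10)]
```

Returning the ID before any work starts is what lets a user close the browser and come back to the results later.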
To the best of our knowledge, MhcVizPipe is the first reported pipeline for the analysis and visualisation of HLA I and II ligand datasets [19]. However, setting up MhcVizPipe on a local server may require users to download the third-party tools, and the local version must be configured differently on different operating systems. Because Immunolyser runs on the Nectar server, it requires no computing resources from the user’s local PC. In terms of usability, Immunolyser provides a more interactive results page to help users interrogate the analyses of different samples and conditions. Another highlight of Immunolyser is that, for peptide-HLA-I binding affinity prediction, it applies three state-of-the-art predictors and combines their outputs with a ‘majority voting’ strategy, yielding higher-confidence consensus predictions. Because peptide-HLA-II binding prediction is still a fast-evolving field, we intend to upgrade Immunolyser as more accurate class II predictors become available. Importantly, Immunolyser retains the experimental information that accompanies the uploaded peptides from the search, such as m/z, retention time, accession, peptide modification, area, and any additional columns. Such accessory information can be crucial for interpreting the analysis results in light of the experimental design, e.g., the detection levels of predicted HLA binders across different conditions. Combining the export of the subsets depicted in the UpSet plot with the retention of all provided accessory information allows users to focus downstream analysis on, for example, peptides common to two samples or unique to a treatment condition. In addition, Immunolyser can analyse the protein-level context of the submitted peptides via the peptide-to-source-protein mapping provided by the Pepscanner module. This analysis provides a statistical overview of how peptides are distributed across their source proteins, which can inform subsequent functional analysis of peptide-rich proteins. In the future, we will endeavour to incorporate label-free quantification of immunopeptidomic data into Immunolyser, providing more options for comparing the relative abundance of peptides between conditions.
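The export of UpSet-plot subsets mentioned in this Discussion (e.g., peptides common to two samples or unique to a treatment) can be pictured as grouping peptides by the exact combination of samples in which they occur. The sketch below assumes the usual ‘exclusive’ (distinct) counting of UpSet plots, in which each peptide belongs to exactly one bar; the text does not state which counting mode Immunolyser uses. The function `exclusive_intersections` and the sample names are hypothetical. Re-attaching accessory columns such as m/z or retention time would then amount to filtering the uploaded table by each returned peptide set.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Mapping, Set


def exclusive_intersections(samples: Mapping[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Group peptides by the exact set of samples they were detected in.

    Each key is the combination of sample names containing the peptide; each
    value holds the peptides found in exactly those samples and in no others.
    """
    membership: Dict[str, Set[str]] = defaultdict(set)
    for name, peptides in samples.items():
        for pep in peptides:
            membership[pep].add(name)
    groups: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for pep, names in membership.items():
        groups[frozenset(names)].add(pep)
    return dict(groups)


samples = {
    "control": {"SLYNTVATL", "GILGFVFTL", "KLVALGINA"},
    "treated": {"GILGFVFTL", "NLVPMVATV"},
}
groups = exclusive_intersections(samples)
print(sorted(groups[frozenset({"treated"})]))  # ['NLVPMVATV']
print(sorted(groups[frozenset({"control", "treated"})]))  # ['GILGFVFTL']
```

Because every peptide lands in exactly one group, the group sizes add up to the number of unique peptides across all samples.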
5. Conclusion
In this study, we describe Immunolyser, a user-friendly web-based pipeline that automates routine analysis of immunopeptidomic data from LC-MS/MS experiments. Immunolyser offers a ‘one-stop’ solution for in-depth analysis of immunopeptidomic data, including peptide length distribution, peptide sequence pattern and motif analysis, peptide clustering, predicted peptide-HLA binding affinity, and peptide source protein analysis. With its interactive and user-friendly interface, we believe that Immunolyser can help life scientists save considerable time on these fundamental analyses and quality control checks, leaving more time for in-depth scrutiny of experimental results. In the future, we will endeavour to reduce run times through more efficient use of processing units and job distribution. We also plan to add a separate protein module to Immunolyser to provide functional annotations of the source proteins of the identified peptides. This would include biological pathway analysis and other relevant biological information obtained by automatically cross-referencing external resources such as IEDB [40], the SysteMHC Atlas [41], [42], UniProt [20], and KEGG [43]. Taken together, Immunolyser is a time-saving toolkit that can help the immunopeptidomics field gain a better understanding of immune responses, and we hope it will contribute to the development of precision medicine.
CRediT authorship contribution statement
Anthony W. Purcell, Asolina Braun, and Chen Li: Conceptualization, Methodology, Supervision. Prithvi Raj Munday and Joshua Fehring: Software, Validation. Jerico Revote: Resources. Kirti Pandey and Mohammad Shahbazy: Data curation. Prithvi Raj Munday, Asolina Braun and Chen Li: Writing - original draft. Katherine E. Scull, Sri H. Ramarathinam, Pouya Faridi, Nathan P. Croft, and Anthony W. Purcell: Validation, Writing - review & editing. Anthony W. Purcell: Funding acquisition.
Conflict of Interest
The authors declare that they have no competing interests.
Acknowledgements
Computational resources were supported by the R@CMon/Monash Node of the Nectar Research Cloud, an initiative of the Australian Government’s Super Science Scheme and the Education Investment Fund. A.W.P. is supported by an Australian National Health and Medical Research Council (NHMRC) Principal Research Fellowship (1137739). C.L. was supported by an NHMRC CJ Martin Early Career Research Fellowship (1143366). A.B. was supported by the National Psoriasis Foundation (817907) and the Rebecca L. Cooper Medical Research Foundation (PG2020775).
Footnotes
Supplementary data associated with this article can be found in the online version at doi:10.1016/j.csbj.2023.02.033.
Contributor Information
Asolina Braun, Email: Asolina.Braun@monash.edu.
Chen Li, Email: Chen.Li@monash.edu.
Anthony W. Purcell, Email: Anthony.Purcell@monash.edu.
Appendix A. Supplementary material
References
- 1. Axelrod M.L., Cook R.S., Johnson D.B., et al. Biological consequences of MHC-II expression by tumor cells in cancer. Clin Cancer Res. 2019;25:2392–2402. doi: 10.1158/1078-0432.CCR-18-3200.
- 2. Li C., Revote J., Ramarathinam S.H., et al. Resourcing, annotating, and analysing synthetic peptides of SARS-CoV-2 for immunopeptidomics and other immunological studies. Proteomics. 2021;21. doi: 10.1002/pmic.202100036.
- 3. Liepe J., Sidney J., Lorenz F.K., et al. Mapping the MHC class I–spliced immunopeptidome of cancer cells. Cancer Immunol Res. 2019;7:62–76. doi: 10.1158/2326-6066.CIR-18-0424.
- 4. Mumberg D., Monach P.A., Wanderling S., et al. CD4+ T cells eliminate MHC class II-negative cancer cells in vivo by indirect effects of IFN-γ. Proc Natl Acad Sci. 1999;96:8633–8638. doi: 10.1073/pnas.96.15.8633.
- 5. Vyas J.M., Van der Veen A.G., Ploegh H.L. The known unknowns of antigen processing and presentation. Nat Rev Immunol. 2008;8:607–618. doi: 10.1038/nri2368.
- 6. Faridi P., Li C., Ramarathinam S.H., et al. A subset of HLA-I peptides are not genomically templated: evidence for cis- and trans-spliced peptide ligands. Sci Immunol. 2018;3. doi: 10.1126/sciimmunol.aar3947.
- 7. Faridi P., Woods K., Ostrouska S., et al. Spliced peptides and cytokine-driven changes in the immunopeptidome of melanoma. Cancer Immunol Res. 2020;8:1322–1334. doi: 10.1158/2326-6066.CIR-19-0894.
- 8. Purcell A.W., Ramarathinam S.H., Ternette N. Mass spectrometry–based identification of MHC-bound peptides for immunopeptidomics. Nat Protoc. 2019;14:1687–1707. doi: 10.1038/s41596-019-0133-y.
- 9. Mei S., Li F., Leier A., et al. A comprehensive review and performance evaluation of bioinformatics tools for HLA class I peptide-binding prediction. Brief Bioinform. 2020;21:1119–1135. doi: 10.1093/bib/bbz051.
- 10. Silverstein A.M. History of immunology. eLS. 2001.
- 11. Faridi P., Purcell A.W., Croft N.P. In immunopeptidomics we need a sniper instead of a shotgun. Proteomics. 2018;18:1700464. doi: 10.1002/pmic.201700464.
- 12. Trolle T., McMurtrey C.P., Sidney J., et al. The length distribution of class I–restricted T cell epitopes is determined by both peptide supply and MHC allele–specific binding preference. J Immunol. 2016;196:1480–1487. doi: 10.4049/jimmunol.1501721.
- 13. Pandey K., Mifsud N.A., Sian T.C.L.K., et al. In-depth mining of the immunopeptidome of an acute myeloid leukemia cell line using complementary ligand enrichment and data acquisition strategies. Mol Immunol. 2020;123:7–17. doi: 10.1016/j.molimm.2020.04.008.
- 14. Lundegaard C., Lund O., Nielsen M. Accurate approximation method for prediction of class I MHC affinities for peptides of length 8, 10 and 11 using prediction tools trained on 9mers. Bioinformatics. 2008;24:1397–1398. doi: 10.1093/bioinformatics/btn128.
- 15. Chang S.T., Ghosh D., Kirschner D.E., et al. Peptide length-based prediction of peptide–MHC class II binding. Bioinformatics. 2006;22:2761–2767. doi: 10.1093/bioinformatics/btl479.
- 16. Maenaka K., Jones E.Y. MHC superfamily structure and the immune system. Curr Opin Struct Biol. 1999;9:745–753. doi: 10.1016/s0959-440x(99)00039-1.
- 17. Zhang J., Chen Y., Qi J., et al. Narrow groove and restricted anchors of MHC class I molecule BF2*0401 plus peptide transporter restriction can explain disease susceptibility of B4 chickens. J Immunol. 2012;189:4478–4487. doi: 10.4049/jimmunol.1200885.
- 18. Cole D.K., Edwards E.S., Wynn K.K., et al. Modification of MHC anchor residues generates heteroclitic peptides that alter TCR binding and T cell recognition. J Immunol. 2010;185:2600–2610. doi: 10.4049/jimmunol.1000629.
- 19. Kovalchik K.A., Ma Q., Wessling L., et al. MhcVizPipe: a quality control software for rapid assessment of small- to large-scale immunopeptidome datasets. Mol Cell Proteomics. 2022;21. doi: 10.1016/j.mcpro.2021.100178.
- 20. UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 2023;51:D523–D531. doi: 10.1093/nar/gkac1052.
- 21. Xin L., Qiao R., Chen X., et al. A streamlined platform for analyzing tera-scale DDA and DIA mass spectrometry data enables highly sensitive immunopeptidomics. Nat Commun. 2022;13:3108. doi: 10.1038/s41467-022-30867-7.
- 22. Demichev V., Messner C.B., Vernardis S.I., et al. DIA-NN: neural networks and interference correction enable deep proteome coverage in high throughput. Nat Methods. 2020;17:41–44. doi: 10.1038/s41592-019-0638-x.
- 23. MacLean B., Tomazela D.M., Shulman N., et al. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics. 2010:966–968. doi: 10.1093/bioinformatics/btq054.
- 24. Gfeller D., Guillaume P., Michaux J., et al. The length distribution and multiple specificity of naturally presented HLA-I ligands. J Immunol. 2018;201:3705–3716. doi: 10.4049/jimmunol.1800914.
- 25. Plotly Technologies Inc. Collaborative data science. Montréal, QC: Plotly Technologies Inc.; 2015.
- 26. Thomsen M.C.F., Nielsen M. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion. Nucleic Acids Res. 2012;40:W281–W287. doi: 10.1093/nar/gks469.
- 27. Racle J., Michaux J., Rockinger G.A., et al. Robust prediction of HLA class II epitopes by deep motif deconvolution of immunopeptidomes. Nat Biotechnol. 2019;37:1283–1286. doi: 10.1038/s41587-019-0289-6.
- 28. Racle J., Guillaume P., Schmidt J., et al. Machine learning predictions of MHC-II specificities reveal alternative binding mode of class II epitopes. bioRxiv. 2022.
- 29. Tatusov R.L., Altschul S.F., Koonin E.V. Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks. Proc Natl Acad Sci. 1994;91:12091–12095. doi: 10.1073/pnas.91.25.12091.
- 30. Andreatta M., Alvarez B., Nielsen M. GibbsCluster: unsupervised clustering and alignment of peptide sequences. Nucleic Acids Res. 2017;45:W458–W463. doi: 10.1093/nar/gkx248.
- 31. Gelfand A.E. Gibbs sampling. J Am Stat Assoc. 2000;95:1300–1304.
- 32. Kullback S., Leibler R.A. On information and sufficiency. Ann Math Stat. 1951;22:79–86.
- 33. Li C., Clark L.V.T., Zhang R., et al. Structural capacitance in protein evolution and human diseases. J Mol Biol. 2018;430:3200–3217. doi: 10.1016/j.jmb.2018.06.051.
- 34. Mei S., Li F., Xiang D., et al. Anthem: a user customised tool for fast and accurate prediction of binding between peptides and HLA class I molecules. Brief Bioinform. 2021. doi: 10.1093/bib/bbaa415.
- 35. Bassani-Sternberg M., et al. Deciphering HLA motifs across HLA peptidomes improves neo-antigen predictions and identifies allostery regulating HLA specificity. PLoS Comput Biol. 2017. doi: 10.1371/journal.pcbi.1005725.
- 36. Reynisson B., Alvarez B., Paul S., et al. NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data. Nucleic Acids Res. 2020;48:W449–W454. doi: 10.1093/nar/gkaa379.
- 38. Grinberg M. Flask Web Development: Developing Web Applications With Python. O'Reilly Media, Inc.; 2018.
- 39. Zemmour J., Parham P. Distinctive polymorphism at the HLA-C locus: implications for the expression of HLA-C. J Exp Med. 1992;176:937–950. doi: 10.1084/jem.176.4.937.
- 40. Vita R., Mahajan S., Overton J.A., et al. The Immune Epitope Database (IEDB): 2018 update. Nucleic Acids Res. 2019;47:D339–D343. doi: 10.1093/nar/gky1006.
- 41. Shao W., Caron E., Pedrioli P., et al. The SysteMHC Atlas: a computational pipeline, a website, and a data repository for immunopeptidomic analyses. Methods Mol Biol. 2020;2120:173–181. doi: 10.1007/978-1-0716-0327-7_12.
- 42. Shao W., Pedrioli P.G.A., Wolski W., et al. The SysteMHC Atlas project. Nucleic Acids Res. 2018;46:D1237–D1247. doi: 10.1093/nar/gkx664.
- 43. Kanehisa M., Furumichi M., Sato Y., et al. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res. 2021;49:D545–D551. doi: 10.1093/nar/gkaa970.