MuPeXI: prediction of neo-epitopes from tumor sequencing data

Anne-Mette Bjerregaard; Morten Nielsen; Sine Reker Hadrup; Zoltan Szallasi; Aron Charles Eklund

doi:10.1007/s00262-017-2001-3

2017 Apr 20;66(9):1123–1130. doi: 10.1007/s00262-017-2001-3

PMCID: PMC11028452 PMID: 28429069

Abstract

Personalization of immunotherapies such as cancer vaccines and adoptive T cell therapy depends on identification of patient-specific neo-epitopes that can be specifically targeted. MuPeXI, the mutant peptide extractor and informer, is a program to identify tumor-specific peptides and assess their potential to be neo-epitopes. The program input is a file with somatic mutation calls, a list of HLA types, and optionally a gene expression profile. The output is a table with all tumor-specific peptides derived from nucleotide substitutions, insertions, and deletions, along with comprehensive annotation, including HLA binding and similarity to normal peptides. The peptides are sorted according to a priority score which is intended to roughly predict immunogenicity. We applied MuPeXI to three tumors for which predicted MHC-binding peptides had been screened for T cell reactivity, and found that MuPeXI was able to prioritize immunogenic peptides with an area under the curve of 0.63. Compared to other available tools, MuPeXI provides more information and is easier to use. MuPeXI is available as stand-alone software and as a web server at http://www.cbs.dtu.dk/services/MuPeXI.

Electronic supplementary material

The online version of this article (doi:10.1007/s00262-017-2001-3) contains supplementary material, which is available to authorized users.

Keywords: Neo-epitopes, Neo-antigens, Immunotherapy, Prediction, Mutation, Sequencing

Introduction

Recent successes in several types of immunotherapy have demonstrated that exploitation of a patient’s own immune system is a promising strategy to eliminate cancer [1]. To understand, improve, and personalize this approach, it is important to characterize the specific targets recognized by the immune system. T cells may recognize tumor cells based on peptide epitopes derived from intracellular proteins and presented on the tumor cell surface in complex with human leukocyte antigen (HLA) class I protein. Tumor-associated epitopes may derive from abnormal protein expression or non-synonymous genetic alterations giving rise to new peptide sequences (neo-epitopes) [2]. Such neo-epitopes have recently received increasing focus as valuable targets for cancer-specific immune reactivity, promoted by the finding that mutational load and number of predicted neo-epitopes are strong correlators of clinical response to immune checkpoint inhibition in several cancer types, including non-small cell lung cancer (NSCLC) [3, 4] and melanoma [5, 6].

The somatic mutational landscape is largely heterogeneous across different patients, even with the same tumor origin. Consequently, neo-epitopes should be identified at a personalized level. Identifying a set of peptides that are likely to serve as targetable neo-epitopes in a given patient can be achieved by analyzing next generation sequencing (NGS) data acquired from an individual tumor [3, 7]. Conceptually, neo-epitope prediction based on NGS data can be divided into three steps: (a) convert a list of genomic mutations into protein sequences, and retain all possible mutation-containing “neo-peptides” of appropriate lengths; (b) predict binding to the patient-specific HLA alleles; and (c) assess the immunogenicity of each peptide based on features such as predicted binding, expression level and sequence similarity to unmutated self-proteins. Several studies have predicted neo-epitopes in various cancer types using these or similar considerations [3, 8].

Several publicly available tools for neo-epitope prediction following this approach have been described, including EpiToolKit [9], Epi-Seq [10] and pVac-Seq [11]. These tools all include many of the features expected to be relevant for identification of potential neo-epitopes. However, none of these tools provide all levels of information valuable for identification and prediction of neo-epitopes, either in terms of which peptides can be identified (i.e., including indels and frameshifts or not), a comparison to self-peptides, or the manner in which the binding to HLA is reported (binding affinities versus percentile ranks). Moreover, only one of these tools is available as a web server, and only one has implemented a ranking of the neo-peptides (Table 1).

Table 1. Features of current neo-epitope prediction tools

Feature	Epi-Seq	EpiToolKit	pVac-Seq	MuPeXI
Derives mutant peptides resulting from SNVs	+	+	+	+
Derives corresponding nearest normal peptides (SNVs)	+	−	+	+
Derives mutant peptides resulting from indels and frameshifts	−	+	+	+
Derives corresponding nearest normal peptides (indels and frameshifts)	−	−	−	+
Accounts for gene expression	+	−	+	+
Accounts for variant allele frequency	+	−	+	+
Available via web server	−	+	−	+
Available as software that runs locally	+	−	+	+
Ranks peptides according to predicted immunogenicity	+	−	−	+

Here, we present MuPeXI (the mutant peptide extractor and informer), a tool for neo-epitope identification aiming to address these shortcomings. MuPeXI generates lists of mutant peptides from mutation calls, including both single nucleotide variants (SNVs) and indels, and prioritizes these based on HLA binding affinity, expression level, similarity to self-peptides and mutant allele frequency. The tool provides the user with a sorted list of the full set of identified neo-peptides, with several informative annotations that can be filtered to select potential neo-epitopes.

Materials and methods

Non-small cell lung cancer (NSCLC) data

We obtained whole exome sequencing (WXS) data from three tumor regions (R1, R2, R3) from each of three NSCLC patients (L011, L012 and L013). Tumor infiltrating T lymphocytes (TILs) from patients L011 and L012 had been screened for T cell reactivity towards a peptide library of potential neo-epitopes derived from SNV calls and selected for a predicted binding affinity of <500 nM as determined by NetMHCpan 2.8. A multimer staining screen revealed three peptides positive for reactivity: FAFQEYDSF in L011, and SPMIVGSPW and LLLDIVAPK in L012 [4]. Screening of T cells reactive towards neo-peptide libraries using barcode-labeled multimers in patients L011 and L013 revealed nine peptides positive for reactivity, one of which overlapped with the previous screen: GTSAPRKKK, SVTNEFCLK, VQFLSQVLWSR, RSMRTVYGLF, FAFQEYDSF and GPEELGLPM in L011, and YSNYYCGLRY, ALQSRLQAL and KVCCCQILL in L013 [12]. In total, 11 of the 995 peptides tested across the three patients were positive for T cell reactivity. RNA-seq data were not available for any of these samples.

NGS data pre-processing

First, we processed the WXS data using Trim Galore [13], which combines the functions of Cutadapt [14] and FastQC [15]: trimming the reads below an average phred score of 20 (default value), cutting out standard adaptors such as those from Illumina, and running FastQC to evaluate data quality. Variant calling was performed following the Genome Analysis Toolkit (GATK) best practice guidelines for somatic variant detection [16]. Reads were aligned to the human genome (GRCh38) using the Burrows–Wheeler Aligner [17] version 0.7.10 with default mem options and with a read group provided for each sample for compatibility with the following steps. Duplicate reads were marked using picard-tools version 2.6.0 MarkDuplicates. Indel realignment and base recalibration were performed with GATK version 3.3.0 to reduce false positive variant calls. SNV and indel calls were made using MuTect2 from GATK version 3.5 [18]. MuTect2 is designed to call variants, both SNVs and indels, from tumor samples evaluating reads from matched tumor and normal samples. HLA alleles of each patient were inferred from the WXS data using OptiType [19] with default settings after filtering the reads aligning to the HLA region with RazerS version 3.4.0 [20].

Run of MuPeXI 1.1

The Variant Call Format (VCF) files produced by MuTect2, containing the variant calls for R1 from each of the three patients, were processed in parallel with a peptide length of 9 and all six patient-specific HLA alleles.

Run of pVac-Seq 3.0.5

To run pVac-Seq version 3.0.5, modification of the VCF files was necessary. Most VCF files from somatic variant callers include both somatic and germline variant information, but pVac-Seq accepts only somatic sample information; germline information was therefore discarded from the files after selection of passed variants. The Ensembl Variant Effect Predictor (VEP) [21] was run as described by pVac-Seq with the provided plugins, resulting in an annotated VCF file (VEP–VCF). The VEP–VCF files were given as input to the core script “pvacseq run” with standard settings. Because the connection to the IEDB server was repeatedly lost, which we speculated could be due to overloading, the files were run sequentially and split into ten variants per file to avoid this error, although this was still not 100% successful. The few files that did not complete were re-run manually until no error occurred, which took several days. Coverage filtering of the standard output file is recommended. Therefore, read counts were generated using GATK version 3.5 ASEReadCounter, and a script was written to add the read counts to the corresponding mutated peptides in the binding-filtered pVac-Seq output files. The filtering script “pvacseq coverage_filter” was run for each file to generate the final output, with several different filtering combinations. pVac-Seq output was filtered on all possible combinations of binding affinity (50, 100, 200, 400, 500 or 800 nM), genomic mutant variant allele frequency (0, 1, 5, 10, 20, or 40%), and fold change between normal and mutated peptide binding affinity (0 or 1).
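The filtering grid described above can be enumerated as in the following minimal sketch. It only lists the threshold combinations named in the text; it does not call pVac-Seq, and the constant and function names are hypothetical, chosen for illustration.

```python
from itertools import product

# Threshold values taken from the text; the names are illustrative only.
BINDING_AFFINITY_NM = (50, 100, 200, 400, 500, 800)
VARIANT_ALLELE_FREQ_PCT = (0, 1, 5, 10, 20, 40)
FOLD_CHANGE = (0, 1)


def filter_grid():
    """Return every (affinity, allele frequency, fold change) combination."""
    return list(product(BINDING_AFFINITY_NM, VARIANT_ALLELE_FREQ_PCT, FOLD_CHANGE))


grid = filter_grid()
print(len(grid))  # 6 * 6 * 2 = 72 combinations
```

Each tuple in the grid corresponds to one filtered pVac-Seq output, and therefore to one sensitivity/specificity point in the comparison.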

Comparative analysis

Peptides identified by pVac-Seq and MuPeXI that corresponded to the experimentally evaluated peptides from the previous studies, and that matched the HLA allele for which each peptide had been ordered, were selected. The MuPeXI output from all three patients was combined into one table, immunogenic peptides were annotated, and the priority score was recalculated at high precision (Supplementary Table 2). Sensitivity and specificity were calculated from the pVac-Seq output files originating from various filtering combinations (Supplementary Table 3). A receiver operator characteristic (ROC) curve was generated from the MuPeXI priority score and plotted together with the individual points of sensitivity and specificity from pVac-Seq. We note that the priority score was defined prior to the benchmarking analysis.
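The two kinds of performance measure used in this comparison can be sketched as follows. This is an illustrative implementation under stated assumptions, not the analysis code used in the study: the function names are hypothetical, the area under the ROC curve is computed with the standard rank-based (Mann–Whitney) formulation, and both functions assume that at least one immunogenic and one non-immunogenic peptide are present.

```python
def sensitivity_specificity(selected, immunogenic):
    """Sensitivity and specificity of a binary selection (e.g. one filter setting).

    selected, immunogenic: parallel lists of booleans, one entry per peptide.
    """
    pairs = list(zip(selected, immunogenic))
    tp = sum(1 for s, y in pairs if s and y)
    fn = sum(1 for s, y in pairs if not s and y)
    tn = sum(1 for s, y in pairs if not s and not y)
    fp = sum(1 for s, y in pairs if s and not y)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores, immunogenic):
    """AUC as the probability that a random immunogenic peptide scores higher
    than a random non-immunogenic peptide (ties count as one half)."""
    pos = [s for s, y in zip(scores, immunogenic) if y]
    neg = [s for s, y in zip(scores, immunogenic) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example with made-up scores, for illustration only.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [True, False, True, False]
print(roc_auc(toy_scores, toy_labels))
print(sensitivity_specificity([True, True, False, False], toy_labels))
```

A ranking score yields a whole curve (summarized by the AUC), whereas each fixed filter setting yields a single sensitivity/specificity point, which is why the two tools are plotted differently.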

Calculation of priority score

For each peptide/HLA-allele combination, we obtain the following values:

$R_m$: The % rank affinity of the mutant peptide, as output by NetMHCpan.

$R_n$: The % rank affinity of the normal peptide, as output by NetMHCpan.

$E$: Expression level of the corresponding gene, in transcripts per million (TPM).

$M$: The number of mismatches between the mutant and normal peptides.

$A$: Mutant allele frequency, as detected by the variant caller (MuTect or MuTect2).

$N$: Normal exact match penalty: 0 if the mutated peptide matches 100% to any peptide in the reference proteome, else 1.

The “normal peptide” is here defined as either (1) the unmutated version of the peptide, if the mutant peptide is derived from a missense mutation; or (2) the most similar peptide of the same length in the normal proteome, if the mutant peptide is not derived from a missense mutation.

The priority score $P$ is defined as:

$$P = \left[ L(R_m) \cdot A \cdot \tanh(E) \right] \cdot \left[ N \cdot \left( 1 - 2^{-M} \cdot L(R_n) \right) \right]$$

where $L$ is the logistic function given by:

$$L(x) = \frac{1}{1 + e^{5(x - 2)}}$$

The first term in square brackets in the priority score is related to the abundance of the peptide relative to the entire pool of peptides presented by the HLA, and the second term is related to the potential decrease in immunogenicity of the peptide due to negative selection against cross-reacting T cells. $L$ is a negative logistic function, giving a value approaching zero for high rank affinity, a midpoint at a rank affinity of 2, and a value approaching one for low rank affinity. The constant 2 defines the inflection point and was chosen because a % rank of 2 is the recommended cutoff for peptide binding. The constant 5 affects the steepness of the function and was chosen by intuition. The hyperbolic tangent function (tanh) gives values proportional to $E$ at low $E$, and values approaching 1 for high $E$. The tanh function has an implicit scaling constant of 1 (i.e. instead of $\tanh(E)$, we could have used the more general form $\tanh(E/k)$ where $k$ is a constant), which can be justified by the previously used cutoff of 1 fragment per kilobase per million mapped reads (FPKM) (e.g., as described in [22]). This is approximately equivalent to the belief that a mutant peptide present at 1 part per million inside the cell should be sufficient to generate a T cell response (assuming the peptide binds the HLA molecule and a recognizing T cell exists), and higher concentrations of the mutant peptide will only marginally increase the probability of response.
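The definition above translates directly into a few lines of code. The following is a minimal sketch written from the formula as stated in this section, not an excerpt from the MuPeXI source; the function and argument names are hypothetical.

```python
import math


def logistic(rank, midpoint=2.0, steepness=5.0):
    """Negative logistic function L(x) = 1 / (1 + exp(steepness * (x - midpoint)))."""
    return 1.0 / (1.0 + math.exp(steepness * (rank - midpoint)))


def priority_score(rank_mut, rank_norm, expression, mismatches, allele_freq, exact_match):
    """P = [L(R_m) * A * tanh(E)] * [N * (1 - 2^(-M) * L(R_n))].

    rank_mut, rank_norm: % rank of the mutant and normal peptide (R_m, R_n)
    expression:          expression level of the gene (E)
    mismatches:          mismatches between mutant and normal peptide (M)
    allele_freq:         mutant allele frequency (A)
    exact_match:         True if the mutant peptide occurs in the reference proteome
    """
    n = 0.0 if exact_match else 1.0  # normal exact match penalty N
    presentation = logistic(rank_mut) * allele_freq * math.tanh(expression)
    dissimilarity = n * (1.0 - 2.0 ** (-mismatches) * logistic(rank_norm))
    return presentation * dissimilarity


# Illustrative values only: a strongly binding mutant peptide whose normal
# counterpart binds poorly, in a well-expressed gene.
strong = priority_score(0.1, 50.0, 10.0, 1, 0.5, exact_match=False)
weak = priority_score(10.0, 50.0, 10.0, 1, 0.5, exact_match=False)
print(strong, weak)
```

With these inputs the score of the strong binder is close to the allele frequency (0.5), because every other factor is close to 1, while the weak binder scores close to 0.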

Results

Data preparation and input files

Before running MuPeXI, tumor and normal specimens should be whole-genome or -exome sequenced, and somatic mutations identified, preferably using MuTect2 which identifies both SNVs and short indels [18]. The class I HLA alleles must be determined, either from the sequencing data or by other methods. Several tools exist for calling HLA alleles from sequencing data; both OptiType [19] and Polysolver [23] reportedly achieve ~97% accuracy, which is often sufficient for research use. The minimum input to MuPeXI is a VCF file and a list of HLA alleles. If available, a transcriptional profile of the tumor should also be provided (Supplementary Fig. 1).

MuPeXI algorithm

The MuPeXI algorithm consists of the following seven steps (Fig. 1):

(A) Effect prediction: The Ensembl Variant Effect Predictor (VEP) [21] is used to identify non-synonymous mutations (SNVs and indels) and infer cDNA and protein sequence changes that result from each genomic alteration (Fig. 1a).
(B) Neo-peptide extraction: Somatic mutation position information from the VEP file is used to identify neo-peptides. For in-frame mutations, the mutant protein sequence is inferred directly from the Ensembl protein reference. For frameshift mutations, the mutant protein sequence is inferred by translating the mutant cDNA sequence. All alteration-containing substrings (of user-defined lengths) of the altered protein are retained as potential neo-peptides (Fig. 1b).
(C) Similarity to normal peptides: All neo-peptides identical to a peptide in the human proteome are penalized in the prioritization, as these are likely non-immunogenic due to central tolerance. Then, for each neo-peptide, the most similar normal peptide is identified from the unmutated amino acid sequence (for SNVs) or by searching the reference proteome for the most similar peptide with up to four mismatches (for indels) (Fig. 1c).
(D) Prediction of HLA binding: Binding predictions are made between peptides and the (up to six) patient-specific HLA types using NetMHCpan version 3.0 [24]. Both the predicted binding affinity and the percent rank score are annotated, although only the rank score is used for prioritization because this measure is less biased when comparing binding between multiple HLA alleles [24] (Fig. 1d).
(E) Gene expression summation: Either gene-level or transcript-level expression data can be provided. In the latter case (which we recommend), the sum of all transcripts including the specific peptides is annotated (Fig. 1e).
(F) Annotation: Relevant information is combined into an output table for each peptide-HLA combination (Supplementary Table 1), including (a) mutated and normal peptide sequence (columns 2 and 5); (b) predicted binding affinity and percent rank score for the mutant and normal peptides (columns 3–4 and 6–7); (c) gene expression level of the corresponding transcript(s) or gene (column 21); (d) gene symbol and cancer driver status of the corresponding gene (columns 18 and 19); and (e) mutation position in the peptide, protein and genome (columns 13–16) (Fig. 1f).
(G) Priority score: MuPeXI implements a simple ranking system of peptides based on features used in earlier studies [5, 8]. Briefly, we aimed to prioritize peptides which are likely to be abundantly presented by MHCs on the cell surface and whose recognizing T cells, if they exist, are unlikely to be depleted by negative selection. To enable prioritization (rather than selection) of peptides, we used smooth functions (rather than thresholds) of all relevant parameters. For each peptide-HLA combination, a priority score is calculated based on the product of a term related to peptide-HLA presentation and abundance and a term related to dissimilarity between the mutant peptide and the most similar normal peptide (Fig. 1g; details in “Materials and methods” section). Instead of using a traditional binary binding call based on a fixed affinity threshold, we apply a sigmoidal logistic function to the rank peptide-HLA binding affinity. Similarly, we apply a hyperbolic tangent function to the expression level of the corresponding gene, which is proportional to expression level at low expression values but asymptotically approaches one at high expression values. The second term is related to the potential decrease in immunogenicity of the peptide due to negative selection against cross-reacting T cells. This is estimated from the affinity of the most similar normal peptide, assuming that each mismatch between normal and mutant peptides reduces the probability of cross-reactivity by a factor of two [25]. Since the exact determinants of immunogenicity are poorly understood, the priority score is intended only as a rough guide based on current knowledge. The priority score will be developed further as more knowledge of neo-epitope immunogenicity becomes available.

This information-rich output gives the user the ability to make an informed selection of neo-peptides with high potential of being neo-epitopes, using either the priority score or other criteria.
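Steps (B) and (C) can be illustrated with a short sketch for the simplest case, a single amino acid substitution. This is a brute-force illustration under stated assumptions, not the MuPeXI implementation (whose nearest normal peptide search is written in C); the function names, the toy sequence and the 0-based position convention are all hypothetical.

```python
def mutant_kmers(protein, mut_pos, k=9):
    """All length-k substrings of `protein` that contain the 0-based index mut_pos."""
    first_start = max(0, mut_pos - k + 1)
    last_start = min(mut_pos, len(protein) - k)
    return [protein[s:s + k] for s in range(first_start, last_start + 1)]


def mismatches(a, b):
    """Number of positions at which two equal-length peptides differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_normal(peptide, reference_proteins, max_mismatches=4):
    """Most similar same-length reference peptide within max_mismatches.

    Returns (normal_peptide, mismatch_count), or None if nothing is close enough.
    """
    k = len(peptide)
    best = None
    for protein in reference_proteins:
        for s in range(len(protein) - k + 1):
            candidate = protein[s:s + k]
            d = mismatches(peptide, candidate)
            if d <= max_mismatches and (best is None or d < best[1]):
                best = (candidate, d)
    return best


# Toy protein (not a real sequence) with a substitution M -> W at index 10.
normal_protein = "ACDEFGHIKLMNPQRSTVWY"
mutant_protein = normal_protein[:10] + "W" + normal_protein[11:]

peptides = mutant_kmers(mutant_protein, mut_pos=10, k=9)
print(peptides[0], nearest_normal(peptides[0], [normal_protein]))
```

A mismatch count of 0 corresponds to a neo-peptide that is identical to a normal peptide, which is the case penalized in the prioritization.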

Assessment and comparison to existing tools

Of the four tools compared, only pVac-Seq and MuPeXI take all of the following factors into account: expression of the mutated gene, similarity of neo-peptides to self-peptides, and indels as a source of neo-peptides (Table 1). Epi-Seq includes expression and a prioritization of the neo-peptides, but does not include indels and frameshifts. EpiToolKit, which is based on the FRED-2 framework for epitope detection [26], does not account for expression level, mutant allele frequency, or the corresponding normal peptide. Compared to pVac-Seq, MuPeXI offers (1) a comprehensive search for the most similar non-mutated peptide (for indels), (2) a priority score for ranking peptides, and (3) availability via web server.

Since MuPeXI and pVac-Seq have similar features, we performed a comparative analysis between these two tools using data from two previously published studies, including the analyses of T cell reactivity towards predicted neo-peptides in three NSCLC patients (L011, L012 and L013) [4, 12]. In these studies, a total of 995 unique 9–11mer neo-peptides were identified and used to screen for T cell reactivity using fluorescent- or DNA barcode-labeled MHC multimers. T cell recognition was identified in the corresponding patient samples for 11 of the predicted neo-peptides [4, 12]. These peptides were identified solely on SNVs (not indels), using a cutoff of 500 nM binding affinity determined by NetMHCpan 2.8. Raw WXS FASTQ files from Region 1 of the three NSCLC patients were obtained, NGS analysis was run to determine somatic mutations, and the resulting VCF files were processed either through MuPeXI (following the processes described in Fig. 1) or through pVac-Seq, focusing on 9mer peptides and following the previously published and online available instructions.

We compared the output of pVac-Seq and MuPeXI and found that of the 322 unique 9mer peptides identified in the original studies, only 190 were identified by pVac-Seq and MuPeXI, most likely due to differences in the NGS analysis, especially the variant calling. One difference in the output provided by the two methods is that pVac-Seq does not provide the most similar normal peptide for indels as MuPeXI does. A second difference in the output is that pVac-Seq provides a list of predicted neo-epitopes based on a set of filtering settings, whereas MuPeXI provides all neo-peptides sorted according to a priority score. Notably, the MuPeXI output can also be easily filtered according to specific criteria, e.g., in a spreadsheet program; thus the MuPeXI output does not lack any essential functionality.

We sought to assess whether the MuPeXI priority score discriminates between neo-peptides that elicited a T cell response and those that did not, and to compare this to the filtering approach as implemented by pVac-Seq. First, we constructed an ROC curve based on the MuPeXI priority score (Fig. 2, Supplementary Table 2). Even within this set of peptides that was pre-selected based on predicted HLA binding, we found that the MuPeXI priority score provided additional predictive information (AUC = 0.635). We next calculated sensitivity and specificity from pVac-Seq, using all combinations of a set of distinct filtering values based on binding affinity, genomic mutant allele frequency, and fold change between normal and mutated peptide binding affinity (Fig. 2, Supplementary Table 3). In general, the pVac-Seq filtering approach also provided additional predictive information, with some sets of filtering criteria providing better performance and some providing worse performance (Fig. 2). Note, however, that pVac-Seq using default filtering values identified zero of the immunogenic peptides (red triangle in Fig. 2), underlining the great challenge in defining appropriate and universal thresholds for epitope identification.

Fig. 2 — MuPeXI and pVac-Seq performance comparison. The MuPeXI performance is plotted as a ROC curve based on the priority score. pVac-Seq performance is plotted as individual calculations of sensitivity and specificity according to various filtering combinations. The *red triangle* indicates pVac-Seq performance using default filtering criteria

Aside from the performance and technical considerations, an additional important difference between MuPeXI and pVac-Seq was observed in the usability. MuPeXI is available both as a web server and as a downloadable command-line tool, which enables both advanced and less advanced users to extract neo-peptides and select potential neo-epitopes. MuPeXI takes the raw VCF file from a variant caller and selects the variants flagged as PASS, before formatting the file for VEP compatibility; therefore the VCF file can be passed directly to MuPeXI. In contrast, pVac-Seq requires the ability to run command-line tools and to do extensive file manipulation to run the complete recommended filtering analysis (see “Materials and methods” section for details). Furthermore, when the NSCLC samples were run through pVac-Seq with default filtering values (binding affinity: 500 nM, allele frequency: 40% and fold change: 0), no peptides passed the filtering criteria (Fig. 2, red triangle). Therefore, a successful application of pVac-Seq requires user insights about the data analyzed and about which selection/filtering criteria would be appropriate for data from a particular specimen. Without such insights, it becomes challenging for the user to define the optimal filter values and obtain a list of relevant peptides.
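The selection of variants flagged as PASS can be sketched as below. This is a simplified illustration, not MuPeXI's own code: it relies only on the standard VCF layout, in which FILTER is the seventh tab-separated column, and the function name and example records are made up.

```python
def pass_variants(vcf_lines):
    """Yield header lines unchanged and data lines whose FILTER column is PASS."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
        if len(fields) > 6 and fields[6] == "PASS":
            yield line


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t12345\t.\tA\tG\t.\tPASS\t.\n",
    "chr2\t67890\t.\tC\tT\t.\tgermline_risk\t.\n",
]
kept = list(pass_variants(example))
print(len(kept))  # two header lines plus one PASS record
```

The same generator can wrap an open file handle, so a VCF file can be filtered line by line without loading it into memory.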

Availability

MuPeXI is available as a command-line tool for Linux or MacOS at https://github.com/ambj/MuPeXI. MuPeXI is implemented primarily in Python 2.7, with the nearest normal peptide search implemented in C. MuPeXI requires Python 2.7 with the Python extensions Biopython, NumPy and pandas. Also, MuPeXI requires a local installation of NetMHCpan 3.0 and the Ensembl Variant Effect Predictor (VEP). The VEP version must match the reference genome version used.

MuPeXI is also available as a web server at http://www.cbs.dtu.dk/services/MuPeXI. The web server primarily supports VCF files generated using the human genome version GRCh38. VCF files based on HG19 alignment are also accepted and processed by lift-over to GRCh38. Other genome versions, and non-human genomes, are not currently supported. The VCF file size limit on the web server is 20 megabytes.

In the current implementations, mutant allele frequency will be considered by MuPeXI only if MuTect or MuTect2 was used for variant calling.

Discussion

Prediction of neo-epitopes from NGS data is a complex task involving several non-trivial steps of data parsing and bioinformatics. Our tool MuPeXI provides a powerful and thorough analysis with an easy-to-use web interface. Of the tools previously proposed to solve this task, pVac-Seq is closest in functionality to MuPeXI. Overall, we found that the two tools identify the same neo-peptides, but that MuPeXI provides more information than pVac-Seq, is easier to use, and prioritizes the peptides guiding the user to select the most likely immunogenic peptides. However, MuPeXI lacks some features present in other tools. For example, MuPeXI does MHC-binding prediction using NetMHCpan only, whereas pVac-Seq and EpiToolKit provide the option to use several alternative algorithms. Also, EpiToolKit and the related standalone framework FRED-2 provide several additional capabilities, such as interface to HLA typing algorithms and tools for vaccine design.

MuPeXI ranks peptides based on a novel but simple priority score intended to rank peptides according to likelihood of eliciting a T cell response. We found that this priority score provides a ranking of neo-peptides with predictive power that improves upon the use of NetMHCpan alone and that is comparable to pVac-Seq. However, we note that this analysis of the priority score was based on a very small amount of available data, and thus it is not possible to draw a strong conclusion about its true utility. Thus, we expect that many users will use MuPeXI output to select potential neo-epitopes based on their own filtering criteria. We anticipate that additional data, when available, will be needed to validate and/or optimize the priority score or other methods to predict immunogenicity of potential neo-epitopes.

To generate such data, and improve our understanding of neo-epitope immunogenicity, screening of large panels of neo-peptides is needed. Novel tools for assessing T cell recognition using DNA barcode-labeled MHC multimers may help provide such data [12]. In-depth analyses of larger patient cohorts for which both T cell recognition profiling and NGS data are available will allow us to determine the impact on immunogenicity of other features, such as difference in HLA binding affinity and stability between mutant and normal peptide, position of the mutation in the peptide, neo-epitope expression level and mutation clonality, selective pressure and oncogenic properties. We believe that such a large data set, combined with a tool such as MuPeXI, will lay the groundwork for moving the field of rational epitope identification forward and improve our understanding of peptide T cell immunogenicity in the context of cancer immunotherapy.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 205 kb)

Acknowledgements

We thank Charles Swanton and Nicholas McGranahan for providing the raw data from the two NSCLC studies; Sofie Ramskov, Rikke Lyngaa and Sunil Kumar Saini for their experimental work in these studies; Amalie Kai Bentzen for her contribution to methods development; and Thomas Trolle, Andrea Marquard and Marcin Krzystanek for helpful discussions.

Funding

This work was supported by the Danish Cancer Society under grant R72-A4618 (Aron Charles Eklund); the Novo Nordisk Foundation under grant 16,854 (Zoltan Szallasi); the Breast Cancer Research Foundation (Zoltan Szallasi); and the Danish Council for Independent Research under grant 1331-00283 (Sine Reker Hadrup, Zoltan Szallasi).

Abbreviations

AUC: Area under the curve
MuPeXI: Mutant peptide extractor and informer
NGS: Next generation sequencing
NSCLC: Non-small cell lung cancer
RNA-seq: RNA sequencing
ROC: Receiver operator characteristic
SNV: Single nucleotide variant
VCF: Variant call format
VEP: Variant effect predictor
WXS: Whole exome sequencing

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest. 

Contributor Information

Anne-Mette Bjerregaard, Phone: (+45) 452 56144, Email: ambj@cbs.dtu.dk.

Aron Charles Eklund, Phone: (+45) 452 56144, Email: eklund@cbs.dtu.dk.

References

1. Vormehr M, Diken M, Boegel S, et al. Mutanome directed cancer immunotherapy. Curr Opin Immunol. 2015;39:14–22. doi: 10.1016/j.coi.2015.12.001.
2. Schumacher TN, Schreiber RD. Neoantigens in cancer immunotherapy. Science. 2015;348:69–74. doi: 10.1126/science.aaa4971.
3. Rizvi NA, Hellmann MD, Snyder A, et al. Cancer immunology. Mutational landscape determines sensitivity to PD-1 blockade in non-small cell lung cancer. Science. 2015;348:124–128. doi: 10.1126/science.aaa1348.
4. McGranahan N, Furness AJS, Rosenthal R, et al. Clonal neoantigens elicit T cell immunoreactivity and sensitivity to immune checkpoint blockade. Science. 2016;351:1463–1469. doi: 10.1126/science.aaf1490.
5. Snyder A, Makarov V, Merghoub T, et al. Genetic basis for clinical response to CTLA-4 blockade in melanoma. N Engl J Med. 2014;371:2189–2199. doi: 10.1056/NEJMoa1406498.
6. Hugo W, Zaretsky JM, Sun L, et al. Genomic and transcriptomic features of response to anti-PD-1 therapy in metastatic melanoma. Cell. 2016;165:35–44. doi: 10.1016/j.cell.2016.02.065.
7. Olsen LR, Campos B, Barnkob MS, et al. Bioinformatics for cancer immunotherapy target discovery. Cancer Immunol Immunother. 2014;63:1235–1249. doi: 10.1007/s00262-014-1627-7.
8. Rajasagi M, Shukla S, Fritsch EF, et al. Systematic identification of personal tumor-specific neoantigens in chronic lymphocytic leukemia. Blood. 2014;124:453–462. doi: 10.1182/blood-2014-04-567933.
9. Schubert B, Brachvogel H-P, Jurges C, Kohlbacher O. EpiToolKit–a web-based workbench for vaccine design. Bioinformatics. 2015;31:2211–2213. doi: 10.1093/bioinformatics/btv116.
10. Duan F, Duitama J, Al Seesi S, et al. Genomic and bioinformatic profiling of mutational neoepitopes reveals new rules to predict anticancer immunogenicity. J Exp Med. 2014;211:2231–2248. doi: 10.1084/jem.20141308.
11. Hundal J, Carreno BM, Petti AA, et al. pVAC-Seq: a genome-guided in silico approach to identifying tumor neoantigens. Genome Med. 2016;8:11. doi: 10.1186/s13073-016-0264-5.
12. Bentzen AK, Marquard AM, Lyngaa R, et al. Large-scale detection of antigen-specific T cells using peptide-MHC-I multimers labeled with DNA barcodes. Nat Biotechnol. 2016;34:1037–1045. doi: 10.1038/nbt.3662.
13. Krueger F. Trim Galore (2016). http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/. Accessed 19 Sep 2016.
14. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2011;17:10–12. doi: 10.14806/ej.17.1.200.
15. Andrews S. FastQC (2016). http://www.bioinformatics.babraham.ac.uk/projects/fastqc/. Accessed 19 Sep 2016.
16. Van der Auwera GA, Carneiro MO, Hartl C, et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinforma. 2013;43:11. doi: 10.1002/0471250953.bi1110s43.
17. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
18. Cibulskis K, Lawrence MS, Carter SL, et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnol. 2013;31:213–219. doi: 10.1038/nbt.2514.
19. Szolek A, Schubert B, Mohr C, et al. OptiType: precision HLA typing from next-generation sequencing data. Bioinformatics. 2014;30:3310–3316. doi: 10.1093/bioinformatics/btu548.
20. Weese D, Holtgrewe M, Reinert K. RazerS 3: faster, fully sensitive read mapping. Bioinformatics. 2012;28:2592–2599. doi: 10.1093/bioinformatics/bts505.
21. McLaren W, Gil L, Hunt SE, et al. The ensembl variant effect predictor. Genome Biol. 2016;17:122. doi: 10.1186/s13059-016-0974-4.
22. Gubin MM, Artyomov MN, Mardis ER, Schreiber RD. Tumor neoantigens: building a framework for personalized cancer immunotherapy. J Clin Invest. 2015;125(9):3413–3421. doi: 10.1172/JCI80008.
23. Shukla S, Rooney MS, Rajasagi M, et al. Comprehensive analysis of cancer-associated somatic mutations in class I HLA genes. Nat Biotechnol. 2015;33:1152–1158. doi: 10.1038/nbt.3344.
24.Nielsen M, Andreatta M. NetMHCpan-3.0; improved prediction of binding to MHC class I molecules integrating information from multiple receptor and peptide length datasets. Genome Med. 2016;8:33. doi: 10.1186/s13073-016-0288-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
25.Hoof I, Pérez CL, Buggert M, et al. Interdisciplinary analysis of HIV-specific CD8+ T cell responses against variant epitopes reveals restricted TCR promiscuity. J Immunol. 2010;184:5383–5391. doi: 10.4049/jimmunol.0903516. [DOI] [PubMed] [Google Scholar]
26.Schubert B, Walzer M, Brachvogel H-P, et al. FRED 2: an immunoinformatics framework for Python. Bioinformatics. 2016;32:2044–2046. doi: 10.1093/bioinformatics/btw113. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Supplementary material 1 (PDF 205 kb)
