2020 Oct 7;12(10):2879. doi: 10.3390/cancers12102879

Table 1.

The list of currently available computational pipelines for neoantigen prediction *.

Pipeline | Source, Required Input Data and Output | Workflow and Features | Refs.
EpiToolkit
2015
Source: http://www.epitoolkit.de (not available)
Description: Web-based pipeline focused on vaccine design. It includes simplified interfaces that allow users to combine tools into a workflow.
Input: Not described.
Output: Interactive presentation of the results as HTML and an internal representation (list of predicted peptides with scores).
  1. MHC genotyping (OptiType)

  2. Epitope and neoepitope prediction (SYFPEITHI, BIMAS, SVMHC, NetMHC family, UniTope, TEPITOPEpan)

  3. Epitope selection for vaccine design

  4. Epitope assembly

[137]
FRED2
(FRamework for Epitope Detection)
2016
Source: https://github.com/FRED-2/Fred2
Description: Computational pipeline for T-cell epitope detection and vaccine design implemented in Python. It can be extended with additional tools.
Input: Sequencing reads (FASTA format).
Output: Not described.
  1. HLA typing (OptiType, Polysolver, seq2HLA, ATHLATES)

  2. T-cell epitope prediction
    • Epitope prediction (NetMHC 3.0)
    • TAPPrediction
    • CleavagePrediction (NetChop)
  3. Epitope selection (OptiTope)

  4. Epitope assembly (String-of-Beads, Spacer Design)

[138]
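OptiTope, the epitope selection step listed for FRED2, formulates selection as an integer linear program. As a much simpler stand-in (not OptiTope's method), the sketch below greedily picks the epitope that covers the most not-yet-covered HLA alleles until k epitopes are chosen or no epitope adds coverage. The `binders` mapping (peptide ID to the alleles it is predicted to bind), the toy data and the function name are hypothetical.

```python
from typing import Dict, List, Set


def greedy_select(binders: Dict[str, Set[str]], k: int) -> List[str]:
    """Greedy allele-coverage selection (a stand-in, not OptiTope's ILP).

    binders: hypothetical map of peptide ID -> HLA alleles it is predicted to bind.
    Returns up to k peptide IDs; ties are broken alphabetically.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(binders)
    while remaining and len(chosen) < k:
        # max() returns the first maximum, so sorting first makes ties deterministic
        best = max(sorted(remaining), key=lambda p: len(remaining[p] - covered))
        if not remaining[best] - covered:
            break  # no remaining peptide adds a new allele
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


toy = {"PEP1": {"A*02:01"}, "PEP2": {"C*07:01"}, "PEP3": {"A*02:01", "B*07:02"}}
print(greedy_select(toy, 3))  # ['PEP3', 'PEP2']
```

Greedy coverage is easy to follow but gives no optimality guarantee, which is one reason exact ILP formulations are used in practice.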
TepiTool
2016
Source: http://tools.iedb.org/tepitool/
Description: User-friendly web-based computational pipeline for T-cell epitope prediction hosted by IEDB. It is applicable to human, chimpanzee, cow, gorilla, macaque, mouse and pig. The associated article contains a step-by-step analysis protocol with a comprehensive description of each step, recommendations, and a description of the anticipated results.
Input: Protein sequences in single-letter amino acid code (FASTA format), the list of HLA alleles.
Output: Tables with peptide sequences with predicted features.
  1. Provide sequence data

  2. Select the host species and MHC allele class

  3. Select the alleles for binding prediction

  4. Select peptides to be included in the prediction

  5. Select preferred methods for binding prediction and peptide selection, and cutoff values (for MHC class I: Consensus (IEDB recommended 2006), NetMHCpan 2.8, NetMHC 3.4, etc.; for MHC class II: Consensus (IEDB recommended 2006), NetMHCIIpan 3.0, NetMHCII 2.2, etc.)

  6. Review selection, enter job details and submit data

[139]
Vaxrank
2017
Source: https://github.com/openvax/vaxrank
Description: Computational framework for selecting neoantigens for vaccine peptides based on tumor mutations, tumor RNA sequencing and HLA type data. It was designed and used in the Personalized Genomic Vaccine Phase I trial (NCT02721043).
Input: Tumor mutations (VCF format), tumor RNA-seq (BAM format), patient HLA alleles.
Output: Set of vaccine peptides.
  1. Determination of RNA abundance and extraction of mutated protein sequences

  2. Predicting MHC binding (MHCtools)

  3. Ranking mutant sequences

  4. Optimizing sequences for peptide synthesis

[66,67]
neoantigenR
2017
Source: https://rdrr.io/github/tangshao2016/neoantigenR/
Description: R-based pipeline for neoantigen prediction from raw NGS data.
Input: DNA-Seq, RNA-Seq, ExomeSeq (tumor and/or normal) short or long sequence reads (FASTA format), GFF annotation.
Output: The list of high-affinity HLA class I binding neoantigen candidates.
  1. Sequence alignment and isoform calling (Bowtie2, Cufflinks)

  2. Epitope prediction: extracting putative novel peptide sequences

  3. Candidate scoring by MHC binding prediction (NetMHC 3.4)

[140]
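For a single missense substitution, extracting putative novel peptide sequences (step 2 above) amounts to a sliding-window enumeration of every peptide that contains the mutated residue. The following is a generic sketch, not neoantigenR's code; it assumes a 0-based position in the protein, a one-residue substitution, and 8-11-mer windows as a typical MHC class I range (the function name and toy protein are hypothetical).

```python
from typing import Iterable, List


def mutant_peptides(protein: str, pos: int, alt: str,
                    lengths: Iterable[int] = range(8, 12)) -> List[str]:
    """All peptides of the given lengths that contain the substituted residue.

    Assumes a single-residue substitution at 0-based index `pos` of `protein`.
    """
    mutant = protein[:pos] + alt + protein[pos + 1:]
    peptides = []
    for k in lengths:
        # a window starting at s covers pos when s <= pos <= s + k - 1
        first = max(0, pos - k + 1)
        last = min(pos, len(mutant) - k)
        peptides.extend(mutant[s:s + k] for s in range(first, last + 1))
    return peptides


wt = "ACDEFGHIKLMNPQRSTVWY"
peps = mutant_peptides(wt, 10, "W", lengths=[9])  # M11W in 1-based notation
print(len(peps), peps[0])  # 9 DEFGHIKLW
```

Each window length k yields at most k peptides, so a single SNV produces up to 38 candidates across lengths 8-11 before any binding prediction.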
CloudNeo
2017
Source: https://github.com/TheJacksonLaboratory/CloudNeo
Description: Cloud-based workflow (implemented in CWL) for neoantigen identification from NGS data.
Input: VCF format (list of non-synonymous mutations), BAM format (for HLA typing).
Output: HLA binding affinity predictions for all mutated peptides.
  1. VCF processing and extraction of mutated peptide sequences (Protein_Translator)

  2. HLA typing (Polysolver, HLAminer)

  3. Peptide-MHC affinity prediction (NetMHCpan 3.0)

[141]
MuPeXI (Mutant Peptide Extractor and Informer)
2017
Source: http://www.cbs.dtu.dk/services/MuPeXI/
Description: Web-based tool for neoepitope identification from somatic mutation calls (SNVs, INDELs) that reports HLA binding affinity, expression level, similarity to self-peptides and mutant allele frequency for each mutated peptide. The web service is supplemented by brief instructions and an output format description.
Input: Somatic mutation calls (VCF format), list of HLA types, gene expression profile (optional).
Output: Table with all tumor-specific peptides derived from substitutions, insertions and deletions with annotation (HLA binding affinity and similarity to normal peptides).
  1. Effect prediction (Ensembl Variant Effect Predictor): selection of non-synonymous mutations

  2. Neo-peptide extraction

  3. Estimation of similarity to normal peptides: mutated peptides similar to peptides in the human proteome are removed from prioritization

  4. Prediction of HLA binding (NetMHCpan 3.0)

  5. Gene expression profiling

  6. Annotation

  7. Prioritization

[142]
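Step 3 of MuPeXI removes mutated peptides that resemble normal self-peptides. MuPeXI's own similarity measure is not reproduced here; as a stand-in, the sketch below drops a candidate if some same-length k-mer of a reference proteome lies within a chosen number of substitutions (Hamming distance). The function names, the `max_mismatch` parameter and the toy proteome are hypothetical, and the brute-force scan is only practical for small inputs.

```python
from typing import Dict, Iterable, List, Set


def kmers(proteins: Iterable[str], k: int) -> Set[str]:
    """All length-k substrings of the reference proteins."""
    return {p[i:i + k] for p in proteins for i in range(len(p) - k + 1)}


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def drop_self_like(peptides: List[str], proteome: List[str],
                   max_mismatch: int = 0) -> List[str]:
    """Keep peptides with no reference k-mer within `max_mismatch` substitutions."""
    cache: Dict[int, Set[str]] = {}  # reference k-mers, built once per peptide length
    kept = []
    for pep in peptides:
        if len(pep) not in cache:
            cache[len(pep)] = kmers(proteome, len(pep))
        if all(hamming(pep, ref) > max_mismatch for ref in cache[len(pep)]):
            kept.append(pep)
    return kept


proteome = ["ACDEFGHIKL"]
print(drop_self_like(["CDEFG", "CDEFW", "WWWWW"], proteome))                  # ['CDEFW', 'WWWWW']
print(drop_self_like(["CDEFG", "CDEFW", "WWWWW"], proteome, max_mismatch=1))  # ['WWWWW']
```

With `max_mismatch=0` only exact self-matches are removed; raising it also removes near-self peptides, which are less likely to escape central tolerance.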
TIminer (Tumor Immunology miner)
2017
Source: https://icbi.imed.ac.at/software/timiner/timiner.shtml (not available)
Description: Computational framework for comprehensive immunogenomic analysis, including HLA typing, neoantigen prediction, characterization of immune infiltrates and quantification of tumor immunogenicity.
Input: RNA-seq reads (FASTQ format), somatic DNA mutations (VCF format).
Output: Not described.
  1. HLA genotyping (OptiType)

  2. Prediction of tumor neoantigens (NetMHCpan 3.0)

  3. Characterization of tumor-infiltrating immune cells from bulk RNA-seq data (kallisto)

  4. Quantification of tumor immunogenicity from expression data

[143]
TSNAD
(Tumor-specific neoantigen detector)
2017
Source: https://github.com/jiujiezz/tsnad
Description: Pipeline with a GUI for identifying tumor-specific mutant proteins following GATK best practices. It provides two strategies: (1) extraction of extracellular mutations from membrane proteins; (2) MHC class I binding affinity prediction. It can start from raw NGS data.
Input: Paired-end sequencing data (FASTQ format) from WES.
Output: List of somatic mutations with annotations, extracellular mutations of the membrane proteins and the MHC-binding information (TXT format).
  1. Detection of cancer somatic mutations according to GATK best practices (Trimmomatic, BWA, samtools, Picard tools, GATK tools, ANNOVAR)

  2. Prediction of neoantigens (TMHMM for extracellular mutations; NetMHCpan 2.8 for MHC class I binding affinity prediction).

[144]
INTEGRATE-neo
2017
Source: https://github.com/ChrisMaherLab/INTEGRATE-Neo
Description: The pipeline is focused on the discovery of neoantigens derived from gene fusions.
Input: Reads (FASTQ format), the human reference genome (FASTA format), gene models (GenePred format) and gene fusions predicted by INTEGRATE (BEDPE format).
Output: BEDPE format file.
  1. Gene fusion peptide prediction

  2. HLA allele prediction (HLAminer)

  3. Gene fusion neoantigen discovery (NetMHC 4.0)

[88]
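For fusion-derived neoantigens, the candidate peptides are those that span the fusion junction. Assuming an in-frame fusion whose two partner protein segments are already known (INTEGRATE-neo derives them from INTEGRATE's BEDPE calls, which this sketch does not parse), junction-spanning k-mers can be enumerated as below. The function name and toy sequences are hypothetical; for out-of-frame fusions every residue downstream of the junction may be novel, which this sketch does not handle.

```python
from typing import List


def junction_peptides(five_prime: str, three_prime: str, k: int) -> List[str]:
    """k-mers that span an in-frame fusion junction.

    five_prime: residues from the 5' partner up to the breakpoint;
    three_prime: residues from the 3' partner after it.
    """
    fusion = five_prime + three_prime
    j = len(five_prime)  # index of the first 3'-partner residue
    # the window must include both residue j - 1 and residue j
    first = max(0, j - k + 1)
    last = min(j - 1, len(fusion) - k)
    return [fusion[s:s + k] for s in range(first, last + 1)]


print(junction_peptides("MKTAYIAK", "QRQISFVK", 4))  # ['IAKQ', 'AKQR', 'KQRQ']
```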
NeoepitopePred
2017
Source: https://github.com/stjude/NeoepitopePred
Description: Workflow for identification of putative neoepitopes derived from SNV and gene fusions based on WGS data.
Input: FASTQ (paired-end or single-end) or BAM files.
Output: Not described.
  1. HLA typing: stjude-hlatype applet (OptiType)

  2. Prediction of peptide-HLA affinity: stjude-epitope applet (NetMHCcons 1.1)

  3. Identification of fusion junctions (CICERO)

[145]
Neopepsee
2018
Source: https://sourceforge.net/projects/neopepsee/
Description: Machine learning-based neoantigen prediction tool for NGS data.
Input: Raw RNA-seq data (FASTQ format) and list of somatic mutations (VCF format), clinical HLA typing (if available)
Output: Mutated peptide sequences, gene expression levels and identification of immunogenic neoantigens.
  1. Transcript isoform prediction

  2. HLA type prediction (HLAminer)

  3. MHC binding affinity prediction (IEDB-Peptide binding to MHC class I molecules)

  4. Feature calculation

  5. Immunogenicity classification (IEDB-T cell class I pMHC immunogenicity predictor)

[65]
ScanNeo
2019
Source: https://github.com/ylab-hi/ScanNeo
Description: Computational pipeline for the identification of short and large indels-derived neoantigens utilizing RNA-seq data. ScanNeo consists of independent modules implementing three analysis steps.
Input: RNA-seq data in BAM format.
Output: Ranked set of neoantigens.
  1. Indels discovery:
    • duplicated reads removal (Picard tools)
    • spliced reads removal (sambamba)
    • realignment (BWA-MEM)
    • indels calling (transIndel)
  2. Annotation and filtering
    • Putative PCR slippage derived indels removal
    • Indel annotation (Variant Effect Predictor)
    • Germline indels removal
  3. Neoantigen prediction
    • Indel-derived peptide sequences generation (pVac-seq)
    • High-affinity peptides prediction (NetMHC 3.0 and NetMHCpan 3.0)
    • Prediction results merging and filtering
Note: HLA typing is carried out using the yara aligner and OptiType, or the HLA types can be provided by the user.
[146]
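In ScanNeo, pVAC-seq generates the indel-derived peptides. The sketch below only illustrates the underlying idea on a toy coding sequence: apply the indel, translate with the standard genetic code until the first stop codon, and keep the k-mers that start near or after the first changed residue and do not occur in the wild-type protein. The 0-based CDS position, the plain ref/alt strings (no VCF padding base), the function names and the toy sequence are all assumptions for illustration.

```python
from typing import Dict, List

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON: Dict[str, str] = {
    a + b + c: AMINO[16 * i + 4 * j + m]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for m, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate in frame 0 until the first stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def indel_peptides(cds: str, pos: int, ref: str, alt: str, k: int) -> List[str]:
    """Novel k-mers created by replacing cds[pos:pos+len(ref)] with `alt`."""
    wt = translate(cds)
    mt = translate(cds[:pos] + alt + cds[pos + len(ref):])
    d = 0  # first residue where mutant and wild type differ
    while d < min(len(wt), len(mt)) and wt[d] == mt[d]:
        d += 1
    novel: List[str] = []
    for s in range(max(0, d - k + 1), len(mt) - k + 1):
        pep = mt[s:s + k]
        if pep not in wt and pep not in novel:  # drop self and duplicate k-mers
            novel.append(pep)
    return novel


cds = "ATGGCTAAAGGTTGGTAA"  # toy CDS: M A K G W stop
print(translate(cds))                      # MAKGW
print(indel_peptides(cds, 4, "C", "", 3))  # ['MVK', 'VKV', 'KVG']
```

The one-base deletion shifts the reading frame, so every residue after the first change is novel; for an in-frame indel the `pep not in wt` check discards the downstream windows that revert to the wild-type sequence.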
DeepHLApan
2019
Source: http://biopharm.zju.edu.cn/deephlapan/
Description: Deep learning approach for neoantigen prediction considering both HLA-peptide binding (binding model) and immunogenicity (immunogenicity model) of peptide-HLA complex.
Input: CSV files with the header “Annotation,HLA,peptide”; only HLA-A, HLA-B and HLA-C alleles are supported.
Output: Binding score (ranges from 0 to 1, the probability that peptide binds with HLA), Immunogenicity score (ranges from 0 to 1; 0.5 is the threshold to select the predicted immunogenic pHLA).
  1. Binding model: predicts the probability of the peptide being presented by HLA on the tumor cell membrane

  2. Immunogenicity model: predicts the potential of the pHLA complex to elicit T-cell activation

[147]
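The DeepHLApan input described above can be produced with Python's standard `csv` module. The sketch below writes the documented “Annotation,HLA,peptide” header and rejects alleles outside HLA-A, -B and -C; the allele spelling `HLA-A*02:01`, the toy row and the function name are assumptions, so the expected allele format should be checked against DeepHLApan's own documentation.

```python
import csv
import io
from typing import Iterable, Tuple

SUPPORTED_LOCI = ("HLA-A", "HLA-B", "HLA-C")  # only class I A/B/C alleles are accepted


def deephlapan_input(rows: Iterable[Tuple[str, str, str]]) -> str:
    """Render (annotation, allele, peptide) rows as CSV text with the documented header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Annotation", "HLA", "peptide"])
    for annotation, allele, peptide in rows:
        if not allele.startswith(SUPPORTED_LOCI):
            raise ValueError(f"unsupported allele {allele!r}: only HLA-A, -B, -C")
        writer.writerow([annotation, allele, peptide])
    return buf.getvalue()


print(deephlapan_input([("mut1", "HLA-A*02:01", "YLQPRTFLL")]))
```

Validating the locus before submission avoids silently scoring class II alleles that the models were not built for.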
pTuneos
(prioritizing tumor neoantigens from next-generation sequencing data)
2019
Source: https://github.com/bm2-lab/pTuneos
Description: In silico tool that predicts the immunogenicity of SNV-derived neoepitopes, considering both MHC presentation and T-cell recognition ability. It is based on experimentally validated neoantigens.
It contains the Pre&RecNeo module, a learning-based framework for predicting and prioritizing neoepitopes recognized by T cells, and the RefinedNeo module, a neoepitope scoring schema for evaluating the immunogenicity of naturally processed and presented neoepitopes.
Input: PairMatchDNA (WES) mode accepts WES and RNA-seq data (FASTQ format); VCF mode accepts a VCF file with the mutation set, an expression profile (e.g., obtained by kallisto) and a copy number profile (e.g., obtained by sequenza).
Output: TSV files (snv_neo_model.tsv and indel_neo_models.tsv) containing mutated peptides derived from non-synonymous SNVs and INDELs with the corresponding immunity score measures.
WES mode:
  • Sequencing quality control (Trimmomatic)

  • Mutation calling (Strelka)

  • HLA typing (OptiType)

  • Expression profiling (kallisto)

  • Neoantigen prediction, filtering and annotation (NetMHCpan 4.0)

VCF mode:
  • Neoantigen prediction, filtering and annotation

[148]
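Several pipelines in this table (for example pTuneos, NeoPredPipe and ProGeo-neo) run NetMHCpan 4.0, which by default labels a peptide a strong binder (SB) below 0.5 %Rank and a weak binder (WB) below 2 %Rank. The sketch below applies those default cut-offs to already-parsed (peptide, allele, %Rank) tuples; parsing NetMHCpan's text output is deliberately left out because its column layout differs between versions, and the function names and toy predictions are hypothetical.

```python
from typing import List, Tuple

STRONG_RANK = 0.5  # NetMHCpan 4.0 default strong-binder %Rank cut-off
WEAK_RANK = 2.0    # NetMHCpan 4.0 default weak-binder %Rank cut-off


def binder_level(rank: float) -> str:
    if rank < STRONG_RANK:
        return "SB"
    if rank < WEAK_RANK:
        return "WB"
    return "NB"  # not a predicted binder


def keep_binders(preds: List[Tuple[str, str, float]]) -> List[Tuple[str, str, str]]:
    """Return (peptide, allele, level) for SB/WB predictions, lowest %Rank first."""
    ranked = sorted(preds, key=lambda p: p[2])
    return [(pep, allele, binder_level(rank)) for pep, allele, rank in ranked
            if binder_level(rank) != "NB"]


preds = [("PEPTIDEAA", "HLA-A*02:01", 1.2), ("PEPTIDEBB", "HLA-A*02:01", 0.3),
         ("PEPTIDECC", "HLA-B*07:02", 7.5)]
print(keep_binders(preds))
# [('PEPTIDEBB', 'HLA-A*02:01', 'SB'), ('PEPTIDEAA', 'HLA-A*02:01', 'WB')]
```

%Rank is used rather than raw nM affinity because it is normalized per allele, so a single cut-off is comparable across HLA types.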
NeoPredPipe
2019
Source: https://github.com/MathOnco/NeoPredPipe
Description: Pipeline that provides neoantigen predictions for multi-region sequencing data and assesses the intra-tumor heterogeneity (ITH) of the antigenic landscape of tumors.
Input: Multi- or single-region VCF files (with a set of somatic mutations), patient HLA types (optional)
Output: Annotated variants, predicted neoantigens, predicted recognition potential and a summary of ITH statistics
  1. Variant annotation (ANNOVAR)

  2. Neoantigen prediction (NetMHCpan 4.0)

  3. Peptide matching

  4. Neoantigen recognition potential

[68]
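One simple way to express the regional view of the antigenic landscape that NeoPredPipe summarizes is to label each neoantigen by the sampled regions it is found in. The sketch below is a simplification, not NeoPredPipe's own statistics: a neoantigen present in every region is called "clonal" and one present in only some regions "subclonal"; the input mapping, labels and function name are hypothetical.

```python
from typing import Dict, Set


def regional_status(regions: Dict[str, Set[str]]) -> Dict[str, str]:
    """regions: hypothetical map of region name -> neoantigen IDs detected there."""
    all_regions = set(regions)
    found_in: Dict[str, Set[str]] = {}
    for region, neoantigens in regions.items():
        for neo in neoantigens:
            found_in.setdefault(neo, set()).add(region)
    return {neo: "clonal" if where == all_regions else "subclonal"
            for neo, where in sorted(found_in.items())}


print(regional_status({"R1": {"neo1", "neo2"}, "R2": {"neo1"}, "R3": {"neo1", "neo3"}}))
# {'neo1': 'clonal', 'neo2': 'subclonal', 'neo3': 'subclonal'}
```

With a single region every neoantigen is trivially "clonal", which is why multi-region input is needed for any heterogeneity estimate.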
pVACtools
2020
Source: https://pvactools.readthedocs.io/en/latest/
Description: Computational toolkit allowing identification of altered peptides derived from SNV, INDELs, gene fusions and providing prediction of peptide-MHC binding for MHC class I and class II.
Input: VCF files, FASTA files with peptide sequences
Output: A set of files containing information about predicted epitopes before and after the filtering process supplying information about binding affinity scores and other parameters.
  1. Prediction of neoantigens from somatic alterations (pVACseq and pVACfuse (gene fusions))

  2. Prediction of neoantigens for peptides in a FASTA file

  3. Prioritization and selection (pVACviz with graphics-based interface)

  4. Design of DNA and synthetic long peptide-based vaccines (pVACvector)

[64,149]
ProGeo-neo
2020
Source: https://github.com/kbvstmd/ProGeo-neo
Description: Neoantigen prediction workflow that integrates genomic and mass spectrometry data. It consists of three modules: construction of a customized protein sequence database, HLA allele prediction, and neoantigen prediction and filtration.
Input: RNA-seq data (FASTQ format), genomic variants (VCF format), LC-MS/MS data (Raw format).
Output: List of candidate peptides
  1. HLA typing (OptiType)

  2. Identification of tumor-specific antigens from NGS data (WES/RNA-seq) (BWA, GATK tools)

  3. MHC binding prediction (NetMHCpan 4.0)

  4. Verifying MHC-peptides using mass spectrometry data (MaxQuant)

  5. Assessment of potential immunogenicity (T-cell recognition)

[150]
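Step 4 of ProGeo-neo checks predicted MHC-bound peptides against mass spectrometry identifications. ProGeo-neo does this with a MaxQuant search against its customized database; the sketch below shows only a simplified post hoc cross-check that matches predicted peptides to an already-identified MS peptide list while treating leucine and isoleucine as equivalent, because the two residues have identical masses. The function names and peptide lists are hypothetical.

```python
from typing import Iterable, List


def il_key(peptide: str) -> str:
    """Collapse I/L, which have the same mass and are not told apart by mass alone."""
    return peptide.replace("I", "L")


def ms_supported(predicted: Iterable[str], ms_peptides: Iterable[str]) -> List[str]:
    """Predicted peptides that match an MS-identified peptide up to I/L."""
    observed = {il_key(p) for p in ms_peptides}
    return [p for p in predicted if il_key(p) in observed]


print(ms_supported(["KLIDEFGHW", "SLYNTVATL", "AAAWYLWEV"], ["KLLDEFGHW", "SLYNTVATL"]))
# ['KLIDEFGHW', 'SLYNTVATL']
```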
Neoepiscope
2020
Source: https://github.com/pdxgx/neoepiscope
Description: Neoepitope identification pipeline that incorporates germline context and considers variant phasing for SNV and indels. Requires DNA-sequencing data.
Input: Set of somatic and germline mutations (VCF format), BAM files.
Output: TSV file with information on mutations and neoepitopes
  1. VCF files preprocessing (merging somatic and germline variants)

  2. Haplotype phasing (HapCUT2)

  3. Neoepitope prediction (MHCflurry, MHCnuggets, etc.)

[80]
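Phasing matters for Neoepiscope because a germline variant on the same haplotype as a somatic variant changes the neoepitope sequence, whereas one on the other haplotype does not. The toy sketch below, which is not Neoepiscope's implementation and handles substitutions only, applies the variants phased to one haplotype before a peptide is cut out; the protein, positions and function name are hypothetical.

```python
from typing import Dict


def apply_substitutions(protein: str, changes: Dict[int, str]) -> str:
    """Apply single-residue substitutions (0-based index -> new residue) on one haplotype."""
    residues = list(protein)
    for pos, alt in changes.items():
        if len(alt) != 1:
            raise ValueError("this sketch handles substitutions only")
        residues[pos] = alt
    return "".join(residues)


ref = "ACDEFGHIKLM"
somatic = {5: "W"}   # tumor-specific G6W (1-based)
germline = {7: "Y"}  # germline I8Y (1-based)

in_cis = apply_substitutions(ref, {**somatic, **germline})  # same haplotype
in_trans = apply_substitutions(ref, somatic)                # germline on the other haplotype
print(in_cis[3:9], in_trans[3:9])  # EFWHYK EFWHIK
```

Ignoring phase would report EFWHIK regardless, which is the wrong peptide whenever the germline variant lies in cis.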
neoANT-HILL
2020
Source: https://github.com/neoanthill/neoANT-HILL
Description: User-friendly Python-based toolkit with a graphical interface that combines several pipelines for fully automated identification of potential neoantigens. It can start from raw NGS data as well as from ready-to-use variant calls.
Input: Somatic variants (VCF format) and/or RNA-seq data (raw or aligned)
Output: User-defined generic directory that contains variant calling data, FASTA with WT and MT sequences, predicted HLA types, gene expression estimates, tumor-infiltrating immune cells quantifications.
  1. Expression estimation (kallisto)

  2. Variant discovery (GATK tools)

  3. HLA typing (OptiType)

  4. Tumor-infiltrating immune-cell estimation (quanTIseq)

  5. Variant annotation (snpEff)

  6. MHC binding affinity prediction (IEDB tools, MHCflurry)

[151]
INeo-Epp
2020
Source: http://www.biostatistics.online/INeo-Epp/antigen.php
Description: User-friendly web-tool implementing T-cell HLA class I immunogenicity prediction method based on sequence-related amino acid features utilizing the random forest algorithm.
Input: Candidate peptide sequences (8-12 aa recommended), HLA allotype
Output: Table containing peptides sequences annotated with score, %rank and prediction.
  1. Providing peptide sequences and HLA types.

  2. Annotation of peptides with score metrics.

  3. Selecting immunogenic peptides with a score > 0.5 as recommended.

[152]
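The INeo-Epp workflow recommends 8-12-residue inputs and treats a score above 0.5 as immunogenic. A minimal sketch of both checks around the web tool follows; it assumes the results have already been read into (peptide, score) pairs, since the exact export format is not described here, and the function names and toy peptides are hypothetical.

```python
from typing import List, Tuple

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def valid_query(peptide: str, min_len: int = 8, max_len: int = 12) -> bool:
    """Recommended length range and only the 20 standard amino acids."""
    return min_len <= len(peptide) <= max_len and set(peptide) <= STANDARD_AA


def immunogenic(results: List[Tuple[str, float]], cutoff: float = 0.5) -> List[str]:
    """Peptides whose score is strictly greater than the recommended cutoff."""
    return [pep for pep, score in results if score > cutoff]


queries = ["SIINFEKL", "GILGFV", "NLVPMVATX"]
print([q for q in queries if valid_query(q)])                   # ['SIINFEKL']
print(immunogenic([("P1", 0.81), ("P2", 0.50), ("P3", 0.12)]))  # ['P1']
```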

* The descriptions of the pipelines presented in the table are based on information provided in the associated articles and in the web-based descriptions available on the source websites. They are limited to highlighting the main features that distinguish the pipelines from each other. The date of pipeline appearance is based on the publication date of the supporting article unless other information is provided. The source link is marked “not available” if the website was not available at the time of writing. The input and output descriptions are presented as described in the supporting articles or web-based sources (if available); where a clear description was lacking, these fields are marked “Not described”. The “Workflow and Features” field contains information on the main steps available within each workflow. The main tools used in the described workflows are also listed if they are described in the supporting articles or web-based sources.