Table 1.
Pipeline | Source, Required Input Data and Output | Workflow and Features | Refs. |
---|---|---|---|
EpiToolkit 2015 |
Source: http://www.epitoolkit.de (not available) Description: Web-based pipeline focused on vaccine design. It includes simplified interfaces for combining tools into a workflow. Input: Not described. Output: Interactive presentation of the results as HTML and an internal representation (list of predicted peptides with scores). |
|
[137] |
FRED2 (FRamework for Epitope Detection) 2016 |
Source: https://github.com/FRED-2/Fred2 Description: Computational pipeline for T-cell epitope detection and vaccine design implemented in Python. Can be extended by additional tools. Input: Sequencing reads (FASTA format). Output: Not described. |
|
[138] |
TepiTool 2016 |
Source: http://tools.iedb.org/tepitool/ Description: User-friendly web-based computational pipeline for T cell epitope prediction hosted by IEDB. It is applicable to human, chimpanzee, cow, gorilla, macaque, mouse and pig. The article associated with the web tool contains a step-by-step analysis protocol with a comprehensive description of each step, recommendations, and a description of anticipated results. Input: Protein sequences in single-letter amino acid code (FASTA format), the list of HLA alleles. Output: Tables of peptide sequences with predicted features. |
|
[139] |
Vaxrank 2017 |
Source: https://github.com/openvax/vaxrank Description: Computational framework for selecting neoantigens for vaccine peptides based on tumor mutations, tumor RNA sequencing and HLA type data. It was designed and used in the Personalized Genomic Vaccine Phase I trial (NCT02721043). Input: Tumor mutations (VCF format), tumor RNA-seq (BAM format), patient HLA alleles. Output: Set of vaccine peptides. |
|
[66,67] |
neoantigenR 2017 |
Source: https://rdrr.io/github/tangshao2016/neoantigenR/ Description: R-based pipeline for neoantigen prediction using raw NGS data. Input: DNA-Seq, RNA-Seq, ExomeSeq (tumor and/or normal) short or long sequence reads (FASTA format), GFF annotation. Output: List of high-affinity HLA class I binding neoantigen candidates. |
|
[140] |
CloudNeo 2017 |
Source: https://github.com/TheJacksonLaboratory/CloudNeo Description: Cloud-based workflow (implemented in CWL) for neoantigen identification using NGS data. Input: VCF format (list of non-synonymous mutations), BAM format (for HLA typing). Output: HLA binding affinity predictions for all mutated peptides. |
|
[141] |
MuPeXI (Mutant peptide extractor and informer) 2017 |
Source: http://www.cbs.dtu.dk/services/MuPeXI/ Description: Web-based tool for neo-epitope identification using somatic mutation calls (SNV, INDELs) and obtaining information about HLA binding affinity, expression level, similarities to self-peptides and mutant allele frequency for each mutated peptide. Supplemented by brief instructions and output format description. Input: Somatic mutation calls (VCF format), list of HLA types, gene expression profile (optional). Output: Table with all tumor-specific peptides derived from substitutions, insertions and deletions with annotation (HLA binding affinity and similarity to normal peptides). |
|
[142] |
TIminer (Tumor Immunology miner) 2017 |
Source: https://icbi.imed.ac.at/software/timiner/timiner.shtml (not available) Description: Computational framework that provides complex immunogenomic analysis including HLA typing, neoantigen prediction, characterization of immune infiltrates and quantification of tumor immunogenicity. Input: RNA-seq reads (FASTQ format), somatic DNA mutations (VCF format). Output: Not described. |
|
[143] |
TSNAD (Tumor-specific neoantigen detector) 2017 |
Source: https://github.com/jiujiezz/tsnad Description: Pipeline with a GUI for identifying tumor-specific mutant proteins according to GATK best practices. It provides two strategies: 1. Extraction of extracellular mutations from membrane proteins; 2. MHC affinity prediction for class I MHC. It can start from raw NGS data. Input: Paired-end sequencing data (FASTQ format) from WES. Output: List of somatic mutations with annotations, extracellular mutations of the membrane proteins and the MHC-binding information (TXT format). |
|
[144] |
INTEGRATE-neo 2017 |
Source: https://github.com/ChrisMaherLab/INTEGRATE-Neo Description: Pipeline focused on the discovery of neoantigens derived from gene fusions. Input: Reads in FASTQ format, the human reference genome in FASTA format, gene models in GenePred format, gene fusions in BEDPE format predicted by INTEGRATE. Output: BEDPE format file. |
|
[88] |
NeoepitopePred 2017 |
Source: https://github.com/stjude/NeoepitopePred Description: Workflow for identification of putative neoepitopes derived from SNV and gene fusions based on WGS data. Input: FASTQ format (PE or SE) or BAM format files. Output: Not described. |
|
[145] |
Neopepsee 2018 |
Source: https://sourceforge.net/projects/neopepsee/ Description: Machine learning-based neoantigen prediction tool for NGS data. Input: Raw RNA-seq data (FASTQ format) and list of somatic mutations (VCF format), clinical HLA typing (if available). Output: Mutated peptide sequences and gene expression levels, determination of immunogenic neoantigens. |
|
[65] |
ScanNeo 2019 |
Source: https://github.com/ylab-hi/ScanNeo Description: Computational pipeline for the identification of neoantigens derived from short and large indels utilizing RNA-seq data. ScanNeo consists of independent modules implementing three analysis steps. Input: RNA-seq data in BAM format. Output: Ranked set of neoantigens. |
|
[146] |
DeepHLApan 2019 |
Source: http://biopharm.zju.edu.cn/deephlapan/ Description: Deep learning approach for neoantigen prediction considering both HLA-peptide binding (binding model) and immunogenicity (immunogenicity model) of the peptide-HLA complex. Input: CSV format files with the header “Annotation,HLA,peptide”. Only HLA-A, -B and -C alleles are accepted. Output: Binding score (ranges from 0 to 1; the probability that the peptide binds to the HLA), immunogenicity score (ranges from 0 to 1; 0.5 is the threshold to select the predicted immunogenic pHLA). |
|
[147] |
pTuneos (prioritizing tumor neoantigens from next-generation sequencing data) 2019 |
Source: https://github.com/bm2-lab/pTuneos Description: In silico tool to predict the immunogenicity of SNV-derived neoepitopes that considers MHC presentation and T-cell recognition ability. It is based on experimentally validated neoantigens. It contains the Pre&RecNeo module, a learning-based framework for predicting and prioritizing neoepitopes recognized by T cells, and the RefinedNeo module, a neoepitope scoring schema for evaluating the immunogenicity of naturally processed and presented neoepitopes. Input: PairMatchDNA (WES) mode accepts WES and RNA-seq sequencing data (FASTQ format); VCF mode accepts a VCF format file with the mutation set, an expression profile (e.g., obtained by kallisto) and a copy number profile (e.g., obtained by sequenza). Output: TSV files (snv_neo_model.tsv and indel_neo_models.tsv) containing extracted mutated peptides derived from non-synonymous SNV and INDELs and corresponding immunity score measures. |
|
[148] |
NeoPredPipe 2019 |
Source: https://github.com/MathOnco/NeoPredPipe Description: Pipeline that provides predictions on multi-region sequence data and assesses intra-tumor heterogeneity (ITH) of the antigenic landscape of tumors. Input: Multi- or single-region VCF files (with a set of somatic mutations), patient HLA types (optional). Output: Annotated variants, predicted neoantigens, predicted recognition potential, a summary of ITH statistics. |
|
[68] |
pVACtools 2020 |
Source: https://pvactools.readthedocs.io/en/latest/ Description: Computational toolkit for identifying altered peptides derived from SNV, INDELs and gene fusions and for predicting peptide-MHC binding for MHC class I and class II. Input: VCF format files, FASTA with peptides. Output: A set of files describing the predicted epitopes before and after filtering, including binding affinity scores and other parameters. |
|
[64,149] |
ProGeo-neo 2020 |
Source: https://github.com/kbvstmd/ProGeo-neo Description: Neoantigen prediction workflow that integrates genomic and mass spectrometry data. It consists of three modules: construction of a customized protein sequence database, HLA allele prediction, and neoantigen prediction and filtration. Input: RNA-seq data (FASTQ format), genomic variants (VCF format), LC-MS/MS data (Raw format). Output: List of candidate peptides. |
|
[150] |
Neoepiscope 2020 |
Source: https://github.com/pdxgx/neoepiscope Description: Neoepitope identification pipeline that incorporates germline context and considers variant phasing for SNV and indels. Requires DNA-sequencing data. Input: Set of somatic and germline mutations (VCF format), BAM files. Output: TSV file with information on mutations and neoepitopes. |
|
[80] |
neoANT-HILL 2020 |
Source: https://github.com/neoanthill/neoANT-HILL Description: User-friendly Python-based toolkit with a graphical interface that combines several pipelines for fully automated identification of potential neoantigens. It can start from raw NGS data as well as from ready-to-use variant calls. Input: Somatic variants (VCF format) and/or RNA-seq data (raw or aligned). Output: User-defined generic directory that contains variant calling data, FASTA with WT and MT sequences, predicted HLA types, gene expression estimates, tumor-infiltrating immune cell quantifications. |
|
[151] |
INeo-Epp 2020 |
Source: http://www.biostatistics.online/INeo-Epp/antigen.php Description: User-friendly web tool implementing a T-cell HLA class I immunogenicity prediction method based on sequence-related amino acid features and the random forest algorithm. Input: Candidate peptide sequences (8-12 aa recommended), HLA allotype. Output: Table containing peptide sequences annotated with score, %rank and prediction. |
|
[152] |
* The descriptions of the pipelines presented in the table are based on information provided in the associated articles and on the descriptions available on the source websites. They are limited to the main features that distinguish the pipelines from each other. The date of the pipeline's appearance is the publication date of the associated article unless other information is provided. The source link is marked “not available” if the website was not available at the time of writing. The input and output descriptions are presented as described in the associated articles or web-based sources (if available). Where a clear description was lacking, these fields are marked “Not described”. The “Workflow and Features” field contains information on the main steps available within the workflow. The main tools used as part of the described workflows are also listed if they are described in the associated articles or web-based sources.
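
To make the input and output conventions in the table more concrete, the following Python sketch enumerates candidate peptides of 8-12 aa around a mutated position (the length range recommended for INeo-Epp), writes them to a CSV file with the “Annotation,HLA,peptide” header listed for DeepHLApan, and applies the 0.5 immunogenicity threshold to a list of scored records. It is only an illustration under stated assumptions and does not reproduce the code of any pipeline in the table: the function names, the `immunogenicity` dictionary key, the allele string, the example protein sequence and the strict "greater than" comparison are all hypothetical choices.

```python
import csv
import io


def candidate_peptides(protein, mut_index, lengths=range(8, 13)):
    """Return every window of the given lengths that covers mut_index (0-based)."""
    peptides = []
    for k in lengths:
        first_start = max(0, mut_index - k + 1)
        last_start = min(mut_index, len(protein) - k)
        for start in range(first_start, last_start + 1):
            peptides.append(protein[start:start + k])
    return peptides


def write_peptide_csv(handle, hla, peptides, annotation="mutation_1"):
    """Write rows under the 'Annotation,HLA,peptide' header described in the table."""
    writer = csv.writer(handle)
    writer.writerow(["Annotation", "HLA", "peptide"])
    for peptide in peptides:
        writer.writerow([annotation, hla, peptide])


def select_immunogenic(records, threshold=0.5):
    """Keep records whose (hypothetical) 'immunogenicity' score exceeds the threshold."""
    return [r for r in records if r["immunogenicity"] > threshold]


if __name__ == "__main__":
    # Toy protein sequence and mutated position, for illustration only.
    peptides = candidate_peptides("ACDEFGHIKLMNPQRSTVWY", 10)
    buffer = io.StringIO()
    # The allele string format is an assumption; check the tool's documentation.
    write_peptide_csv(buffer, "HLA-A*02:01", peptides)
    print(buffer.getvalue())
```

The exact allele notation and output column names expected by each tool should be taken from its own documentation; the sketch only mirrors the fields named in the table.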