Abstract
Tumor T cell antigens are both diagnostically and therapeutically valuable molecules. A large number of new peptides are examined as potential tumor epitopes each year, yet there is no infrastructure for storing and accessing the results of these experiments. We have retroactively cataloged more than 1000 tumor peptides from 368 different proteins, and implemented a web-accessible infrastructure for storing and accessing these experimental results. All peptides in TANTIGEN are assigned to one of four categories: (1) peptides measured in vitro to bind the HLA, but not reported to elicit either an in vivo or in vitro T cell response, (2) peptides found to bind the HLA and to elicit an in vitro T cell response, (3) peptides shown to elicit in vivo tumor rejection, and (4) peptides processed and naturally presented as defined by physical detection. In addition to T cell response, we also annotate peptides that are naturally processed HLA binders, e.g., peptides eluted from HLA in mass spectrometry studies. TANTIGEN provides a rich data resource for tumor-associated epitope and neoepitope discovery studies and is freely available at http://cvc.dfci.harvard.edu/tantigen/ or http://projects.met-hilab.org/tadb (mirror).
Keywords: Immunotherapy, Neoepitopes, Tumor Antigens, T cell epitope prediction, Cancer vaccine, Bioinformatics
Introduction
Tumor-derived molecules that interact with cells and products of the immune system are known as tumor antigens (TAs). TAs can be subdivided into two main groups: tumor-specific antigens (TSAs), which are exclusively found in tumors and not in normal tissue; and tumor-associated antigens (TAAs), which are overexpressed in tumor cells relative to their healthy counterparts [1]. TAs hold great therapeutic and diagnostic potential [2] and have been extensively studied for decades. In this article, we focus on TAs recognized by T cells of the adaptive immune system, i.e., peptidic epitopes intracellularly processed and presented on the cell surface by human leukocyte antigen (HLA) class I and class II molecules [3]. T cells play a significant role in tumor rejection [4] and their therapeutic potential has been utilized in different forms of cancer immunotherapy [5, 6]. In recent years, the therapeutic focus has shifted away from TAAs to the use of TSAs in the form of T cell neoepitopes—a field in rapid development, with numerous reports of personalized neoepitope targets being published yearly [7].
Potential neoepitopes can be elucidated by mass spectrometry-based physical detection of HLA binding peptides in the tumor tissue [8–10], or by comparative analysis of the tumor and cognate normal tissue genomes to identify nonsynonymous mutations, followed by HLA binding predictions of all peptides resulting from somatic mutations in the tumor tissue [11]. The prediction of potential HLA binders is driven by supervised machine learning methods, such as artificial neural networks, trained on the measured binding affinity of several hundred thousand peptides [12], which underscores the value of comprehensive peptide data resources. Depending on the tumor mutation load, HLA binding predictions sometimes result in a large number of potential neoepitopes, of which only a fraction is capable of eliciting an in vivo cytotoxic T lymphocyte (CTL) response [13]. Different theories attempt to explain the failure of predicted and even validated HLA binders to elicit an immune response. A number of groups have reported different methods for estimating the T cell response against the pool of predicted HLA binders. Methods include ranking based on the difference between the predicted binding affinity of the neopeptide and the native peptide [14], ranking by mRNA expression of the neopeptide-harboring protein [15, 16], assessing homology to known pathogens [17], and assessing homology to the host and host microbiome [18].
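The enumeration of candidate peptides around a somatic substitution, and the affinity-difference ranking mentioned above, can be sketched as follows. This is a minimal illustration and not the method of any cited study: the function names, the 0-based `position` convention, and the `predict_ic50` callable (standing in for whatever binding predictor is used, returning a predicted IC50 where lower means stronger binding) are all assumptions made for this example.

```python
from typing import Callable, List, Tuple


def mutant_peptides(native: str, position: int, new_residue: str,
                    length: int = 9) -> List[Tuple[str, str]]:
    """Return (mutant, native) peptide pairs of `length` residues that span
    a single amino acid substitution at 0-based `position`."""
    if native[position] == new_residue:
        raise ValueError("substitution must change the residue")
    mutated = native[:position] + new_residue + native[position + 1:]
    # Window start indices for which the window covers the substituted residue.
    first = max(0, position - length + 1)
    last = min(position, len(native) - length)
    return [(mutated[s:s + length], native[s:s + length])
            for s in range(first, last + 1)]


def rank_by_affinity_gain(pairs: List[Tuple[str, str]],
                          predict_ic50: Callable[[str], float]
                          ) -> List[Tuple[float, str, str]]:
    """Sort pairs by (native IC50 - mutant IC50), largest gain first.

    `predict_ic50` is a hypothetical, user-supplied predictor; a positive
    difference means the mutant is predicted to bind more strongly.
    """
    scored = [(predict_ic50(nat) - predict_ic50(mut), mut, nat)
              for mut, nat in pairs]
    return sorted(scored, reverse=True)
```

For a 9-mer window, a substitution in the interior of a protein yields up to nine overlapping mutant peptides, each of which would be passed to the predictor together with its native counterpart.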
Broadly predicting which HLA binders will engender a CTL response is hampered by an array of complex events governing tumor immune surveillance. Firstly, a myriad of cellular and molecular factors regulate T cell activation, including immune suppression in the tumor microenvironment by regulatory T cells and suppressive molecular immune checkpoints [19]. Secondly, a robust model of the highly specific interaction of the T cell receptor (TCR) with the peptide–HLA complex remains elusive. The latter is difficult to measure even in vitro, because the force-based interactions of the TCR with peptide–HLA complexes enable T cells to recognize one or several non-self peptide–HLA complexes among 100,000 self-peptide–HLA complexes, conditions that are extremely difficult to mimic in vitro. Thus, we do not yet have an accurate model that explains what makes a tumor-specific peptide–HLA complex interact with the TCR of circulating T cells, nor do we have robust experimental methods to estimate in vivo immune response [20].
To support the evaluation of emerging models of these phenomena, TANTIGEN was assembled in 2009, and has since been the single most comprehensive database of experimentally validated HLA binders from different tumor tissues. TANTIGEN contains more than 1000 peptides labeled according to the method of detection. We have also annotated peptides that are naturally processed HLA binders, i.e., peptides eluted from HLA in mass spectrometry studies. This list of peptides, with their experimentally assigned immunological classes, provides a comprehensive data resource for benchmarking T cell response models, thereby helping to address one of the main computational challenges in the development of personalized neoepitope-based cancer immunotherapies.
Materials and methods
Data collection
The first build of the database was assembled by compiling the data from a number of smaller (some static) collections of tumor T cell epitopes, such as the Cancer Immunity Peptide Database [21], the “listing of human tumor antigens recognized by T cells” [22], SYFPEITHI [23], and early builds of the IEDB, which no longer curates cancer epitopes [24]. In later builds of TANTIGEN, the main source of data has been semi-automated collection from primary literature using a curation process similar to that employed by the Immune Epitope Database for collecting infectious disease epitopes [25]. Briefly, abstracts are downloaded from PubMed periodically and classified as either relevant or irrelevant based on a set of abstracts that we classified manually in the early stages of the data collection [26]. Data from the relevant articles are then manually extracted and organized in XML format.
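The relevant/irrelevant triage step described above can be illustrated with a small text classifier. The sketch below uses a multinomial naive Bayes model with add-one smoothing, which is a stand-in chosen for brevity and is not claimed to be the classifier used for TANTIGEN [26]; the class name, the tokenizer, and the label strings are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTriage:
    """Multinomial naive Bayes over abstract tokens (illustrative only)."""

    def fit(self, abstracts, labels):
        self.doc_counts = Counter(labels)   # number of abstracts per label
        self.word_counts = {}               # label -> token frequency table
        for text, label in zip(abstracts, labels):
            self.word_counts.setdefault(label, Counter()).update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def log_score(self, text, label):
        counts = self.word_counts[label]
        total = sum(counts.values())
        # Log prior from the share of manually labeled abstracts.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            if word in self.vocab:          # tokens unseen in training are skipped
                score += math.log((counts[word] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, text):
        return max(self.doc_counts, key=lambda label: self.log_score(text, label))
```

In practice the manually classified abstracts would serve as the training set, and only abstracts predicted to be relevant would be passed on for manual data extraction.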
Data annotation and organization
Peptides are classified into four major categories: (1) peptides measured in vitro to bind the HLA, but not reported to elicit either an in vivo or in vitro T cell response, (2) peptides found to bind the HLA and to elicit an in vitro T cell response, (3) peptides shown to elicit in vivo tumor rejection, and (4) peptides processed and naturally presented as defined by physical detection. Peptides in category 4 may or may not be associated with a natural response. Entries for peptides discovered from comparative analysis of tumor and normal tissue genomes also contain the sequence of the cognate native peptides. Lastly, the genomic origin of the TAs was classified as either substitution mutation (neoepitopes), alternative open reading frame (ORF), intron encoding, chromosomal translocation, internal tandem repeat, differentiation, overexpressed, or shared tumor specific, as defined previously [27].
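One way to represent this annotation scheme in code is sketched below. The enumeration members mirror the four categories and the genomic origins listed above, but the class names, field names, and the `substituted_positions` helper are hypothetical and do not reflect the actual TANTIGEN XML schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class EvidenceCategory(Enum):
    HLA_BINDER = 1            # binds HLA in vitro; no T cell response reported
    IN_VITRO_T_CELL = 2       # binds HLA and elicits an in vitro T cell response
    IN_VIVO_REJECTION = 3     # shown to elicit in vivo tumor rejection
    NATURALLY_PRESENTED = 4   # processed and presented, by physical detection


class GenomicOrigin(Enum):
    SUBSTITUTION_MUTATION = "substitution mutation (neoepitope)"
    ALTERNATIVE_ORF = "alternative ORF"
    INTRON_ENCODING = "intron encoding"
    CHROMOSOMAL_TRANSLOCATION = "chromosomal translocation"
    INTERNAL_TANDEM_REPEAT = "internal tandem repeat"
    DIFFERENTIATION = "differentiation"
    OVEREXPRESSED = "overexpressed"
    SHARED_TUMOR_SPECIFIC = "shared tumor specific"


@dataclass(frozen=True)
class PeptideEntry:
    sequence: str
    hla_allele: str
    category: EvidenceCategory
    origin: GenomicOrigin
    native_sequence: Optional[str] = None   # cognate native peptide, if known

    def substituted_positions(self) -> List[int]:
        """0-based positions at which the peptide differs from its native form."""
        if self.native_sequence is None:
            return []
        if len(self.native_sequence) != len(self.sequence):
            return []
        return [i for i, (m, n) in
                enumerate(zip(self.sequence, self.native_sequence)) if m != n]
```

Keeping the native sequence optional reflects that only peptides discovered by tumor versus normal comparison carry a cognate native peptide.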
Each antigen is represented as an individual entry with relevant information about the source gene from GeneCards [28], National Center for Biotechnology Information (NCBI) Gene and UniGene, information about the source protein from UniProt [29], and known SNPs and mutations from the Catalogue of Somatic Mutations in Cancer (COSMIC) [30], when available. Each entry also contains external links to each of the above databases. The TANTIGEN statistics, including the number of entries, are shown in Table 1 and are accessible through the FAQ tab on the TANTIGEN homepage.
Table 1.
Genomic origin | Proteins | In vitro T cell epitopes | In vivo T cell epitopes | HLA ligands | MS derived HLA ligands | Total peptides |
---|---|---|---|---|---|---|
Substitution mutation (neoepitopes) | 63 | 44 | 10 | 45 | 8 | 170 |
Alternative ORF | 4 | 10 | 0 | 0 | 0 | 14 |
Intron encoding | 1 | 2 | 0 | 0 | 0 | 3 |
Chromosomal translocation | 10 | 24 | 0 | 15 | 0 | 49 |
Internal tandem repeat | 1 | 1 | 0 | 0 | 0 | 2 |
Differentiation | 16 | 131 | 4 | 8 | 3 | 162 |
Overexpressed | 82 | 223 | 9 | 257 | 13 | 584 |
Shared tumor specific | 35 | 147 | 0 | 2 | 3 | 187 |
Post-translationally modified antigens | 76 | 0 | 0 | 0 | 87 | 163 |
Unclassified | 80 | 122 | 1 | 25 | 11 | 239 |
Total | 368 | 704 | 24 | 352 | 125 | 1573 |
Note that some peptides bind multiple HLA alleles
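As a quick consistency check, the column totals of Table 1 can be recomputed from the individual rows. The snippet below simply transcribes the table; the variable and function names are chosen for this example. Note that only the columns are expected to add up: within a row, the "Total peptides" value need not equal the sum of the other columns.

```python
# Columns: proteins, in vitro T cell epitopes, in vivo T cell epitopes,
# HLA ligands, MS derived HLA ligands, total peptides (values from Table 1).
ROWS = {
    "Substitution mutation (neoepitopes)": (63, 44, 10, 45, 8, 170),
    "Alternative ORF": (4, 10, 0, 0, 0, 14),
    "Intron encoding": (1, 2, 0, 0, 0, 3),
    "Chromosomal translocation": (10, 24, 0, 15, 0, 49),
    "Internal tandem repeat": (1, 1, 0, 0, 0, 2),
    "Differentiation": (16, 131, 4, 8, 3, 162),
    "Overexpressed": (82, 223, 9, 257, 13, 584),
    "Shared tumor specific": (35, 147, 0, 2, 3, 187),
    "Post translationally modified antigens": (76, 0, 0, 0, 87, 163),
    "Unclassified": (80, 122, 1, 25, 11, 239),
}
REPORTED_TOTAL = (368, 704, 24, 352, 125, 1573)


def column_totals(rows):
    """Sum each column across all genomic-origin rows."""
    return tuple(sum(column) for column in zip(*rows.values()))


totals_match = column_totals(ROWS) == REPORTED_TOTAL
```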
Analysis tools
The discovery of tumor epitopes is increasingly supported by in silico tools [31]. We integrated a selection of bioinformatics tools for data analysis into the TANTIGEN webserver. Sequence similarity searches using BLAST (Basic local alignment search tool) [32] enable users to query the antigen database, and sequence homology can be examined by multiple sequence alignment using MAFFT [33]. On-the-fly HLA binding prediction tools covering 15 common HLA class I and class II alleles were integrated to facilitate analysis of known immunogenicity in conjunction with predicted HLA binding, as well as prediction of additional potential epitopes. Class I predictions are performed using netMHCpan 2.8a [34] and class II predictions using netMHCIIpan 3.0 [35]. These algorithms were chosen based on previous and ongoing benchmarking of the accuracy of online HLA binding prediction servers [36, 37]. TANTIGEN is also equipped with a set of visualization tools that display the locations of peptides within the proteins that harbor them. For each protein containing point mutations, an interactive visualization tool displays a map of mutations in the tumor antigen sequences to provide a global view of all mutations reported in a given tumor antigen. For neoepitope entries, if both the antigen that contains the neoepitope and the reference antigen are included in TANTIGEN, HLA binding predictions of the mutated peptides and the reference peptides are shown side by side to help assess the difference in immunogenicity between neoepitopes and native peptides.
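The idea behind displaying peptide locations within a source protein can be illustrated with a plain-text sketch. This is not the TANTIGEN visualization code; the function names and the caret-based track format are assumptions made for this example.

```python
from typing import List, Tuple


def locate_peptide(protein: str, peptide: str) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) positions of every occurrence
    of `peptide` in `protein`, including overlapping occurrences."""
    if not peptide:
        return []
    hits = []
    start = protein.find(peptide)
    while start != -1:
        hits.append((start + 1, start + len(peptide)))
        start = protein.find(peptide, start + 1)
    return hits


def render_track(protein: str, peptide: str) -> str:
    """Return the protein sequence with a second line marking covered residues."""
    marks = [" "] * len(protein)
    for start, end in locate_peptide(protein, peptide):
        for i in range(start - 1, end):
            marks[i] = "^"
    return protein + "\n" + "".join(marks)
```

Calling `render_track` on a protein and one of its peptides gives a two-line string in which carets mark the residues covered by the peptide; the same positions could feed a graphical track instead.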
Web server
The TANTIGEN webserver interface was constructed using the KB-builder framework, which streamlines the development and deployment of web-accessible immunological databases [38]. The web interface of TANTIGEN uses a set of graphical user interfaces implemented using Perl, PHP, and CGI. It has been tested for functionality in all major browsers and operating systems.
Database maintenance and update
Since the inception of TANTIGEN in 2009, we have updated both its data and features approximately annually, and we now have a mature pipeline in place for database updating [26]. We plan to actively maintain the database and update the data bi-annually.
Results
Using text mining-based article classification, we identified 429 published articles containing tumor peptide data. The current peptide content of TANTIGEN is summarized in Table 1. In addition to the peptide entries, TANTIGEN also contains an entry for each of the 368 proteins in which the peptides are located, with links to relevant information about mRNA expression (EST profile from UniGene), splice variations (UniProt), known mutations (COSMIC), and predicted HLA binders and T cell epitopes. These peptides, information about their discovery, and their visualization within their protein of origin are accessible through the TANTIGEN web page. All data are available for download if local analysis is preferred.
Discussion
TANTIGEN is a comprehensive database of tumor T cell antigens that provides a rich data resource for analysis of tumor antigens and their functional sites. For wet lab scientists, TANTIGEN provides a catalog of previously published work that can be explored before embarking on new studies, as well as a repository of validated epitopes for cross-referencing newly discovered peptides. For bioinformatics studies, TANTIGEN provides a comprehensive training set for the prediction of naturally processed HLA binders and for the construction of models that predict T cell response to HLA binders, an approach that is often used to rank predicted HLA binders in personalized neoepitope studies. TANTIGEN is representative of a new generation of in silico tools that integrate data, domain knowledge, tailored bioinformatics tools, and simulation of experiments. It is a valuable resource for studies in cancer immunology and immunotherapy, cancer vaccine research, and cancer diagnostics.
Acknowledgements
This work was supported by The Danish Council for Independent Research Grant 4184-00211B (Lars Rønn Olsen), NIH Grant UO1 AI090043 and SU2C-AACR-DT13-14 (Ellis Reinherz) and Dana-Farber Cancer Institute, Cancer Vaccine Center Funds (Guang Lan Zhang and Vladimir Brusic).
Abbreviations
- BLAST: Basic local alignment search tool
- COSMIC: The catalogue of somatic mutations in cancer
- NCBI: National Center for Biotechnology Information
- ORF: Open reading frame
- TAs: Tumor antigens
- TSAs: Tumor-specific antigens
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
Footnotes
Lars Rønn Olsen and Songsak Tongchusak contributed equally to the work.
References
- 1. Boon T, Cerottini JC, van den Eynde B, et al. Tumor antigens recognized by T lymphocytes. Annu Rev Immunol. 1994;12:337–365. doi: 10.1146/annurev.iy.12.040194.002005.
- 2. Olsen L, Campos B, Winther O, et al. Tumor antigens as proteogenomic biomarkers in invasive ductal carcinomas. BMC Med Genomics. 2014;7(Suppl 3):S2. doi: 10.1186/1755-8794-7-S3-S2.
- 3. Blum JS, Wearsch PA, Cresswell P. Pathways of antigen processing. Annu Rev Immunol. 2013;31:443–473. doi: 10.1146/annurev-immunol-032712-095910.
- 4. Boon T, van der Bruggen P. Human tumor antigens recognized by T lymphocytes. J Exp Med. 1996;183:725–729. doi: 10.1084/jem.183.3.725.
- 5. Restifo NP, Dudley ME, Rosenberg SA. Adoptive immunotherapy for cancer: harnessing the T cell response. Nat Rev Immunol. 2012;12:269–281. doi: 10.1038/nri3191.
- 6. Gill S, June CH. Going viral: chimeric antigen receptor T-cell therapy for hematological malignancies. Immunol Rev. 2015;263:68–89. doi: 10.1111/imr.12243.
- 7. Srivastava PK. Neoepitopes of Cancers: Looking Back, Looking Ahead. Cancer Immunol Res. 2015;3:969–977. doi: 10.1158/2326-6066.CIR-15-0134.
- 8. Reinhold B, Keskin DB, Reinherz EL. Molecular Detection of Targeted Major Histocompatibility Complex I-Bound Peptides Using a Probabilistic Measure and Nanospray MS(3) on a Hybrid Quadrupole-Linear Ion Trap. Anal Chem. 2010;82:9090–9099. doi: 10.1021/ac102387t.
- 9. Reinherz EL, Keskin DB, Reinhold B. Forward vaccinology: CTL targeting based upon physical detection of HLA-bound peptides. Front Immunol. 2014;5:418. doi: 10.3389/fimmu.2014.00418.
- 10. Castle JC, Kreiter S, Diekmann J, et al. Exploiting the mutanome for tumor vaccination. Cancer Res. 2012;72:1081–1091. doi: 10.1158/0008-5472.CAN-11-3722.
- 11. Abelin JG, Keskin DB, Sarkizova S, et al. Mass Spectrometry Profiling of HLA-Associated Peptidomes in Mono-allelic Cells Enables More Accurate Epitope Prediction. Immunity. 2017;46:315–326. doi: 10.1016/j.immuni.2017.02.007.
- 12. Lundegaard C, Lamberth K, Harndahl M, et al. NetMHC-3.0: accurate web accessible predictions of human, mouse and monkey MHC class I affinities for peptides of length 8–11. Nucleic Acids Res. 2008;36:W509–W512. doi: 10.1093/nar/gkn202.
- 13. van Rooij N, van Buuren MM, Philips D, et al. Tumor exome analysis reveals neoantigen-specific T-cell reactivity in an ipilimumab-responsive melanoma. J Clin Oncol. 2013;31:e439–e442. doi: 10.1200/JCO.2012.47.7521.
- 14. Duan F, Duitama J, Al Seesi S, et al. Genomic and bioinformatic profiling of mutational neoepitopes reveals new rules to predict anticancer immunogenicity. J Exp Med. 2014;211:2231–2248. doi: 10.1084/jem.20141308.
- 15. Robbins PF, Lu Y-C, El-Gamil M, et al. Mining exomic sequencing data to identify mutated antigens recognized by adoptively transferred tumor-reactive T cells. Nat Med. 2013;19:747–752. doi: 10.1038/nm.3161.
- 16. Kreiter S, Vormehr M, van de Roemer N, et al. Mutant MHC class II epitopes drive therapeutic immune responses to cancer. Nature. 2015;520:692–696. doi: 10.1038/nature14426.
- 17. Snyder A, Makarov V, Merghoub T, et al. Genetic basis for clinical response to CTLA-4 blockade in melanoma. N Engl J Med. 2014;371:2189–2199. doi: 10.1056/NEJMoa1406498.
- 18. Bresciani A, Paul S, Schommer N, et al. T-cell recognition is shaped by epitope sequence conservation in the host proteome and microbiome. Immunology. 2016;148:34–39. doi: 10.1111/imm.12585.
- 19. Zou W. Regulatory T cells, tumour immunity and immunotherapy. Nat Rev Immunol. 2006;6:295–307. doi: 10.1038/nri1806.
- 20. Reinherz EL. αβ TCR-mediated recognition: relevance to tumor-antigen discovery and cancer immunotherapy. Cancer Immunol Res. 2015;3:305–312. doi: 10.1158/2326-6066.CIR-15-0042.
- 21. Vigneron N, Stroobant V, Van den Eynde BJ, van der Bruggen P. Database of T cell-defined human tumor antigens: the 2013 update. Cancer Immun Arch. 2013;13:15.
- 22. Novellino L, Castelli C, Parmiani G. A listing of human tumor antigens recognized by T cells: March 2004 update. Cancer Immunol Immunother. 2005;54:187–207. doi: 10.1007/s00262-004-0560-6.
- 23. Rammensee H, Bachmann J, Emmerich NP, et al. SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics. 1999;50:213–219. doi: 10.1007/s002510050595.
- 24. Vita R, Overton JA, Greenbaum JA, et al. The immune epitope database (IEDB) 3.0. Nucleic Acids Res. 2015;43:D405–D412. doi: 10.1093/nar/gku938.
- 25. Seymour E, Damle R, Sette A, Peters B. Cost sensitive hierarchical document classification to triage PubMed abstracts for manual curation. BMC Bioinform. 2011;12:482. doi: 10.1186/1471-2105-12-482.
- 26. Olsen L, Johan Kudahl U, Winther O, Brusic V. Literature classification for semi-automated updating of biological knowledgebases. BMC Genom. 2013;14(Suppl 5):S14. doi: 10.1186/1471-2164-14-S5-S14.
- 27. Van den Eynde BJ, van der Bruggen P. T cell defined tumor antigens. Curr Opin Immunol. 1997;9:684–693. doi: 10.1016/S0952-7915(97)80050-7.
- 28. Safran M, Dalah I, Alexander J, et al. GeneCards Version 3: the human gene integrator. Database. 2010;2010:baq020. doi: 10.1093/database/baq020.
- 29. UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 2015;43:D204–D212. doi: 10.1093/nar/gku989.
- 30. Forbes SA, Beare D, Gunasekaran P, et al. COSMIC: exploring the world’s knowledge of somatic mutations in human cancer. Nucleic Acids Res. 2015;43:D805–D811. doi: 10.1093/nar/gku1075.
- 31. Olsen LR, Campos B, Barnkob MS, et al. Bioinformatics for cancer immunotherapy target discovery. Cancer Immunol Immunother. 2014;63:1235–1249. doi: 10.1007/s00262-014-1627-7.
- 32. Altschul SF, Gish W, Miller W, et al. Basic local alignment search tool. J Mol Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
- 33. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30:772–780. doi: 10.1093/molbev/mst010.
- 34. Nielsen M, Lundegaard C, Blicher T, et al. NetMHCpan, a method for quantitative predictions of peptide binding to any HLA-A and -B locus protein of known sequence. PLoS One. 2007;2:e796. doi: 10.1371/journal.pone.0000796.
- 35. Nielsen M, Justesen S, Lund O, et al. NetMHCIIpan-2.0 - improved pan-specific HLA-DR predictions using a novel concurrent alignment and weight optimization training procedure. Immunome Res. 2010;6:9. doi: 10.1186/1745-7580-6-9.
- 36. Zhang GL, Ansari HR, Bradley P, et al. Machine learning competition in immunology - prediction of HLA class I binding peptides. J Immunol Methods. 2011;374:1–4. doi: 10.1016/j.jim.2011.09.010.
- 37. Trolle T, Metushi IG, Greenbaum JA, et al. Automated benchmarking of peptide-MHC class I binding predictions. Bioinformatics. 2015;31:2174–2181. doi: 10.1093/bioinformatics/btv123.
- 38. Zhang GL, Sun J, Chitkushev L, Brusic V. Big data analytics in immunology: a knowledge-based approach. Biomed Res Int. 2014;2014:437987. doi: 10.1155/2014/437987.