Systems and Synthetic Biology. 2014 Feb 15;8(1):27–39. doi: 10.1007/s11693-014-9135-9

Integrative immunoinformatics for Mycobacterial diseases in R platform

Rupanjali Chaudhuri, Deepika Kulshreshtha, Muthukurussi Varieth Raghunandanan, Srinivasan Ramachandran
PMCID: PMC3933634  PMID: 24592289

Abstract

The sequencing of genomes of the pathogenic mycobacterial species causing pulmonary and extrapulmonary tuberculosis, leprosy and other atypical mycobacterial infections offers immense opportunities for discovering new therapeutics and identifying new vaccine candidates. Enhanced RV, which adds further algorithms to Reverse Vaccinology (RV), has increased potential to reduce the likelihood of undesirable features, including allergenicity and immune cross-reactivity with the host. The starting point for MycobacRV database construction was the collection of known vaccine candidates and a set of predicted vaccine candidates identified from the whole genome sequences of 22 mycobacterial species and strains pathogenic to humans and one non-pathogenic Mycobacterium tuberculosis H37Ra strain. These predicted vaccine candidates are the adhesins and adhesin-like proteins obtained using SPAAN at Pad > 0.6 and screened for putative extracellular or surface localization characteristics using PSORTb v.3.0 at a very stringent cutoff. Subsequently, these protein sequences were analyzed with 21 publicly available algorithms to obtain orthologs, paralogs, BetaWrap motifs, transmembrane domains, signal peptides, conserved domains, similarity to human proteins, T cell epitopes, B cell epitopes, discotopes and potential allergen predictions. The Enhanced RV information was analysed in the R platform through scripts following well-structured decision trees to derive a nonredundant set of 233 most probable vaccine candidates. Additionally, the degree of conservation of potential epitopes across all orthologs has been obtained with reference to the M. tuberculosis H37Rv strain, the most commonly used strain in M. tuberculosis studies. Utilities for the vaccine candidate search and for the analysis of epitope conservation across the orthologs with reference to the M. tuberculosis H37Rv strain are available in the mycobacrvR package for the R platform, accessible from the “Download” tab of the MycobacRV webserver.
MycobacRV, an immunoinformatics database of known and predicted mycobacterial vaccine candidates, has been developed and is freely available at http://mycobacteriarv.igib.res.in.

Electronic supplementary material

The online version of this article (doi:10.1007/s11693-014-9135-9) contains supplementary material, which is available to authorized users.

Keywords: Mycobacteria, Vaccine, Reverse Vaccinology, Enhanced RV

Introduction

Mycobacterial infections have emerged as a major health problem worldwide (Lienhardt et al. 2012; Mayer and Dukes 2010). Among the mycobacterial infections, tuberculosis (TB) and leprosy are ranked as the most dreaded diseases affecting mankind (Stone et al. 2009). In 2012, according to the World Health Organization, there were about 8.6 million incident cases of TB and 1.3 million deaths from TB, including 320,000 deaths from HIV-associated TB (Baddeley et al. 2013). These statistics illustrate the devastating impact of TB at present. Leprosy, also known as Hansen’s disease and caused by Mycobacterium leprae, has ravaged humans for years and continues to be a severe health problem in many developing countries (Dogra et al. 2013). Mycobacterium ulcerans, a member of the Non Tuberculosis Mycobacteria (NTM), causes Buruli ulcer, which is considered the third most common mycobacterial disease of non-immunocompromised individuals after tuberculosis and leprosy. Among other mycobacterial infections, the Mycobacterium avium complex (MAC), a member of the NTM consisting of Mycobacterium avium and Mycobacterium intracellulare, is responsible for the majority of these infections (Waller et al. 2006). MAC causes life-threatening opportunistic infections in immunosuppressed Acquired Immune Deficiency Syndrome (AIDS) patients. Another member of the MAC, namely Mycobacterium avium subsp. paratuberculosis, causes Crohn’s disease and ulcerative colitis in humans. In immunocompromised patients, Mycobacterium abscessus can cause chronic lung disease, post-traumatic wound infections, and disseminated cutaneous diseases (Katoch 2004).

Currently the only licensed vaccine against tuberculosis is Mycobacterium bovis Bacille Calmette-Guérin (BCG), but it is known to confer highly variable protection (McShane 2011). Other mycobacterial infections caused by NTM are difficult to treat and do not respond to commonly used antituberculous drugs (Griffith 2010). Although BCG has been found effective against two other mycobacterial diseases, leprosy and Buruli ulcer, its effect remains uncertain (Nackers et al. 2006; Merle et al. 2010). Therefore new vaccines are needed to combat mycobacterial infections.

In this regard, various attempts have been made, resulting in development of new vaccine candidates, which are in different stages of clinical trials (Kaufmann 2011; Lockwood 2007; Marinova et al. 2013). Twelve potential vaccine candidates against tuberculosis targeting different stages of infection are being tested. Among these, two are canonical vaccines aiming to prevent active tuberculosis, two are therapeutic vaccines aiming to treat immunocompromised patients and the rest are preventive vaccines (Lockwood 2007). A few vaccine candidates against leprosy are also undergoing clinical trials for evaluation of their efficacy (Lockwood 2007).

As genome sequences of many mycobacterial species have become available, Reverse Vaccinology (RV) can be used for rapid vaccine candidate identification (Mora et al. 2003). The RV approach begins vaccine target prediction with bioinformatics analysis of microbial genome sequences. Compared with traditional methods, RV provides an efficient alternative for vaccine investigation, saving both time and cost (Pizza et al. 2000; Rappuoli 2000; Sette and Rappuoli 2010). The RV approach was first applied by Rino Rappuoli's group to develop a vaccine against serogroup B Neisseria meningitidis (MenB), the major cause of sepsis and meningitis in children and young adults. The initial step was the prediction of sub-cellular location, which aided in the identification of potential vaccine candidates (Pizza et al. 2000; Yu et al. 2010). Subsequently, the RV approach has been applied successfully to Streptococcus pneumoniae and Chlamydia pneumoniae (Maione et al. 2005; Thorpe et al. 2007), and it has been used for screening vaccine candidates for Bacillus anthracis, Porphyromonas gingivalis and Helicobacter pylori (Ariel et al. 2002; Ross et al. 2001; Chakravarti et al. 2000). Recently, immunoinformatics databases have been developed that offer data to facilitate the RV approach in designing new vaccines for malaria and fungal diseases (Chaudhuri et al. 2008, 2011). These resources enhance the original RV approach by incorporating additional algorithms, such as the probability of a protein being an adhesin, the topology of the protein (transmembrane regions) and the similarity of the protein to host (human) proteins, for selection of potential vaccine candidates for testing (Vivona et al. 2006, 2008; Sachdeva et al. 2005; Ansari et al. 2008). Some of these considerations were based on the initial experience gained while applying the RV approach at the experimental stage (Pizza et al. 2000).

In this work, we have used the enhanced RV approach to construct MycobacRV immunoinformatics datasets and database with potential vaccine candidates to facilitate rapid vaccine development against Mycobacteria. The database also houses the list of epitopes in the known vaccine candidates currently being tested (Vita et al. 2010) and the list of potential epitopes from the list of predicted adhesins and extracellular/surface localized proteins obtained using various epitope prediction servers. We have also included information on conservation of potential epitopes across orthologs towards facilitating epitope based vaccine development using the R platform approach described previously (Ramachandran et al. 2011). The datasets and the database house rich information and multiple features and provide researchers with a user-friendly interface for ease of navigation both in R platform and through web.

Materials and methods

Rationale

Immunogenicity is an important criterion for vaccine candidate selection. The candidates selected for immunization must elicit a sufficiently high and sustained immune response in the host. In the RV approach, the first step in selecting potential vaccine candidates is bioinformatics analysis of the protein sequences encoded in the pathogen genomes (Mora et al. 2003; Pizza et al. 2000; Rappuoli 2000; Sette and Rappuoli 2010). In accordance with this principle, the starting point of MycobacRV database construction was the collection of known vaccine candidates and a set of predicted vaccine candidates. These predicted vaccine candidates are proteins predicted as adhesins and adhesin-like proteins using SPAAN at Pad > 0.6 (Sachdeva et al. 2005) and screened for putative extracellular or surface localization characteristics using PSORTb v.3.0 (Yu et al. 2010). A slightly lower Pad cutoff of 0.6, instead of 0.7 (Sachdeva et al. 2005), was used with SPAAN so as to include the well-known mycobacterial adhesin Heparin Binding Hemagglutinin (HBHA) across the selected pathogenic mycobacterial genomes (Menozzi et al. 1998), along with other characterized host cell binding proteins such as ESAT-6, Antigen 85A, Antigen 85B and Antigen 85C (Kinhikar et al. 2010; Armitige et al. 2000). This slight relaxation was allowed to increase the likelihood of mycobacterial adhesins being predicted. However, a stringent screen was applied with PSORTb v.3.0, retaining only proteins with “Extracellular” or “Cell Wall” location predictions, to facilitate selection of highly probable surface proteins with putative adhesin-like characteristics. These protein sequences were subsequently analyzed by the various algorithms of enhanced RV.
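The two-step screen described above can be sketched as a simple filter. This is a minimal illustration in Python, not the authors' R scripts: the `Prediction` record, the function name and the example proteins are hypothetical, and only the Pad > 0.6 cutoff and the two accepted location labels come from the text.

```python
from dataclasses import dataclass

# Cutoffs taken from the text; everything else here is a hypothetical illustration.
PAD_CUTOFF = 0.6
SURFACE_LOCATIONS = {"Extracellular", "Cell Wall"}


@dataclass(frozen=True)
class Prediction:
    protein_id: str
    pad: float       # SPAAN probability of being an adhesin (Pad)
    location: str    # predicted subcellular location label


def initial_candidates(predictions):
    """Keep proteins with Pad strictly above the cutoff and a surface-type location."""
    return [
        p.protein_id
        for p in predictions
        if p.pad > PAD_CUTOFF and p.location in SURFACE_LOCATIONS
    ]


# Made-up example records.
example = [
    Prediction("protA", 0.72, "Extracellular"),
    Prediction("protB", 0.65, "Cytoplasmic"),
    Prediction("protC", 0.60, "Cell Wall"),   # rejected: Pad must exceed 0.6
    Prediction("protD", 0.61, "Cell Wall"),
]
print(initial_candidates(example))
```

The strict inequality mirrors the "Pad > 0.6" wording; the location labels are assumed to match the strings quoted in the text.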

Homology

The Homology information component includes exhaustive search for orthologs, paralogs, conserved domains and similarity to the host proteins.

  1. Orthologs are genes present in different species that evolved from a common ancestral gene through speciation (Koonin 2005). These genes usually retain the same function during the course of evolution. Ortholog information for a vaccine candidate hints at a similar function in the corresponding orthologous species and perhaps an equivalent immunogenic response from the host. This knowledge is useful in the development of broad spectrum vaccines covering a wide range of species. In this work, we used the Reciprocal Best Hits (RBH) method (Altschul et al. 1990; Moreno-Hagelsieb and Latimer 2008) to fetch the orthologs. The RBH principle holds that two genes from different genomes are orthologous if each finds the other as its best hit in a BLAST search of the other genome. The BLASTP runs were carried out with Smith-Waterman alignment and soft filtering, and results were screened at a maximum E-value threshold of 1 × 10^-6 (Altschul et al. 1990; Moreno-Hagelsieb and Latimer 2008).

  2. Paralogs are genes that evolved through gene duplication within a genome (Koonin 2005). In contrast to orthologs, paralogs tend to evolve new functions, even though they derive from a common ancestral gene. Paralog information for the vaccine candidates in the same species sheds light on the total repertoire of related vaccine candidate genes of a given family. This information was obtained using BLASTCLUST run at a similarity threshold of 0.8 and a minimum length coverage of 0.95 on the individual genomes of the selected mycobacterial species and strains (Kondrashov et al. 2002).

  3. Domains are conserved, autonomously folding functional units of a protein (Marchler-Bauer et al. 2005). Conserved domain data for vaccine candidates provide information on functional domains. This information, where available, is useful along with other complementary data. The Conserved Domain Database (CDD) search of the National Center for Biotechnology Information (NCBI) was used to obtain the conserved domain information (Marchler-Bauer et al. 2005).

  4. An ideal vaccine candidate should have no observable similarity to human proteins, to avoid generating a potential autoimmune response. An initial assessment of this feature can help avoid expensive dead ends in which a vaccine candidate, having been studied extensively, is discovered to be toxic to the host. In this work, we assessed the similarity of the individual candidate proteins to the human reference proteins (RefSeq Release 54) by performing BLASTP with a maximum E-value threshold of 0.01, which borders on the limits of threshold similarity (Altschul et al. 1990). Setting this threshold will likely avoid collecting even remotely similar proteins in the database.
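The Reciprocal Best Hits rule from item 1 can be sketched as follows. This is a minimal Python illustration under stated assumptions: hits are supplied as already-parsed (query, subject, E-value, bit score) tuples, the best hit is taken to be the one with the highest bit score, and the function names and example identifiers are hypothetical. Only the 1 × 10^-6 E-value ceiling comes from the text.

```python
E_VALUE_MAX = 1e-6  # maximum E-value threshold stated in the text


def best_hits(hits):
    """Map each query to its best subject among hits passing the E-value threshold.

    `hits` is an iterable of (query, subject, evalue, bitscore) tuples.
    Assumption: "best" means highest bit score.
    """
    best = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > E_VALUE_MAX:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs in which each protein is the other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Made-up hit tables for two genomes A and B.
a_vs_b = [
    ("a1", "b1", 1e-50, 200.0),
    ("a1", "b2", 1e-20, 90.0),
    ("a2", "b2", 1e-30, 120.0),
    ("a3", "b3", 1e-3, 40.0),   # fails the E-value threshold
]
b_vs_a = [
    ("b1", "a1", 1e-48, 198.0),
    ("b2", "a1", 1e-25, 95.0),
    ("b2", "a2", 1e-28, 118.0),
    ("b3", "a3", 1e-3, 40.0),   # fails the E-value threshold
]
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Ties in bit score are resolved here by keeping the first hit seen, which is a simplification.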

Motif and topology

  1. Betawrap motifs are right-handed parallel beta-helix supersecondary structural motifs present in some bacterial and fungal protein sequences such as toxins, virulence factors and adhesins. These motifs are present in virulence factors of various pathogens (Bradley et al. 2001). The presence of these motifs in a vaccine candidate therefore supports its probable role in virulence, which is useful when prioritizing vaccine candidates. Betawrap predictions were obtained for the selected candidate proteins using the BetaWrap server, which is based on a three-dimensional dynamic profile method that generates interstrand pairwise correlations from a processive sequence wrap (Bradley et al. 2001).

  2. For topology, we predicted the transmembrane domains of the candidate proteins. Transmembrane domains are the regions of membrane proteins that traverse the membrane, looping in and out of it. Proteins with multiple transmembrane domains have been observed to be generally difficult to express and purify (Vivona et al. 2006), so this information also helps in prioritizing vaccine candidates. We used TMHMM Server v. 2.0 to predict transmembrane helices (Krogh et al. 2001).

Subcellular location

  1. Subcellular location defines the putative location of the protein in the cell. This information is an important criterion for vaccine candidate selection because extracellular or cell surface located proteins are accessible to antibodies and other components of the immune system and hence could be useful. We used the subcellular localization prediction server PSORTb v.3.0 for this purpose (Yu et al. 2010).

  2. Additionally, signal peptides were predicted using the SignalP 3.0 server (Bendtsen et al. 2004). A signal peptide is a short stretch of sequence at the N-terminus of a protein that directs it to the secretory pathway. Membrane proteins destined for secretion are targeted to the appropriate intracellular membrane by their signal peptide (Rehm et al. 2001). Hence the presence of a signal peptide in a vaccine candidate provides additional support for an extracellular location of the protein.

Immunoinformatics

Immunoinformatics deals with applying bioinformatics principles and tools to the molecular activities of the immune system. The focus of immunoinformatics has been to enable identification of antigens or epitopes capable of eliciting immune response. Immunoinformatics provides databases and predictive tools, which are used in discovering novel vaccines (Vivona et al. 2008). Epitope, also known as ‘antigenic determinant’ is a surface localized part of antigen capable of eliciting an immune response (Vivona et al. 2008).

We used various epitope prediction algorithms for prediction of B cell epitopes, discotopes (discontinuous B cell epitopes) and T cell (MHC Class I epitope and MHC Class II epitopes). The prediction algorithms are based on different computational approaches, each having an associated success rate of prediction. We therefore used multiple algorithms to fetch epitope predictions, to obtain enriched information.

  1. Linear B cell epitopes: Linear epitope constitutes a single continuous stretch of amino acids within a protein sequence antigen recognized by soluble or membrane bound antibodies (Vivona et al. 2008). Among the algorithms used for linear B cell epitope prediction, ABCPred is based on artificial neural networks and BcePred uses physico-chemical properties for epitope prediction (Kolaskar and Tongaonkar 1990; Saha and Raghava 2006a, b, 2007).

  2. Discontinuous B cell Epitopes: Epitopes whose residues are distantly placed in the sequence brought together by physico-chemical folding, recognized by soluble or membrane bound antibodies, constitute discontinuous epitopes (Vivona et al. 2008). Discontinuous epitopes were predicted using Discotope 1.2, CEP and BEPro servers based on available crystal structures of antigens (Andersen et al. 2006; Kulkarni-Kale et al. 2005; Sweredoski and Baldi 2008).

  3. The immune response against M. tuberculosis infection is mainly cell mediated response with involvement of MHC Class I and MHC Class II molecules (Kaufmann 2002).

    MHC Class I T cell epitopes: These are short regions presented on the surface of an antigen-presenting cell, where they are bound to MHC Class I molecules. Among the algorithms used to predict T cell epitopes belonging to MHC Class I, NetMHC 3.0 uses artificial neural networks (ANNs) and weight matrices, Bimas is based on a predicted half-time of dissociation to HLA class I molecules, ARB (Average Relative Binding Method) of Immune Epitope Database (IEDB) uses half maximal inhibitory concentration calculation and IEDB-consensus Method combines NetMHC, Stabilized matrix method (SMM), Scoring Matrices derived from Combinatorial Peptide Libraries (CombLib) algorithms to predict epitopes (Zhang et al. 2008; Bui et al. 2005; Wang et al. 2010; Moutaftsi et al. 2006; Parker et al. 1994; Lundegaard et al. 2008).

  4. MHC Class II T cell epitopes: These are short regions presented on the surface of an antigen-presenting cell, where they are bound to MHC Class II molecules. T cell epitopes belonging to MHC Class II were predicted using Propred, which uses quantitative matrices derived from published literature for epitope prediction, IEDB-ARB (Average Relative Binding Method) uses half maximal inhibitory concentration calculations and IEDB-consensus Method combines NN-align, SMM-align, and CombLib algorithms to predict epitopes (Zhang et al. 2008; Bui et al. 2005; Wang et al. 2010; Moutaftsi et al. 2006; Singh and Raghava 2001).

  5. Allergens: We also fetched potential allergen information, as it is desirable for a vaccine candidate to be non-allergic in a general sense. For this purpose Algpred, an allergen prediction algorithm, was used with combined approach. This combined approach included finding similarity to known allergic epitopes, searching Multiple EM for Motif Elicitation (MEME)/Motif Alignment and Search Tool (MAST) allergen motifs using MAST, search based on SVM modules and BLAST search against 2890 allergen-representative peptides obtained from Bjorklund et al. 2005 (Saha and Raghava 2006a, 2006b). Additionally, Allermatch, an allergen prediction algorithm based on “Codex alimentarius and FAO/WHO Expert consultation on allergenicity of foods derived through modern biotechnology” was used (Fiers et al. 2004).

Data layout

The whole proteome sequences of 22 selected pathogenic mycobacterial strains and species and a non-pathogenic Mycobacterium tuberculosis H37Ra strain, were sourced from various databases (Table 1) (Cooper et al. 2010; McCarthy 2005; Ioerger et al. 2009). The 742 adhesin and adhesin like protein sequences from 23 strains and species of the selected mycobacteria were analyzed with 20 algorithms of enhanced RV listed in Table 2. The data emerging from these algorithms were structured into “First Layer” and “Second Layer”. “Motif and topology”, “Subcellular location” and “Homology” data were organized into “First Layer” and “Immunoinformatics” data (epitopes and allergens) were organized into “Second Layer”. This strategy was adopted to limit exhaustive epitope analysis to only selected proteins by users. Also the experimentally known epitopes of these proteins were characterized (Vita et al. 2010) and arranged into “Second Layer”. Researchers can interrogate with optimal selection criteria to screen for potential vaccine candidates and the list of conserved epitopes. As an example to this selection process we show suitable decision criteria with the help of well structured decision trees to select a set of most probable vaccine candidates. The implementation process is summarized below.

Table 1.

Summary of the human pathogenic mycobacterial proteomes analyzed

| Species | Source | Reference | Adhesin and adhesin-like proteins¹ | Most probable/top vaccine candidates² |
|---|---|---|---|---|
| Mycobacterium abscessus ATCC 19977 | NCBI | Cooper et al. 2010 | 39 | 29 |
| Mycobacterium avium 104 | NCBI | Cooper et al. 2010 | 30 | 27 |
| Mycobacterium avium subsp. paratuberculosis K-10 | NCBI | Cooper et al. 2010 | 34 | 28 |
| Mycobacterium bovis BCG str. Pasteur 1173P2 | NCBI | Cooper et al. 2010 | 37 | 19 |
| Mycobacterium bovis BCG str. Tokyo 172 | NCBI | Cooper et al. 2010 | 37 | 19 |
| Mycobacterium bovis AF2122/97 | NCBI | Cooper et al. 2010 | 40 | 15 |
| Mycobacterium intracellulare ATCC 13950 | NCBI | Cooper et al. 2010 | 32 | 30 |
| Mycobacterium leprae TN | NCBI | Cooper et al. 2010 | 6 | 4 |
| Mycobacterium leprae Br4923 | NCBI | Cooper et al. 2010 | 7 | 5 |
| Mycobacterium ulcerans Agy99 | NCBI | Cooper et al. 2010 | 36 | 26 |
| Mycobacterium tuberculosis CDC1551 | NCBI | Cooper et al. 2010 | 35 | 18 |
| Mycobacterium tuberculosis F11 | NCBI | Cooper et al. 2010 | 40 | 19 |
| Mycobacterium tuberculosis H37Rv | NCBI | Cooper et al. 2010 | 42 | 19 |
| Mycobacterium tuberculosis H37Ra | NCBI | Cooper et al. 2010 | 42 | 20 |
| Mycobacterium tuberculosis KZN 1435 | NCBI | Cooper et al. 2010 | 43 | 20 |
| Mycobacterium tuberculosis 98-R604 | Broad Institute | McCarthy 2005 | 31 | 19 |
| Mycobacterium tuberculosis C | Broad Institute | McCarthy 2005 | 19 | 10 |
| Mycobacterium tuberculosis Haarlem | Broad Institute | McCarthy 2005 | 33 | 17 |
| Mycobacterium tuberculosis KZN 4207 | Broad Institute | McCarthy 2005 | 37 | 17 |
| Mycobacterium tuberculosis KZN 605 | Broad Institute | McCarthy 2005 | 37 | 16 |
| Mycobacterium tuberculosis W-148 | Broad Institute | McCarthy 2005 | 31 | 19 |
| Mycobacterium tuberculosis KZN R506 | Supporting Information (Table S4 and Table S5) | Ioerger et al. 2009 | 34 | 17 |
| Mycobacterium tuberculosis KZN V2475 | Supporting Information (Table S4 and Table S5) | Ioerger et al. 2009 | 20 | 13 |

¹ Predicted adhesin and adhesin-like proteins having extracellular and surface localized characteristics, obtained using SPAAN at Pad > 0.6 and PSORTb

² Most probable vaccine candidates obtained following the decision trees. Scripts were run for each selected mycobacterial species and strain individually

Table 2.

Algorithms used to analyze predicted adhesins with extracellular and surface localized characteristics for Immunoinformatics

| Algorithm | Principle | Parameters used | Reference |
|---|---|---|---|
| 1. BLASTCLUST | Clusters protein or DNA sequences based on pairwise matches found using the BLAST algorithm in the case of proteins or the Mega BLAST algorithm for DNA. | Paralog identification: S = 0.8 and L = 0.95. Nonredundant sequence set¹: S = 100, L = 1, b = T | Kondrashov et al. (2002) |
| 2. BetaWrap | Predicts the right-handed parallel beta-helix supersecondary structural motif in primary amino acid sequences by using beta-strand interactions learned from non-beta-helix structures. | NA | Bradley et al. (2001) |
| 3. Antigenic | Predicts potentially antigenic regions of a protein sequence, based on occurrence frequencies of amino acid residue types in known epitopes. | Default parameters used: Minimum length of antigenic region = 6 | Kolaskar and Tongaonkar (1990) |
| 4. SignalP 3.0 | Predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms. The method incorporates a prediction of cleavage sites and a signal peptide/non-signal peptide prediction based on a combination of several artificial neural networks and hidden Markov models. | Organism group: Eukaryotes. Output format: Short (no graphics). Method: Input sequences may include transmembrane regions | Bendtsen et al. (2004) |
| 5. TMHMM Server v. 2.0 | Predicts the transmembrane helices in proteins based on a hidden Markov model. | Output format: One line per protein | Krogh et al. (2001) |
| 6. Conserved Domain Database and Search Service, v2.22 | The database is a collection of multiple sequence alignments for ancient domains and full-length proteins. It is used to identify the conserved domains present in a protein query sequence. | Default parameters used: Search against database = CDD v3.10 - 44354 PSSMs; Expect Value threshold = 0.01; Apply low-complexity filter; Maximum number of hits = 500; Result mode = Concise | Marchler-Bauer et al. (2005) |
| 7. BlastP | Uses the BLAST algorithm to compare an amino acid query sequence against a protein sequence database. | Ortholog identification using the RBH method²: E = 1 × 10^-6, F = "m S" and s = T. Similarity to human proteins³: E = 0.01 and F = "F" | Altschul et al. (1990); Moreno-Hagelsieb and Latimer (2008) |
| 8. ABCPred | Predicts B cell epitope(s) in an antigen sequence, using an artificial neural network. | Immunoinformatics Database: Threshold = 0.51 (default) | Saha and Raghava (2006a, 2006b) |
| 9. BcePred | Predicts linear B-cell epitopes, using physico-chemical properties. | Default parameters used: Hydrophilicity = 2; Flexibility = 1.9; Accessibility = 2; Turns = 1.9; Exposed Surface = 2.4; Polarity = 2.3; Antigenic Propensity = 1.8; Combined = 1.9 | Saha and Raghava (2007) |
| 10. Discotope 1.2 | Predicts discontinuous B cell epitopes from protein three-dimensional structures using surface accessibility (estimated in terms of contact numbers) and a novel epitope propensity amino acid score. | Threshold for epitope identification = -3.7 (default) | Andersen et al. (2006) |
| 11. CEP | Predicts discontinuous B cell epitopes of protein antigens with known structures. It uses accessibility of residues and a spatial distance cut-off to predict antigenic determinants (ADs), conformational epitopes (CEs) and sequential epitopes (SEs). | NA | Kulkarni-Kale et al. (2005) |
| 12. BEPro | Uses a combination of amino-acid propensity scores and half sphere exposure values at multiple distances to predict discontinuous B cell epitopes. | NA | Sweredoski and Baldi (2008) |
| 13. Propred | Predicts MHC Class-II binding regions in an antigen sequence, using quantitative matrices derived from published literature. It assists in locating promiscuous binding regions that are useful in selecting vaccine candidates. | Immunoinformatics Database: Threshold (%) = 3 (default) | Singh and Raghava (2001) |
| 14. IEDB-ARB (Average Relative Binding Method) | Predicts IC(50) values, allowing combination of searches involving different peptide sizes and alleles into a single global prediction. | Immunoinformatics Database: Threshold <= 500 nM | Zhang et al. (2008), Bui et al. (2005) |
| 15. IEDB-consensus Method | Predicts MHC Class-I binding regions by combining NetMHC, SMM, and CombLib. Predicts MHC Class-II binding regions by combining NN-align, SMM-align, and CombLib. | Immunoinformatics Database: All predicted epitopes selected | Zhang et al. (2008), Wang et al. (2010), Moutaftsi et al. (2006) |
| 16. Bimas | Ranks potential 8-mer, 9-mer, or 10-mer peptides based on a predicted half-time of dissociation to HLA class I molecules. The analysis is based on coefficient tables deduced from the published literature by Dr. Kenneth Parker, Children’s Hospital Boston. | Immunoinformatics Database: Predicted T(½) >= 50 min | Parker et al. (1994) |
| 17. NetMHC 3.0 | Predicts binding of peptides to a number of different HLA alleles using artificial neural networks (ANNs) and weight matrices. | Immunoinformatics Database: Strong Binders (SB) and Weak Binders (WB) selected | Lundegaard et al. (2008) |
| 18. AlgPred | Predicts allergens in a query protein by similarity to known epitopes, by searching MEME/MAST allergen motifs using MAST (a protein is assigned as an allergen if it has any motif), by SVM modules, and by BLAST search against 2890 allergen-representative peptides obtained from Bjorklund et al. 2005 (a protein is assigned as an allergen if it has a BLAST hit). | Hybrid Approach (SVMc + IgE epitope + ARPs BLAST + MAST) selected | Saha and Raghava (2006a, b) |
| 19. Allermatch | Predicts the potential allergenicity of proteins by bioinformatics approaches as recommended by the Codex alimentarius and the FAO/WHO Expert consultation on allergenicity of foods derived through modern biotechnology. | CutOff = 35 and Wordlength = 6 | Fiers et al. (2004) |

1 ‘S’ refers to similarity threshold, ‘L’ to minimum length coverage, ‘b’ to both

2 ‘RBH’ refers to Reciprocal Best Hits; ‘E’ refers to Expect Value threshold; the -F “m S” -s T options refer to no masking of low-information sequences during the alignment phase, with Smith–Waterman alignment

3 ‘E’ refers to Expect Value threshold and ‘F’ refers to filter option

Potential vaccine candidates identification (top candidates using enhanced RV)

This list of most probable vaccine candidates can be accessed through the “Probable Vaccine Candidate” checkbox in the “Vaccine Candidate Search” tab of MycobacRV Database.

The decision trees describing these processes are presented in Figs. 1, 2 and 3. All analyses following the decision trees can be carried out through scripts in R in object-oriented mode. R is a programming language integrated with an R environment, facilitating easy and rapid data analysis with the help of its integrated suite of software facilities (R Core Team 2013). We have developed a package, mycobacrvR, containing utilities for the vaccine candidate search using various criteria, such as SPAAN score, localization of the protein, number of transmembrane helices, human reference hits and allergen property, as well as for the epitope conservation study. A list of all the functions of this package is available in supplementary Table 1.

Fig. 1. Decision tree to identify non-allergen proteins fulfilling all first layer conditions. The ‘union’ operator was used for combining data and the ‘intersect’ operator was used for extracting common elements. For example, the ‘union’ operator was used to combine all the non-allergens predicted by allergen prediction algorithms and the ‘intersect’ operator was used to acquire the common candidates of all non-allergens fulfilling “First Layer” conditions.

Fig. 2. Decision tree to identify proteins having both B cell and T cell epitopes. The set of proteins possessing both B cell and T cell epitopes was obtained using the ‘intersect’ operator as shown.

Fig. 3. The final set of most probable vaccine candidates. This set was obtained by intersecting the results of the two decision trees described in Figs. 1 and 2. These candidates, from 23 strains and species of the selected mycobacteria, meet the criteria of being non-allergenic, having fewer than two transmembrane helices, having no similarity to human proteins and having both B cell and T cell epitopes.

Case 1: First layer data and allergen prediction

Through this process we obtained a set of 426 protein sequences as most probable adhesin vaccine candidates meeting the criteria set in the scripts. A stringent criterion (S = 100, L = 1, b = T, where ‘S’ refers to similarity threshold, ‘L’ to minimum length coverage, ‘b’ to both) specified in the BLASTCLUST computer program (Cooper et al. 2010) was used to identify redundancy in the set of 426 protein sequences, thereby providing a non-redundant set of 233 most probable adhesin vaccine candidates. The non-redundant set of most probable vaccine candidates along with the example R scripts used for analysis can be obtained from the “Download” tab of the webserver. The decision criteria applied can be modified by researchers and implemented suitably by modifying the R scripts. Flow chart describing first layer data filtration and allergen prediction algorithm of mycobacrvR functions is presented in supplementary Table 1.
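The redundancy step can be approximated by grouping sequences that are identical over their full length. The sketch below is written in Python and rests on a loud assumption: that clustering at S = 100, L = 1, b = T behaves like exact string identity between sequences. It does not call BLASTCLUST, and the function name and toy sequences are hypothetical.

```python
def nonredundant(sequences):
    """Group proteins with identical sequences.

    `sequences` maps protein id -> amino acid sequence.
    Returns {representative id: [all member ids]}, where the representative
    is the first id seen for each distinct sequence.
    Assumption: exact identity stands in for 100% similarity over full length.
    """
    clusters = {}
    for protein_id, seq in sequences.items():
        clusters.setdefault(seq.upper(), []).append(protein_id)
    return {members[0]: members for members in clusters.values()}


# Made-up sequences: two strains share an identical protein.
toy = {
    "H37Rv_p1": "MKTAYIAK",
    "H37Ra_p1": "MKTAYIAK",
    "CDC1551_p1": "MKTAYIAR",
}
print(nonredundant(toy))
```

Keeping the member list for each representative preserves the link back to every strain that carries the same protein.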

The “First Layer” data of the 22 selected human mycobacterial pathogens were used to filter protein sequence candidates with fewer than two transmembrane helices and no similarity to human reference proteins (Vivona et al. 2008). From these, the predicted non-allergen candidates fulfilling the “First Layer” conditions were selected. These candidates were then analysed for the presence of B cell and T cell epitopes, and those possessing both were retained.

The filtered candidates having both B cell and T cell epitopes, predicted non-allergens and fulfilling other “First Layer” criteria formed the final set of most probable vaccine candidates.
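The set logic of the two decision trees can be sketched as follows. This is an illustrative Python sketch, not the authors' R scripts: the record fields (`tm_helices`, `human_similarity`), the predictor names and the GI labels are hypothetical. Following the wording of the Fig. 1 caption, non-allergen predictions from the individual allergen predictors are combined by union, and all other steps use intersection.

```python
def first_layer(proteins):
    """GI labels with fewer than two TM helices and no human similarity."""
    return {
        gi
        for gi, p in proteins.items()
        if p["tm_helices"] < 2 and not p["human_similarity"]
    }


def non_allergens(predictions):
    """Union of the non-allergen sets reported by each allergen predictor."""
    combined = set()
    for predicted in predictions.values():
        combined |= predicted
    return combined


def most_probable_candidates(proteins, allergen_predictions, b_cell, t_cell):
    """Intersect the results of the two decision trees."""
    tree1 = first_layer(proteins) & non_allergens(allergen_predictions)
    tree2 = b_cell & t_cell  # proteins with both B cell and T cell epitopes
    return tree1 & tree2


# Toy data with made-up identifiers.
proteins = {
    "gi1": {"tm_helices": 0, "human_similarity": False},
    "gi2": {"tm_helices": 3, "human_similarity": False},
    "gi3": {"tm_helices": 1, "human_similarity": True},
    "gi4": {"tm_helices": 1, "human_similarity": False},
}
allergen_predictions = {
    "predictor_a": {"gi1", "gi2"},
    "predictor_b": {"gi3", "gi4"},
}
b_cell = {"gi1", "gi3", "gi4"}
t_cell = {"gi1", "gi2"}

final_set = most_probable_candidates(proteins, allergen_predictions, b_cell, t_cell)
```

Because the non-allergen sets are combined by union, a protein passes that step if at least one predictor reports it as a non-allergen; replacing `|=` with `&=` (starting from the first predictor's set) would give a stricter consensus.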

Case 2: Epitope conservation study

The Mycobacterium tuberculosis H37Rv strain was chosen as the reference for the epitope conservation study, which was carried out using R scripts. For each vaccine candidate (adhesin or adhesin-like protein) from the M. tuberculosis H37Rv strain, the predicted B cell and T cell epitopes were analysed for conservation. The identity score was measured as:

Identity score = (number of occurrences of the potential epitope across all orthologs of the protein) / (total number of orthologs of the protein)

An exact match approach was used for the epitope conservation study, so that the conservation ratio reported to users reflects exact matches of the epitope sequence. This information was organized into the MycobacRV database and can be accessed through the “Epitope Conservation Data” tab. A detailed ortholog profile showing the presence or absence of the query epitopes in the selected mycobacterial species has also been provided. The epitope conservation data across species should aid broad-spectrum, epitope-based rational vaccine development studies. The flow chart of the mycobacrvR function for the epitope conservation study, presented in supplementary Table 2, describes the algorithm for epitope conservation across orthologs for the filtered GI numbers of a species with reference to the Mycobacterium tuberculosis H37Rv strain, using the epitope prediction data from B cell or T cell epitope prediction servers (e.g. ABCPred, Bcepred, ProPred, NetMHC, IEDB server).
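The identity score and the ortholog presence/absence profile can be sketched as below. This Python sketch is not the mycobacrvR implementation. It assumes that an "occurrence" means an ortholog whose sequence contains the epitope as an exact substring, so each ortholog counts at most once; the function names, strain labels and sequences are made up for illustration.

```python
def ortholog_profile(epitope, ortholog_seqs):
    """Map each ortholog to True/False for an exact match of the epitope."""
    return {name: epitope in seq for name, seq in ortholog_seqs.items()}


def identity_score(epitope, ortholog_seqs):
    """Fraction of orthologs that contain the epitope as an exact substring."""
    if not ortholog_seqs:
        raise ValueError("at least one ortholog sequence is required")
    profile = ortholog_profile(epitope, ortholog_seqs)
    return sum(profile.values()) / len(profile)


# Toy ortholog sequences (hypothetical).
orthologs = {
    "strain_A": "MKTAYIAKQR",
    "strain_B": "MKTAYLAKQR",   # one substitution inside the epitope
    "strain_C": "GGTAYIAKQRLL",
    "strain_D": "MSSS",
}
epitope = "TAYIAKQ"

profile = ortholog_profile(epitope, orthologs)
score = identity_score(epitope, orthologs)
```

Here the epitope is found in two of the four toy orthologs, giving a score of 0.5. A single substitution, as in `strain_B`, counts as absence under the exact match rule.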

Database architecture

Database design

The GI numbers (identification tags) assigned to the proteins were used as primary keys. The database was developed using MySQL version 4.1.20 at the back end and runs on Red Hat Enterprise Linux ES release 4. The web interfaces were developed in HTML and PHP 5.1.4; they dynamically execute MySQL queries to fetch the stored data and are served through an Apache2 server. The overall layout of MycobacRV is shown in Fig. 4.

Fig. 4.


Tetrapodic layout of MycobacRV. The primary keys are the GI numbers of the proteins. “Homology” includes orthologs, paralogs and absence of similarity to human proteins; “Motif and Topology” consists of beta helix supersecondary structural motifs, conserved domains and transmembrane topologies; “Subcellular location” consists of signal peptides and subcellular localization predictions; “Immunoinformatics” consists of predicted antigenic regions, epitopes and potential non-allergens. These data were further analysed through decision trees. In addition, the conservation of epitopes across orthologs was studied
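A layout of this kind, with four groups of data sharing the GI number as primary key, can be illustrated with a small relational sketch. It uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the table names, column names and GI values are hypothetical; the actual MycobacRV schema is not given in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table per arm of the layout, each keyed by the GI number (hypothetical schema).
conn.executescript("""
CREATE TABLE homology (
    gi INTEGER PRIMARY KEY, n_orthologs INTEGER, n_paralogs INTEGER,
    human_similarity INTEGER);
CREATE TABLE motif_topology (
    gi INTEGER PRIMARY KEY, betawrap INTEGER, tm_helices INTEGER);
CREATE TABLE subcellular_location (
    gi INTEGER PRIMARY KEY, signal_peptide INTEGER, localization TEXT);
CREATE TABLE immunoinformatics (
    gi INTEGER PRIMARY KEY, n_b_epitopes INTEGER, n_t_epitopes INTEGER,
    non_allergen INTEGER);
""")

# Made-up rows for three proteins.
conn.executemany("INSERT INTO homology VALUES (?, ?, ?, ?)",
                 [(1001, 20, 1, 0), (1002, 18, 0, 0), (1003, 5, 2, 1)])
conn.executemany("INSERT INTO motif_topology VALUES (?, ?, ?)",
                 [(1001, 1, 0), (1002, 0, 4), (1003, 0, 1)])
conn.executemany("INSERT INTO subcellular_location VALUES (?, ?, ?)",
                 [(1001, 1, "Extracellular"), (1002, 0, "Cell wall"),
                  (1003, 1, "Extracellular")])
conn.executemany("INSERT INTO immunoinformatics VALUES (?, ?, ?, ?)",
                 [(1001, 3, 2, 1), (1002, 1, 1, 1), (1003, 2, 0, 1)])

# Join the arms on the shared primary key and apply the decision criteria.
rows = conn.execute("""
SELECT h.gi
FROM homology AS h
JOIN motif_topology AS m ON m.gi = h.gi
JOIN immunoinformatics AS i ON i.gi = h.gi
WHERE h.human_similarity = 0
  AND m.tm_helices < 2
  AND i.non_allergen = 1
  AND i.n_b_epitopes > 0
  AND i.n_t_epitopes > 0
ORDER BY h.gi
""").fetchall()

candidate_gis = [row[0] for row in rows]
```

Keying every table on the same identifier lets any combination of features be retrieved for a protein with simple joins, which is what the search tabs of a web interface would need.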

Database access and interface

The MycobacRV webserver provides the following tabs: “Home”, “Vaccine Candidate Search”, “Advanced Search”, “Epitope Conservation Data”, “Known Vaccines”, “Download”, “Help” and “Contact”. The “Vaccine Candidate Search” tab provides complete data for the 742 predicted vaccine candidates, organized into “First Layer” and “Second Layer” (Fig. 5). The data for the most probable vaccine candidates of a selected species can be fetched by checking the checkbox provided and clicking the submit button. The “Advanced Search” tab lets users filter data on the basis of protein length, number of transmembrane spanning regions, and presence or absence of betawraps, paralogs, orthologs, conserved domains and similarity to human reference proteins (retrieved from NCBI through ftp on January 22, 2013). The “First Layer” data filter criteria can also be applied here. The “Epitope Conservation Data” tab provides the epitope conservation analysis for the Mycobacterium tuberculosis H37Rv strain. The “Known Vaccines” tab leads to a page listing the known vaccine candidates in tabular form along with the cited references. The “Download” tab provides the non-redundant set of most probable vaccine candidates along with Rdata and the example R scripts used for the analysis. The mycobacrvR package for the R platform, which contains the utilities for the vaccine candidate search and for the analysis of epitope conservation across orthologs with reference to the M. tuberculosis H37Rv strain, is also available for download. Results obtained from any of these operations can be exported into text files.
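The kind of filtering and text export offered by the “Advanced Search” tab can be sketched as follows. This Python sketch is only an illustration of the idea, not the PHP/MySQL code of the webserver; the record fields, parameter names and values are hypothetical.

```python
import csv
import io


def advanced_search(records, min_length=None, max_length=None,
                    max_tm=None, has_betawrap=None, has_orthologs=None):
    """Return records passing every criterion that is not None."""
    hits = []
    for rec in records:
        if min_length is not None and rec["length"] < min_length:
            continue
        if max_length is not None and rec["length"] > max_length:
            continue
        if max_tm is not None and rec["tm_regions"] > max_tm:
            continue
        if has_betawrap is not None and rec["betawrap"] != has_betawrap:
            continue
        if has_orthologs is not None and (rec["n_orthologs"] > 0) != has_orthologs:
            continue
        hits.append(rec)
    return hits


def export_text(records, fields):
    """Serialize the selected fields of each record as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, delimiter="\t",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up records.
records = [
    {"gi": 1001, "length": 250, "tm_regions": 0, "betawrap": False, "n_orthologs": 12},
    {"gi": 1002, "length": 80, "tm_regions": 0, "betawrap": False, "n_orthologs": 5},
    {"gi": 1003, "length": 400, "tm_regions": 3, "betawrap": True, "n_orthologs": 9},
    {"gi": 1004, "length": 310, "tm_regions": 1, "betawrap": True, "n_orthologs": 0},
]

hits = advanced_search(records, min_length=100, max_tm=1, has_orthologs=True)
text = export_text(hits, ["gi", "length", "tm_regions"])
```

Leaving a criterion as `None` means it is not applied, which corresponds to leaving a checkbox or field empty in a search form.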

Fig. 5.


‘Immunoinformatics Data’ search tab of the MycobacRV webserver. Searches can be performed with multiple options by checking the checkboxes. The backend data consist of immunoinformatics data on 742 adhesin and adhesin-like proteins with extracellular and surface-localized characteristics. The non-redundant set of most probable vaccine candidates along with the example R scripts used for analysis can be obtained from the “Download” tab of the webserver

Results

MycobacRV provides comprehensive analysis data for 742 predicted and known adhesin and adhesin-like proteins from 23 strains and species of Mycobacteria. Analysis of the enhanced RV data through decision trees provided a non-redundant set of 233 most probable vaccine candidates from these 23 strains and species. Recent trends in vaccinology favour epitope-based vaccines (Patronov and Doytchinova 2013; Khan et al. 2007). To facilitate these efforts, we analysed the epitopes in terms of their conservation among orthologs (Khan et al. 2007). The epitope conservation data, together with the associated information on other molecular features of each protein, support epitope-based vaccine design.

The predicted vaccine candidates mainly include members of the PE, PPE, PE-PGRS and Mpt families, Cfp2, pstS2, ESAT-6, HBHA, Antigen 85A, Antigen 85B, Antigen 85C and hypothetical proteins. Some of these proteins (Antigen 85A, Antigen 85B, Antigen 85C and ESAT-6) are under investigation for new vaccine development (Kaufmann 2011). That the approach recovers antigens already under investigation suggests that MycobacRV will be useful for developing new vaccines against mycobacterial infections in general and tuberculosis in particular.

The mycobacrvR package serves as a model for developing similar data resources for other pathogens. Because it is open source, other users and developers can extend it in the future to help advance the goal of epitope-based vaccines.

Electronic supplementary material

Acknowledgments

SR acknowledges grant BSC0121 from the Council of Scientific and Industrial Research (CSIR). RC thanks the Indian Council of Medical Research for a fellowship. Funding for IT infrastructure through CSIR-Institute of Genomics and Integrative Biology resources is acknowledged.

Conflict of interest

The authors declare that they have no competing interests.

Footnotes

MycobacRV can be accessed at http://mycobacteriarv.igib.res.in. It is best viewed with Internet Explorer 8.0 or later, or Mozilla Firefox 3.0 or later.

Contributor Information

Rupanjali Chaudhuri, Email: rupanjali.bhu@gmail.com.

Deepika Kulshreshtha, Email: deepikakul12@gmail.com.

Muthukurussi Varieth Raghunandanan, Email: raghu@igib.res.in.

Srinivasan Ramachandran, Phone: +91-11-27666156, FAX: +91-11-27667471, Email: ramuigib@gmail.com.

References

  1. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
  2. Andersen PH, Nielsen M, Lund O. Prediction of residues in discontinuous B cell epitopes using protein 3D structures. Protein Sci. 2006;15:2358–2367. doi: 10.1110/ps.062405906.
  3. Ansari FA, Kumar N, Bala Subramanyam M, Gnanamani M, Ramachandran S. MAAP: malarial adhesins and adhesin-like proteins predictor. Proteins. 2008;70:659–666. doi: 10.1002/prot.21568.
  4. Ariel N, Zvi A, Grosfeld H, Gat O, Inbar Y, Velan B, Cohen S, Shafferman A. Search for potential vaccine candidate open reading frames in the Bacillus anthracis virulence plasmid pXO1: in silico and in vitro screening. Infect Immun. 2002;70:6817–6827. doi: 10.1128/IAI.70.12.6817-6827.2002.
  5. Armitige LY, Jagannath C, Wanger AR, Norris SJ. Disruption of the genes encoding antigen 85A and antigen 85B of Mycobacterium tuberculosis H37Rv: effect on growth in culture and in macrophages. Infect Immun. 2000;68:767–778. doi: 10.1128/IAI.68.2.767-778.2000.
  6. Baddeley A, Dean A, Dias HM, Falzon D, et al. World Health Organization Global Tuberculosis Report. 2013. http://www.who.int/tb/publications/global_report/en/index.html. Accessed 1 November 2013.
  7. Bendtsen JD, Nielsen H, von Heijne G, Brunak S. Improved prediction of signal peptides: SignalP 3.0. J Mol Biol. 2004;340:783–795. doi: 10.1016/j.jmb.2004.05.028.
  8. Bradley P, Cowen L, Menke M, King J, Berger B. BETAWRAP: successful prediction of parallel beta-helices from primary sequence reveals an association with many microbial pathogens. Proc Natl Acad Sci USA. 2001;98:14819–14824. doi: 10.1073/pnas.251267298.
  9. Bui HH, Sidney J, Peters B, Sathiamurthy M, Sinichi A, Purton KA, Mothé BR, Chisari FV, Watkins DI, Sette A. Automated generation and evaluation of specific MHC binding predictive tools: ARB matrix applications. Immunogenetics. 2005;57:304–314. doi: 10.1007/s00251-005-0798-y.
  10. Chakravarti DN, Fiske MJ, Fletcher LD, Zagursky RJ. Application of genomics and proteomics for identification of bacterial gene products as potential vaccine candidates. Vaccine. 2000;19:601–612. doi: 10.1016/S0264-410X(00)00256-5.
  11. Chaudhuri R, Ahmed S, Ansari FA, Singh HV, Ramachandran S. MalVac: database of malarial vaccine candidates. Malar J. 2008;7:184. doi: 10.1186/1475-2875-7-184.
  12. Chaudhuri R, Ansari FA, Raghunandanan MV, Ramachandran S. FungalRV: adhesin prediction and immunoinformatics portal for human fungal pathogens. BMC Genom. 2011;12:192. doi: 10.1186/1471-2164-12-192.
  13. Dogra S, Narang T, Kumar B. Leprosy–evolution of the path to eradication. Indian J Med Res. 2013;137:15–35. [Retracted]
  14. Fiers MW, Kleter GA, Nijland H, Peijnenburg AA, Nap JP, Van RC. Allermatch, a webtool for the prediction of potential allergenicity according to current FAO/WHO Codex alimentarius guidelines. BMC Bioinform. 2004;5:133. doi: 10.1186/1471-2105-5-133.
  15. Griffith DE. Nontuberculous mycobacterial lung disease. Curr Opin Infect Dis. 2010;23:185–190. doi: 10.1097/QCO.0b013e328336ead6.
  16. Ioerger TR, Koo S, No EG, Chen X, Larsen MH, Jacobs WR Jr, Pillay M, Sturm AW, Sacchettini JC. Genome analysis of multi- and extensively-drug-resistant tuberculosis from KwaZulu-Natal, South Africa. PLoS One. 2009;4:e7778. doi: 10.1371/journal.pone.0007778.
  17. Katoch VM. Infections due to non-tuberculous mycobacteria (NTM). Indian J Med Res. 2004;120:290–304.
  18. Kaufmann SH. Protection against tuberculosis: cytokines, T cells, and macrophages. Ann Rheum Dis. 2002;61(Suppl 2):ii54–ii58. doi: 10.1136/ard.61.suppl_2.ii54.
  19. Kaufmann SH. Fact and fiction in tuberculosis vaccine research: 10 years later. Lancet Infect Dis. 2011;11:633–640. doi: 10.1016/S1473-3099(11)70146-3.
  20. Khan AM, Miotto O, Heiny AT, Salmon J, Srinivasan KN, Nascimento EJ, Marques ET Jr, Brusic V, Tan TW, August JT. A systematic bioinformatics approach for selection of epitope-based vaccine targets. Cell Immunol. 2007;244:141–147. doi: 10.1016/j.cellimm.2007.02.005.
  21. Kinhikar AG, Verma I, Chandra D, Singh KK, Weldingh K, Andersen P, Hsu T, Jacobs WR Jr, Laal S. Potential role for ESAT6 in dissemination of M tuberculosis via human lung epithelial cells. Mol Microbiol. 2010;75:92–106. doi: 10.1111/j.1365-2958.2009.06959.x.
  22. Kolaskar AS, Tongaonkar PC. A semi-empirical method for prediction of antigenic determinants on protein antigens. FEBS Lett. 1990;276:172–174. doi: 10.1016/0014-5793(90)80535-Q.
  23. Kondrashov FA, Rogozin IB, Wolf YI, Koonin EV. Selection in the evolution of gene duplications. Genome Biol. 2002;3:RESEARCH0008.
  24. Koonin EV. Orthologs, paralogs, and evolutionary genomics. Annu Rev Genet. 2005;39:309–338. doi: 10.1146/annurev.genet.39.073003.114725.
  25. Krogh A, Larsson B, von Heijne G, Sonnhammer EL. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 2001;305:567–580. doi: 10.1006/jmbi.2000.4315.
  26. Kulkarni-Kale U, Bhosle S, Kolaskar AS. CEP: a conformational epitope prediction server. Nucleic Acids Res. 2005;33(Web Server issue):W168–W171.
  27. Lienhardt C, Glaziou P, Uplekar M, Lönnroth K, Getahun H, Raviglione M. Global tuberculosis control: lessons learnt and future prospects. Nat Rev Microbiol. 2012;10:407–416. doi: 10.1038/nrmicro2797.
  28. Lockwood DNJ. Leprosy. Clin Evid (Online). 2007 Apr 1;2007. pii: 0915.
  29. Lundegaard C, Lamberth K, Harndahl M, Buus S, Lund O, Nielsen M. NetMHC-3.0: accurate web accessible predictions of human, mouse and monkey MHC class I affinities for peptides of length 8-11. Nucleic Acids Res. 2008;36(Web Server):W509–W512. doi: 10.1093/nar/gkn202.
  30. Maione D, Margarit I, Rinaudo CD, Masignani V, Mora M, Scarselli M, Tettelin H, Brettoni C, Iacobini ET, Rosini R, et al. Identification of a universal Group B streptococcus vaccine by multiple genome screen. Science. 2005;309:148–150. doi: 10.1126/science.1109869.
  31. Marchler-Bauer A, Anderson JB, Cherukuri PF, DeWeese-Scott C, Geer LY, Gwadz M, He S, Hurwitz DI, Jackson JD, Ke Z, Lanczycki CJ, Liebert CA, Liu C, Lu F, Marchler GH, Mullokandov M, Shoemaker BA, Simonyan V, Song JS, Thiessen PA, Yamashita RA, Yin JJ, Zhang D, Bryant SH. CDD: a Conserved Domain Database for protein classification. Nucleic Acids Res. 2005:192–196.
  32. Marinova D, Gonzola-Asensio J, Aguilo N, Martin C. Recent developments in tuberculosis vaccines. Expert Rev Vaccines. 2013;12:1431–1438. doi: 10.1586/14760584.2013.856765.
  33. Mayer KH, Dukes HC. Synergistic pandemics: confronting the global HIV and tuberculosis epidemics. Clin Infect Dis. 2010;3:S67–S70. doi: 10.1086/651475.
  34. McCarthy AA. Broad institute: bringing genomics to real-world medicine. Chem Biol. 2005;12:717–718. doi: 10.1016/j.chembiol.2005.07.003.
  35. McShane H. Tuberculosis vaccines: beyond bacille Calmette-Guerin. Philos Trans R Soc Lond B Biol Sci. 2011;366:2782–2789. doi: 10.1098/rstb.2011.0097.
  36. Menozzi FD, Bischoff R, Fort E, Brennan MJ, Locht C. Molecular characterization of the mycobacterial heparin-binding hemagglutinin, a mycobacterial adhesin. Proc Natl Acad Sci USA. 1998;13:12625–12630. doi: 10.1073/pnas.95.21.12625.
  37. Merle CS, Cunha SS, Rodrigues LC. BCG vaccination and leprosy protection: review of current evidence and status of BCG in leprosy control. Expert Rev Vaccines. 2010;9:209–222. doi: 10.1586/erv.09.161.
  38. Mora M, Veggi D, Santini L, Pizza M, Rappuoli R. Reverse vaccinology. Drug Discov Today. 2003;8:459–464. doi: 10.1016/S1359-6446(03)02689-8.
  39. Moreno-Hagelsieb G, Latimer K. Choosing BLAST options for better detection of orthologs as reciprocal best hits. Bioinformatics. 2008;24:319–324. doi: 10.1093/bioinformatics/btm585.
  40. Moutaftsi M, Peters B, Pasquetto V, Tscharke DC, Sidney J, Bui HH, Grey H, Sette A. A consensus epitope prediction approach identifies the breadth of murine T(CD8+)-cell responses to vaccinia virus. Nat Biotechnol. 2006;24:817–819. doi: 10.1038/nbt1215.
  41. Nackers F, Dramaix M, Johnson RC, Zinsou C, Robert A, de Biurrun Bakedano E, Glynn JR, Portaels F, Tonglet R. BCG vaccine effectiveness against Buruli ulcer: a case-control study in Benin. Am J Trop Med Hyg. 2006;75:768–774.
  42. Parker KC, Bednarek MA, Coligan JE. Scheme for ranking potential HLA-A2 binding peptides based on independent binding of individual peptide side-chains. J Immunol. 1994;152:163–175.
  43. Patronov A, Doytchinova I. T-cell epitope vaccine design by immunoinformatics. Open Biol. 2013;3:120139. doi: 10.1098/rsob.120139.
  44. Cooper PS, Lipshultz D, Matten WT, McGinnis SD, Pechous S, Romiti ML, Tao T, Valjavec-Gratian M, Sayers EW. Education resources of the National Center for Biotechnology Information. Brief Bioinform. 2010;11:563–569.
  45. Pizza M, Scarlato V, Masignani V, Giuliani MM, Aricò B, Comanducci M, Jennings GT, Baldi L, et al. Identification of vaccine candidates against serogroup B meningococcus by whole-genome sequencing. Science. 2000;287:1816–1820. doi: 10.1126/science.287.5459.1816.
  46. Ramachandran S, Chaudhuri R, Verma SP, Shah AR, Paul C, Chakraborty S, Puniya BL, Mandal RS. Biological Data Modelling and Scripting in R. In: Yang N-S, editor. Systems and Computational Biology - Bioinformatics and Computational Modeling. InTech; 2011. http://www.intechopen.com/books/systems-and-computational-biology-bioinformatics-and-computational-modeling/biological-data-modelling-and-scripting-in-r
  47. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2013. ISBN 3-900051-07-0. http://www.R-project.org/
  48. Rappuoli R. Reverse vaccinology. Curr Opin Microbiol. 2000;3:445–450. doi: 10.1016/S1369-5274(00)00119-3.
  49. Rehm A, Stern P, Ploegh HL, Tortorella D. Signal peptide cleavage of a type I membrane protein, HCMV US11, is dependent on its membrane anchor. EMBO J. 2001;20:1573–1582. doi: 10.1093/emboj/20.7.1573.
  50. Ross BC, Czajkowski L, Hocking D, Margetts M, Webb E, Rothel L, Patterson M, Agius C, Camuglia S, Reynolds E, Littlejohn T, Gaeta B, Ng A, Kuczek ES, Mattick JS, Gearing D, Barr IG. Identification of vaccine candidate antigens from a genomic analysis of Porphyromonas gingivalis. Vaccine. 2001;19:4135–4142. doi: 10.1016/S0264-410X(01)00173-6.
  51. Sachdeva G, Kumar K, Jain P, Ramachandran S. SPAAN: a software program for prediction of adhesins and adhesin-like proteins using neural networks. Bioinformatics. 2005;21:483–491. doi: 10.1093/bioinformatics/bti028.
  52. Saha S, Raghava GP. AlgPred: prediction of allergenic proteins and mapping of IgE epitopes. Nucleic Acids Res. 2006:W202–W209.
  53. Saha S, Raghava GP. Prediction of continuous b-cell epitopes in an antigen using Recurrent Neural Network. Proteins. 2006;65:40–48. doi: 10.1002/prot.21078.
  54. Saha S, Raghava GP. Prediction methods for B-cell epitopes. Methods Mol Biol. 2007;409:387–394. doi: 10.1007/978-1-60327-118-9_29.
  55. Sette A, Rappuoli R. Reverse vaccinology: developing vaccines in the era of genomics. Immunity. 2010;4:530–541. doi: 10.1016/j.immuni.2010.09.017.
  56. Singh H, Raghava GP. ProPred: prediction of HLA-DR binding sites. Bioinformatics. 2001;17:1236–1237. doi: 10.1093/bioinformatics/17.12.1236.
  57. Stone AC, Wilbur AK, Buikstra JE, Roberts CA. Tuberculosis and leprosy in perspective. Am J Phys Anthropol. 2009;49:66–94. doi: 10.1002/ajpa.21185.
  58. Sweredoski MJ, Baldi P. PEPITO: improved discontinuous B-cell epitope prediction using multiple distance thresholds and half sphere exposure. Bioinformatics. 2008;24:1459–1460. doi: 10.1093/bioinformatics/btn199.
  59. Thorpe C, Edwards L, Snelgrove R, Finco O, Rae A, Grandi G, Guilio R, Hussell T. Discovery of a vaccine antigen that protects mice from Chlamydia pneumoniae infection. Vaccine. 2007;25:2252–2260. doi: 10.1016/j.vaccine.2006.12.003.
  60. Vita R, Zarebski L, Greenbaum JA, Emami H, Hoof I, Salimi N, Damle R, Sette A, Peters B. The immune epitope database 2.0. Nucleic Acids Res. 2010;38(Database issue):D854–D862.
  61. Vivona S, Bernante F, Filippini F. NERVE: new enhanced reverse vaccinology environment. BMC Biotechnol. 2006;6:35. doi: 10.1186/1472-6750-6-35.
  62. Vivona S, Gardy JL, Ramachandran S, Brinkman FS, Raghava GP, Flower DR, Filippini F. Computer-aided biotechnology: from immuno-informatics to reverse vaccinology. Trends Biotechnol. 2008;26:190–200. doi: 10.1016/j.tibtech.2007.12.006.
  63. Waller EA, Roy A, Brumble L, Khoor A, Johnson MM, Garland JL. The expanding spectrum of Mycobacterium avium complex-associated pulmonary disease. Chest. 2006;130:1234–1241. doi: 10.1378/chest.130.4.1234.
  64. Wang P, Sidney J, Kim Y, Sette A, Lund O, Nielsen M, Peters B. Peptide binding predictions for HLA DR, DP and DQ molecules. BMC Bioinform. 2010;11:568. doi: 10.1186/1471-2105-11-568.
  65. Yu NY, Wagner JR, Laird MR, Melli G, Rey S, Lo R, Dao P, Sahinalp SC, Ester M, Foster LJ, Brinkman FS. PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes. Bioinformatics. 2010;26:1608–1615. doi: 10.1093/bioinformatics/btq249.
  66. Zhang Q, Wang P, Kim Y, Haste-Andersen P, Beaver J, Bourne PE, Bui HH, Buus S, Frankild S, Greenbaum J, Lund O, Lundegaard C, Nielsen M, Ponomarenko J, Sette A, Zhu Z, Peters B. Immune epitope database analysis resource (IEDB-AR). Nucleic Acids Res. 2008;36(Web Server):W513–W518. doi: 10.1093/nar/gkn254.


Articles from Systems and Synthetic Biology are provided here courtesy of Springer Science+Business Media B.V.
