A Computational Vaccine Designing Approach for MERS-CoV Infections

Hiba Siddig Ibrahim; Shamsoun Khamis Kafi

2020 Mar 12;2131:39–145. doi: 10.1007/978-1-0716-0389-5_4

Editor: Namrata Tomar

PMCID: PMC7121163 PMID: 32162250

Abstract

The aim of this study was to use IEDB software to predict suitable MERS-CoV epitope vaccine candidates against the most common alleles in the world population, using four selected protein sequences: the S glycoprotein, the envelope (E) protein, and a modified version of each, following the spread of MERS-CoV that began in 2012. IEDB provides a set of computational prediction services. The output of this study showed that the S glycoprotein, the E protein, and the modified S and E sequences of MERS-CoV might be considered protective immunogens with high conservancy, because they can elicit both neutralizing antibody and T-cell responses through predicted reactivity with B cells, T-helper cells, and cytotoxic T lymphocytes. NetCTL, NetChop, and MHC-NP were used to confirm the results. Population coverage analysis showed that the putative helper T-cell epitopes and CTL epitopes could cover most of the world population in more than 60 geographical regions. According to the AllerHunter results, all of the selected proteins were predicted to be non-allergens, which makes the candidates from this computational study more attractive for vaccine synthesis.

Key words: Middle East respiratory syndrome coronavirus, Severe acute respiratory syndrome coronavirus, Food and Drug Administration, Immune Epitope Database, FAO, AllerHunter

Introduction

Vaccine development is considered one of the most important means of protection against highly infectious diseases, especially when no treatment is available. A newer approach to vaccine design, immunoinformatics, relies on software to determine the most immunogenic parts of an organism (epitopes). Such software was used in this study in an attempt to develop a more strongly immunogenic MERS-CoV vaccine, since previous MERS-CoV vaccine approaches comprise inactivated coronavirus, live attenuated coronavirus, S protein-based vaccines, DNA vaccines, and combination vaccines against coronaviruses. Coronaviruses were first described in the 1960s, isolated from the nasal cavities of patients with the common cold; these strains were called HCoV-229E and HCoV-OC43. In 2003, an outbreak of severe acute respiratory syndrome (SARS) resulted in over 8000 infections, about 10% of which were fatal. On 24 September 2012, the Egyptian virologist Dr. Ali Mohamed Zaki, working in Jeddah, Saudi Arabia, posted on ProMED-mail the first report of a novel SARS-CoV-like coronavirus, isolated from the lungs of a 60-year-old male patient with acute pneumonia and acute renal failure; the virus was later named MERS-CoV [1–3]. MERS-CoV belongs to the group C β-coronaviruses and is an enveloped, positive-sense, single-stranded RNA (ssRNA) virus with a genome of about 30 kb and 10 predicted open reading frames (ORFs), including those encoding the E, M, and S proteins. MERS-CoV can be grown in culture. Genome size, organization, and sequence analysis revealed that the novel coronavirus (NCoV) is most closely related to the bat coronaviruses BtCoV-HKU4 and BtCoV-HKU5. Partial spike gene sequencing of a coronavirus from South African Neoromicia bats also identified a close relative of MERS-CoV, as illustrated by nucleotide percentage distances computed with a substitution model and the complete deletion option in MEGA; this makes the possibility of a common coronavirus vaccine more desirable [3–5].

This study used the S and E proteins, together with modified S and E protein sequences, in an in silico approach to MERS-CoV vaccine development, and also examined the effects of mutations in the selected sequences on vaccine development. The spike glycoprotein is a trimeric, envelope-anchored, type I fusion glycoprotein that interacts with the human dipeptidyl peptidase 4 (DPP4) receptor to mediate viral entry. It is composed of two subunits: S1, which contains the receptor-binding domain and determines cell tropism, and S2, which contains the cell fusion machinery. The E protein is a component of the viral envelope [4, 6].

This study showed that S, E, and their modified sequences can be considered safe and highly promising MERS-CoV vaccine candidates, with no predicted allergenicity.

Materials and Methods

Protein Sequence Retrieval

A total of 130 spike (S) glycoprotein and 41 envelope (E) protein sequences of MERS-CoV were retrieved from the NCBI protein database (http://www.ncbi.nlm.nih.gov/protein/) in September 2016. The sequences originated from different parts of the world, including Saudi Arabia, China, Thailand, the United Kingdom, Qatar, Tunisia, and South Africa. The accession numbers of the retrieved strains are listed in Supplementary Tables 1 and 2. All methods below were applied to the S, E, and modified S and E proteins; the modified S and E proteins were generated by randomly changing some amino acids in their reference sequences. Table 1 (envelope protein, E) and Table 2 (spike glycoprotein, S) list the GenBank accession numbers.

Table 1.

GenBank accession numbers of envelope (E) protein sequences

Accession no. of E protein	Date and place of collection	Type of specimen
YP_009047209.1	13-Jun-2012
AKJ80142.1	27-May-2015/China	Nasopharyngeal swab
AIZ74456.1	07-May-2013/France	Sputum on Vero E6
AIZ74443.1	07-May-2013/France	Induced sputum
AIZ74434.1	07-May-2013/France	Induced sputum
AIZ74422.1	26-Apr-2013/France	Broncho-alveolar lavage
AIZ74406.1	26-Apr-2013/France	Broncho-alveolar lavage
AID50423.1	10-Feb-2013/United Kingdom	Throat swab
AID50423.1	10-Feb-2013/United Kingdom	Throat swab
ALD51909.1	17-Jun-2015/Thailand	Sputum
AMQ49075.1	24-Aug-2015/Saudi Arabia	Respiratory secretions
AMQ49064.1	27-Aug-2015/Saudi Arabia	Respiratory secretions
AMQ49053.1	24-Aug-2015/Saudi Arabia	Respiratory secretions
AMQ49020.1	12-Jul-2015/Saudi Arabia	Respiratory secretions
AMQ49042.1	24-Aug-2015/Saudi Arabia	Respiratory secretions
AMQ49031.1	24-Aug-2015/Saudi Arabia	Respiratory secretions
ALW82736.1	02-Feb-2015/Saudi Arabia
ALW82714.1	05-Feb-2015/Saudi Arabia	Respiratory secretions
ALW82758.1	10-Feb-2015/Saudi Arabia	Respiratory secretions
ALW82747.1	13-Feb-2015/Saudi Arabia	Respiratory secretions
ALW82696.1	15-Feb-2015/Saudi Arabia	Respiratory secretions
ALW82685.1	07-Feb-2015/Saudi Arabia	Respiratory secretions
ALW82674.1	27-Mar-2015/Saudi Arabia	Respiratory secretions
AFY13312.1	11-Sep-2012/United Kingdom
AIG13101.1	2011/South Africa
AHY21474.1	Mammalian cell line Vero CCL81
AHY22569.1	Nov-2013/Saudi Arabia	Nasal swab (camel)
AHB33331.1	07-May-2013/France	Vero E6 isolate/sputum
AHC74092.1	13-Oct-2013/Qatar
AHC74103.1	17-Oct-2013/Qatar
AHI48522.1	02-May-2013/Saudi Arabia
AHI48566.1	05-Aug-2013/Saudi Arabia
AHI48544.1	28-Aug-2013/Saudi Arabia
AHI48533.1	17-Jul-2013/Saudi Arabia
AHI48555.1	12-Jun-2013/Saudi Arabia
AHI48588.1	02-Jul-2013/Saudi Arabia
AHI48577.1	15-Aug-2013/Saudi Arabia
AHI48599.1	12-Jun-2013/Saudi Arabia
AHI48610.1	01-Mar-2013/Saudi Arabia

Table 2.

GenBank accession numbers of spike (S) glycoprotein sequences

Accession no. of S glycoprotein	Date and place of collection	Type of specimen
YP_009047204.1	13-Jun-2012
AHX00721.1	30-Dec-2013/Saudi Arabia	Camel
AHX00711.1	30-Dec-2013/Saudi Arabia	Dromedary
AHX00731.1	30-Nov-2013/Saudi Arabia	Dromedary
AHZ90568.1	08-May-2013/Tunisia	Serum
AHX71946.1	16-Feb-2014/Qatar	Camelus dromedarius
ALJ54521.1	12-May-2015/Saudi Arabia	Respiratory secretions
ALJ54520.1	13-Jun-2015/Saudi Arabia	Respiratory secretions
ALJ54519.1	07-Jun-2015/Saudi Arabia	Respiratory secretions
ALJ54518.1	04-Jun-2015/Saudi Arabia	Respiratory secretions
ALJ54517.1	03-Jun-2015/Saudi Arabia	Respiratory secretions
ALJ54516.1	02-Jun-2015/Saudi Arabia	Respiratory secretions
ALJ54515.1	01-Jun-2015/Saudi Arabia	Respiratory secretions
ALJ54514.1	29-May-2015/Saudi Arabia	Respiratory secretions
ALJ54513.1	25-Apr-2015/Saudi Arabia	Respiratory secretions
ALJ54512.1	27-May-2015/Saudi Arabia	Respiratory secretions
ALJ54511.1	27-May-2015/Saudi Arabia	Respiratory secretions
ALJ54510.1	28-May-2015/Saudi Arabia	Respiratory secretions
ALJ54509.1	28-May-2015/Saudi Arabia	Respiratory secretions
ALJ54508.1	29-May-2015/Saudi Arabia	Respiratory secretions
ALJ54507.1	29-May-2015/Saudi Arabia	Respiratory secretions
ALJ54506.1	23-May-2015/Saudi Arabia	Respiratory secretions
ALJ54505.1	22-May-2015/Saudi Arabia	Respiratory secretions
ALJ54504.1	20-May-2015/Saudi Arabia	Respiratory secretions
ALJ54503.1	17-May-2015/Saudi Arabia	Respiratory secretions
ALJ54502.1	12-May-2015/Saudi Arabia	Respiratory secretions
ALJ54501.1	21-Mar-2015/Saudi Arabia	Respiratory secretions
ALJ54500.1	10-May-2015/Saudi Arabia	Respiratory secretions
ALJ54499.1	09-May-2015/Saudi Arabia	Respiratory secretions
ALJ54498.1	09-May-2015/Saudi Arabia	Respiratory secretions
ALJ54497.1	09-May-2015/Saudi Arabia	Respiratory secretions
ALJ54496.1	16-Apr-2015/Saudi Arabia	Respiratory secretions
ALJ54495.1	13-Apr-2015/Saudi Arabia	Respiratory secretions
ALJ54494.1	04-Apr-2015/Saudi Arabia	Respiratory secretions
ALJ54493.1	04-Apr-2015/Saudi Arabia	Respiratory secretions
ALJ54492.1	30-Mar-2015/Saudi Arabia	Respiratory secretions
ALJ54491.1	25-Mar-2015/Saudi Arabia	Respiratory secretions
ALJ54490.1	24-Mar-2015/Saudi Arabia	Respiratory secretions
ALJ54489.1	08-Mar-2015/Saudi Arabia	Respiratory secretions
ALJ54488.1	04-Mar-2015/Saudi Arabia	Respiratory secretions
ALJ54487.1	04-Mar-2015/Saudi Arabia	Respiratory secretions
ALJ54486.1	28-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54485.1	25-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54484.1	14-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54483.1	13-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54482.1	13-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54481.1	13-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54480.1	10-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54479.1	01-Apr-2015/Saudi Arabia	Respiratory secretions
ALJ54478.1	29-Mar-2015/Saudi Arabia	Respiratory secretions
ALJ54477.1	29-Mar-2015/Saudi Arabia	Respiratory secretions
ALJ54476.1	21-Mar-2015/Saudi Arabia	Respiratory secretions
ALJ54475.1	20-Mar-2015/Saudi Arabia	Respiratory secretions
ALJ54474.1	09-Mar-2015/Saudi Arabia	Respiratory secretions
ALJ54473.1	05-Mar-2015/Saudi Arabia	Respiratory secretions
ALJ54472.1	01-May-2015/Saudi Arabia	Respiratory secretions
ALJ54471.1	08-May-2015/Saudi Arabia	Respiratory secretions
ALJ54470.1	10-May-2015/Saudi Arabia	Respiratory secretions
AID55078.1	2014/Saudi Arabia
AID55077.1	2014/Saudi Arabia
AID55076.1	2014/Saudi Arabia
AID55075.1	2014/Saudi Arabia
AID55074.1	2014/Saudi Arabia
AID55073.1	22-Apr-2014/Saudi Arabia
AID55072.1	15-Apr-2014/Saudi Arabia
AID55071.1	21-Apr-2014/Saudi Arabia
AID55070.1	14-Apr-2014/Saudi Arabia
AID55069.1	12-Apr-2014/Saudi Arabia
AID55068.1	07-Apr-2014/Saudi Arabia
AID55067.1	2014/Saudi Arabia
AID55066.1	2014/Saudi Arabia
ALJ54469.1	13-May-2015/Saudi Arabia	Respiratory secretions
ALJ54468.1	10-May-2015/Saudi Arabia	Respiratory secretions
ALJ54467.1	12-May-2015/Saudi Arabia	Respiratory secretions
ALJ54466.1	12-Mar-2015/Saudi Arabia	Respiratory secretions
ALJ54465.1	07-Mar-2015/Saudi Arabia	Respiratory secretions
ALJ54464.1	08-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54463.1	01-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54462.1	Saudi Arabia	Respiratory secretions
ALJ54461.1	10-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54460.1	21-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54459.1	21-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54458.1	23-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54457.1	23-Feb-2015/Saudi Arabia	Respiratory secretions
AID55098.1	2014/Saudi Arabia
AID55097.1	2014/Saudi Arabia
AID55096.1	2014/Saudi Arabia
AID55095.1	2014/Saudi Arabia
AID55094.1	2014/Saudi Arabia
AID55093.1	2014/Saudi Arabia
AID55092.1	2014/Saudi Arabia
AID55091.1	2014/Saudi Arabia
AID55090.1	2014/Saudi Arabia
AID55089.1	2014/Saudi Arabia
AID55088.1	2014/Saudi Arabia
AID55087.1	2014/Saudi Arabia
AID55086.1	2014/Saudi Arabia
AID55085.1	2014/Saudi Arabia
AID55084.1	2014/Saudi Arabia
AID55083.1	2014/Saudi Arabia
AID55082.1	2014/Saudi Arabia
AID55081.1	2014/Saudi Arabia
AID55080.1	2014/Saudi Arabia
AID55079.1	2014/Saudi Arabia
ALJ54478.1	29-Mar-2015/Saudi Arabia	Respiratory secretions
ALJ54477.1	29-Mar-2015/Saudi Arabia	Respiratory secretions
ALJ54473.1	05-Mar-2015/Saudi Arabia	Respiratory secretions
ALJ54472.1	01-May-2015/Saudi Arabia	Respiratory secretions
ALJ54471.1	08-May-2015/Saudi Arabia	Respiratory secretions
ALJ54470.1	10-May-2015/Saudi Arabia	Respiratory secretions
ALJ54469.1	13-May-2015/Saudi Arabia	Respiratory secretions
ALJ54468.1	10-May-2015/Saudi Arabia	Respiratory secretions
ALJ54467.1	12-May-2015/Saudi Arabia	Respiratory secretions
ALJ54466.1	12-Mar-2015/Saudi Arabia	Respiratory secretions
ALJ54465.1	07-Mar-2015/Saudi Arabia	Respiratory secretions
ALJ54464.1	08-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54463.1	01-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54462.1	30-Jan-2015/Saudi Arabia	Respiratory secretions
ALJ54461.1	10-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54460.1	21-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54459.1	21-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54458.1	23-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54457.1	23-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54456.1	26-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54454.1	28-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54455.1	28-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54453.1	06-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54452.1	14-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54451.1	14-Feb-2015/Saudi Arabia	Respiratory secretions
ALJ54450.1	12-Feb-2015/Saudi Arabia	Respiratory secretions

In Silico PCR

In silico PCR amplification (http://insilico.ehu.es/PCR_virus/) is a program that simulates PCR amplification against sequenced viruses and also serves as a primer confirmation tool. Here it was applied to the above viruses using its stored GenBank sequences; it contains 1783 sequences from 1421 completely sequenced viruses (last update: 31 May 2010).

Determination of Conserved Regions

The retrieved sequences, which were collected from NCBI, were used as a platform to obtain the conserved regions by using multiple sequence alignment (MSA). Sequences were aligned with the aid of ClustalW as implemented in the BioEdit program, version 7.0.9.0.
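As an illustration of how conserved regions can be read off a finished alignment, the following minimal Python sketch scans alignment columns and reports runs in which every sequence carries the same residue and no gap. It is not the BioEdit/ClustalW procedure itself; the function name, the `min_len` parameter, and the toy sequences are assumptions made for this example only.

```python
def conserved_regions(aligned, min_len=9):
    """Return (start, end, segment) for runs of fully conserved, gap-free columns.

    `aligned` is a list of equal-length strings taken from an alignment;
    `start` and `end` are 0-based, end-exclusive column indices.
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = len(aligned[0])
    regions, run_start = [], None
    # Iterate one column past the end so that a trailing run is closed.
    for col in range(length + 1):
        conserved = (
            col < length
            and aligned[0][col] != "-"
            and all(s[col] == aligned[0][col] for s in aligned)
        )
        if conserved and run_start is None:
            run_start = col
        elif not conserved and run_start is not None:
            if col - run_start >= min_len:
                regions.append((run_start, col, aligned[0][run_start:col]))
            run_start = None
    return regions


# Toy alignment (hypothetical sequences, not MERS-CoV data).
toy = [
    "MKTLLVAGC-TQ",
    "MKTLLVAGCSTQ",
    "MKTLLIAGCSTQ",
]
print(conserved_regions(toy, min_len=3))
```

With `min_len=9`, only conserved stretches long enough to hold a 9-mer epitope would be kept, which matches the peptide length used later for MHC class I prediction.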

B-Cell Epitope Prediction

B-cell epitopes are characterized by hydrophilicity, surface accessibility, flexibility, antigenic propensity, and location in a beta turn region. Thus, the classical propensity scale methods and the hidden Markov model-based software from the IEDB analysis resource (http://www.iedb.org/) were used for the following aspects:

Prediction of Linear B-Cell Epitopes

BepiPred from the Immune Epitope Database and Analysis Resource (http://tools.iedb.org/bcell/) was used for linear B-cell epitope prediction from the conserved regions, with a default threshold value of 0.350. BepiPred combines the predictions of a hidden Markov model with the propensity scale of Parker et al., as described in Larsen et al. (Immunome Research, 2006).

Prediction of Surface Accessibility

The Emini surface accessibility prediction tool of the Immune Epitope Database (IEDB) was used to predict surface-accessible epitopes from the conserved regions, with a default threshold value of 1.000 or higher.

Prediction of Epitope Antigenicity Sites

The Kolaskar and Tongaonkar antigenicity method was used to determine the antigenic sites with a default threshold value of 1.045.

Prediction of Epitope Hydrophilicity

The Parker hydrophilicity prediction tool was used to determine the hydrophilicity of the conserved regions; the default threshold value was 1.286.

Prediction of Beta Turn Sites

The Chou and Fasman beta turn prediction method was used with the default threshold of 1.009 to determine the sites that contain beta turns.

Prediction of Flexibility

The Karplus and Schulz flexibility prediction tool was used to predict chain flexibility in the proteins (for the selection of peptide antigens), with a default threshold value of 0.992.

The thresholds of all tools were provided by IEDB; each is calculated by the software mainly as the average score of the tested protein for the corresponding tool.
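The general idea behind propensity scale methods, and behind using the protein's own average score as a threshold, can be sketched as follows. This is a simplified illustration, not the IEDB implementation: the two-level scale `TOY_SCALE` is hypothetical (it is not the Parker, Emini, or any other published scale), and the window size is an assumption.

```python
def window_scores(sequence, scale, window=7):
    """Average a per-residue scale over a sliding window.

    Returns a list of (center_position, score) with 1-based positions.
    """
    if window % 2 == 0 or window > len(sequence):
        raise ValueError("window must be odd and no longer than the sequence")
    half = window // 2
    out = []
    for i in range(len(sequence) - window + 1):
        segment = sequence[i:i + window]
        score = sum(scale[aa] for aa in segment) / window
        out.append((i + half + 1, score))
    return out


def above_average(scores):
    """Use the mean window score as the threshold; return (threshold, hits)."""
    threshold = sum(s for _, s in scores) / len(scores)
    hits = [pos for pos, s in scores if s >= threshold]
    return threshold, hits


# Hypothetical two-level scale for illustration only (not a published scale).
TOY_SCALE = {aa: 1.0 for aa in "DEKRNQSTGP"}
TOY_SCALE.update({aa: -1.0 for aa in "AVLIMFWYCH"})

scores = window_scores("AVLDEKRSTGAVL", TOY_SCALE, window=3)
threshold, hits = above_average(scores)
print(round(threshold, 3), hits)
```

Positions whose window score reaches the average are reported as hits, which mirrors the way residues above the threshold line are highlighted in the prediction plots.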

T-Cell Epitope Prediction

The antigen sequences were scanned for amino acid patterns indicative of:

MHC Class I Binding Predictions

Peptide binding to MHC class I molecules was assessed with the IEDB MHC I prediction tool (http://tools.iedb.org/mhci/). For MHC-I binding prediction, several alleles were used, including HLA-A, HLA-B, HLA-C, and HLA-E alleles that have been reported as frequent around the world. Presentation of the MHC-I peptide complex to T lymphocytes involves several steps; here, the step in which cleaved peptides attach to MHC molecules was predicted. The consensus method, which combines ANN, SMM, and scoring matrices derived from combinatorial peptide libraries (Comblib_Sidney2008), was used, and an epitope length of nine residues (9-mers) was selected. All conserved epitopes that bound to alleles with a percentile rank of 1.0 or less (a low percentile rank indicates a good binder) were selected for further analysis, following the IEDB guidance on selecting thresholds (cutoffs) for MHC class I and II binding predictions, http://help.iedb.org/entries/23854373-Selecting-thresholds-cut-offs-for-MHC-class-I-and-II-binding-predictions.

Note: For the S glycoprotein, the sequence was divided into ten parts owing to a software limitation (no more than 200 FASTA sequences can be entered at a time) [7–11].
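The selection step described above (9-mer peptides, percentile rank of 1.0 or less) can be illustrated with a short Python sketch. The prediction itself is performed by the IEDB tool; the code below only shows how overlapping 9-mers arise and how a rank cutoff is applied to a result table. The sequence, allele assignment, and percentile ranks are made up for the example, and the record layout is an assumption rather than the tool's actual output format.

```python
def nine_mers(sequence, k=9):
    """Return (1-based start, peptide) for every overlapping k-mer."""
    return [(i + 1, sequence[i:i + k]) for i in range(len(sequence) - k + 1)]


def select_binders(rows, max_rank=1.0):
    """Keep predictions with percentile rank <= max_rank, best binders first."""
    kept = [r for r in rows if r["percentile_rank"] <= max_rank]
    return sorted(kept, key=lambda r: r["percentile_rank"])


# Hypothetical input: a toy sequence and invented percentile ranks.
toy_sequence = "MKTLLVAGCSTQ"
peptides = nine_mers(toy_sequence)
made_up_ranks = [0.4, 2.5, 1.0, 0.1]
rows = [
    {"allele": "HLA-A*02:01", "start": start, "peptide": pep, "percentile_rank": rank}
    for (start, pep), rank in zip(peptides, made_up_ranks)
]

for row in select_binders(rows):
    print(row["start"], row["peptide"], row["percentile_rank"])
```

The same filter with `max_rank=10` corresponds to the cutoff used for MHC class II predictions.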

MHC Class II Binding Predictions

Peptide binding to MHC class II molecules was assessed with the IEDB MHC II prediction tool (http://tools.immuneepitope.org/mhcii/). For MHC-II binding prediction, the reference set of alleles was used, which includes the HLA-DQ, HLA-DP, and HLA-DR alleles that are most frequent around the world. The MHC class II groove can bind peptides of different lengths. Of the seven prediction methods available in the IEDB MHC II prediction tool, NetMHCIIpan was used in this study. The conserved epitopes that bound to alleles with a percentile rank of 10 or less were selected for further analysis, following the IEDB guidance on selecting thresholds (cutoffs) for MHC class I and II binding predictions, http://help.iedb.org/entries/23854373-Selecting-thresholds-cut-offs-for-MHC-class-I-and-II-binding-predictions [7, 11–14].

Proteasomal Cleavage/TAP Transport/MHC Class I Combined Predictor

This tool combines predictors of proteasomal processing, TAP transport, and MHC binding to produce an overall score for each peptide's intrinsic potential to be a T-cell epitope. In this study, NetMHCpan was used with immunoproteasomal cleavage prediction. There are two types of proteasomes: the constitutively expressed "housekeeping" type and immunoproteasomes, which are induced by IFN-γ secretion. Results are reported as a proteasome score, TAP score, MHC score, processing score, total score, and IC50 value. The prediction outputs are explained below:

Proteasome cleavage

The scores can be interpreted as logarithms of the total amount of cleavage site usage liberating the peptide C-terminus; this amount also depends on many other factors, e.g., the amount of source protein degraded.

TAP transport

The TAP score estimates an effective −log(IC50) value for the binding to TAP of a peptide or its N-terminally prolonged precursors.

MHC binding

The MHC binding prediction is identical to the MHC class I binding prediction, with output given as −log(IC50) values.

Processing

This score combines the proteasomal cleavage and TAP transport predictions. It predicts a quantity proportional to the amount of peptide present in the ER, where a peptide can bind to multiple MHC molecules. This allows predicting T-cell epitope candidates independent of MHC restriction.

Total

This score combines the proteasomal cleavage, TAP transport, and MHC binding predictions. It predicts a quantity proportional to the amount of peptide presented by MHC molecules on the cell surface. High scores mean high efficiency.
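How the individual outputs relate to one another can be sketched in a few lines of Python. The sketch assumes that all scores are on a base-10 logarithmic scale and that the combination is a plain sum; both points are assumptions of this example, and the numeric inputs are invented rather than taken from the study.

```python
import math


def mhc_score(ic50_nm):
    """MHC binding score as -log10(IC50); stronger binders (lower IC50) score higher."""
    return -math.log10(ic50_nm)


def processing_score(proteasome, tap):
    """Combine proteasomal cleavage and TAP transport (assumed additive)."""
    return proteasome + tap


def total_score(proteasome, tap, ic50_nm):
    """Combine processing and MHC binding (assumed additive on the log scale)."""
    return processing_score(proteasome, tap) + mhc_score(ic50_nm)


# Invented example values: two peptides with equal processing, different binding.
strong = total_score(proteasome=1.0, tap=0.5, ic50_nm=10.0)
weak = total_score(proteasome=1.0, tap=0.5, ic50_nm=1000.0)
print(strong, weak)
```

Under these assumptions, a tenfold decrease in IC50 raises the total score by one unit, so peptides can be ranked by total score while the processing score stays independent of the MHC allele.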

Neural Network-Based Prediction of Proteasomal Cleavage Sites (NetChop) and T-Cell Epitopes (NetCTL and NetCTLpan)

NetChop, used here, is a neural network-based predictor of proteasomal processing. NetCTL and NetCTLpan are predictors of T-cell epitopes along a protein sequence. The thresholds for a positive prediction were 0.5, 0.75, and 1 for NetChop, NetCTL, and NetCTLpan, respectively; predictions above the threshold are displayed in green and those below the threshold in red.

MHC-NP: Prediction of Peptides Naturally Processed by the MHC

MHC-NP employs data obtained from MHC elution experiments to assess the probability that a given peptide is naturally processed and binds to a given MHC molecule. The tool used in this study was the winner of the second Machine Learning Competition in Immunology. It draws on three groups of peptides: binders, nonbinders, and eluted peptides, the last of which are considered naturally processed. A higher probability score therefore indicates a peptide that is more likely to be naturally processed.

Epitope Analysis Tools

Population Coverage Calculation

All potential MHC I and MHC II binders from the spike glycoprotein, the E protein, and the modified S and E sequences were assessed for population coverage against the whole world population, with particular attention to Saudi Arabia and the other countries that have reported MERS-CoV. Calculations were performed on the selected interacting MHC-I and MHC-II alleles with the IEDB population coverage calculation tool (http://tools.iedb.org/tools/population/iedb_input). The tool computes the projected population coverage, the average number of epitope hits/HLA combinations recognized by the population, and the minimum number of epitope hits/HLA combinations recognized by 90% of the population (PC90).
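The notion of projected population coverage can be made concrete with a simplified calculation. The sketch below is not the IEDB algorithm; it assumes Hardy-Weinberg proportions at each locus and independence between loci, and the allele frequencies are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def locus_coverage(allele_freqs, covered_alleles):
    """Fraction of diploid individuals carrying at least one covered allele.

    Assumes Hardy-Weinberg proportions: with summed covered frequency p,
    the chance that both chromosomes lack a covered allele is (1 - p) ** 2.
    """
    p = sum(freq for allele, freq in allele_freqs.items() if allele in covered_alleles)
    p = min(p, 1.0)
    return 1.0 - (1.0 - p) ** 2


def population_coverage(loci, covered_alleles):
    """Combine loci, assuming they are independent of each other."""
    uncovered = 1.0
    for allele_freqs in loci.values():
        uncovered *= 1.0 - locus_coverage(allele_freqs, covered_alleles)
    return 1.0 - uncovered


# Hypothetical allele frequencies for one population (illustration only).
loci = {
    "HLA-A": {"HLA-A*02:01": 0.25, "HLA-A*01:01": 0.15},
    "HLA-B": {"HLA-B*07:02": 0.10},
}
covered = {"HLA-A*02:01", "HLA-B*07:02"}
print(round(population_coverage(loci, covered), 4))
```

Adding epitopes restricted by further alleles enlarges `covered` and can only increase the result, which is why epitope sets binding many frequent alleles are preferred.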

Homology Modeling

The complete 3D structures of the spike glycoprotein and the envelope protein were obtained with Phyre2 (http://www.sbg.bio.ic.ac.uk/phyre2), which uses advanced remote homology detection methods to build 3D models. UCSF Chimera (version 1.8), available from the Chimera website (http://www.cgl.ucsf.edu/chimera), was used to visualize the 3D structures. Homology modeling was performed to further verify the surface accessibility and hydrophilicity of the predicted B-lymphocyte epitopes and to visualize all predicted T-cell epitopes at the structural level.

In addition to the above methods, three other programs were used to assess the effect of the amino acid changes (treated as single nucleotide polymorphisms, SNPs) introduced into the S and E reference sequences.

Confirmation of Amino Acid Change in Spike Glycoprotein (S) and Envelope Protein (E) Sequence

PolyPhen-2

PolyPhen-2 (Polymorphism Phenotyping v2; http://genetics.bwh.harvard.edu/pph2/index.shtml) is an online bioinformatics program that automatically predicts the consequence of an amino acid change on the structure and function of a protein; it was used here for that purpose. The program searches several protein structure databases for 3D protein structures, multiple alignments of homologous sequences, and amino acid contact information, calculates position-specific independent count (PSIC) scores for each of the two variants, and then computes the PSIC score difference between them. PolyPhen scores were assigned as probably damaging (2.00 or more), possibly damaging (1.40–1.90), potentially damaging (1.0–1.50), and benign (0.00–0.90). PolyPhen accepts input in the form of SNPs or protein sequences [18].

I-Mutant Suite

I-Mutant version 3.0 (http://gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant3.0/I-Mutant3.0.cgi) was used to predict protein stability changes upon single-site mutations. I-Mutant3.0 can evaluate the stability change of a single-site mutation starting from either the protein structure or the protein sequence. The program was trained on a data set derived from ProTherm, which is considered the most comprehensive database of experimental data on protein mutations [18].

Project Hope Mutation

HOPE (version 1.1.0; http://www.cmbi.ru.nl/hope/) is an easy-to-use web service that analyzes the structural effects of a point mutation in a protein sequence.

SNPs and GO

SNPs and GO (http://snps.biofold.org/snps-and-go//snps-and-go.html) was used to predict disease-associated variations. It collects, in a unique framework, information derived from the protein sequence, 3D structure, protein sequence profile, and protein function, together with Gene Ontology (GO) annotation, to predict whether a given variation can be classified as disease-related or neutral. It reports results from three methods, which differ in SVM type and input data:

PANTHER

Output of the PANTHER algorithm.

PhD-SNP

The SVM input is the sequence and the profile at the mutated position.

SNPs and GO

The SVM input comprises all the inputs of PhD-SNP, the PANTHER output, and GO term features; the output is a disease probability (a mutation with a probability >0.5 is predicted to be disease-related).

Peptide Search Tool

The peptide search tool (http://www.uniprot.org/peptidesearch/) was used to find all UniProtKB sequences that exactly match a query peptide sequence. This means that the desired peptides can readily be synthesized in the laboratory, for example by cloning methods, in order to study their impact on the immune system by injecting laboratory animals with the peptide sequence of any organism.

AllerHunter

AllerHunter (http://tiger.dbs.nus.edu.sg/AllerHunter/index.html) is a cross-reactive allergen prediction program built on a combination of support vector machine (SVM) and pairwise sequence similarity. Query sequences can be evaluated with both AllerHunter and the FAO/WHO evaluation scheme. In AllerHunter, a sequence is considered a cross-reactive allergen if it has a probability of ≥0.06, whereas the FAO/WHO guideline states that a sequence is potentially allergenic if it shares either an identical stretch of at least 6 contiguous amino acids or >35% sequence identity over a window of 80 amino acids with a known allergen.
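The first FAO/WHO criterion, an exact match of at least 6 contiguous amino acids, is simple enough to sketch directly. The Python example below is an illustration only and is not part of AllerHunter; the function names and toy sequences are hypothetical. The second criterion (>35% identity over an 80-residue window) requires sequence alignment and is not covered here.

```python
def shared_kmers(query, allergen, k=6):
    """Return the sorted k-mers that occur in both the query and the allergen."""
    allergen_kmers = {allergen[i:i + k] for i in range(len(allergen) - k + 1)}
    query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    return sorted(query_kmers & allergen_kmers)


def flags_contiguous_rule(query, allergens, k=6):
    """True if the query shares a contiguous k-residue match with any allergen."""
    return any(shared_kmers(query, allergen, k) for allergen in allergens)


# Toy sequences (hypothetical, not real allergens).
query = "MKTLLVAGCSTQ"
known_allergens = ["GGTLLVAGPP", "PPPPPPPP"]
print(shared_kmers(query, known_allergens[0]))
print(flags_contiguous_rule(query, known_allergens))
```

Because a 6-residue exact match can occur by chance in long proteins, a positive flag from this rule is usually read as a reason for closer inspection rather than as proof of allergenicity.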

AlgPred: Prediction of Allergenic Proteins and Mapping of IgE Epitopes

AlgPred (http://www.imtech.res.in/raghava/algpred/index.html) was used to predict allergenic proteins and to map IgE epitopes. The server offers the following:

It allows prediction of allergens based on the similarity of known epitopes to any region of the protein.
The IgE epitope mapping feature of the server allows users to locate the position of epitopes in their protein.
The server searches for MEME/MAST allergen motifs using MAST and classifies a protein as an allergen if it contains any motif.
It allows prediction of allergens based on SVM modules using amino acid or dipeptide composition.
It facilitates a BLAST search against 2890 allergen-representative peptides (ARPs) obtained from Bjorklund et al. (2005) and classifies a protein as an allergen if it has a BLAST hit.
The hybrid option of the server allows prediction of allergens using a combined approach (SVMc + IgE epitope + ARPs BLAST + MAST).

VaxiJen v2.0

VaxiJen (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen_help.html) is the first server for alignment-independent prediction of protective antigens. It was developed to allow antigen classification based solely on the physicochemical properties of proteins, without recourse to sequence alignment.

Results

Prediction of B-Cell Epitopes

The spike glycoprotein, E protein, and modified S and E proteins were subjected to BepiPred linear epitope prediction, Emini surface accessibility, Kolaskar and Tongaonkar antigenicity, Parker hydrophilicity, Chou and Fasman beta turn, and Karplus and Schulz flexibility prediction in IEDB; the results are shown in Figs. 1–24.

Fig. 1 — BepiPred linear epitope prediction of S glycoprotein, the desired epitope residue showed in yellow color. The red horizontal line indicates surface accessibility threshold (0.35)

Fig. 2 — Emini surface accessibility prediction of S glycoprotein. The desired epitope residue for surface accessibility showed in yellow color, while green color was below threshold (1.000)

Fig. 3 — Kolaskar and Tongaonkar antigenicity prediction of S glycoprotein. The desired epitope residue for antigenicity showed in yellow color, while the green color below the red horizontal line indicates less antigenicity below (1.045)

Fig. 4 — Parker hydrophilicity prediction of S glycoprotein. The desired epitope residue showed in yellow color. The red horizontal line indicates parker hydrophilicity threshold (1.286)

Fig. 5 — Chou and Fasman beta turn prediction of S glycoprotein. The desired epitope residue showed in yellow color. The red horizontal line indicates beta turn prediction threshold (1.009)

Fig. 6 — Karplus and Schulz flexibility prediction of S glycoprotein. The desired epitope residue showed in yellow color. The red horizontal line indicates surface accessibility threshold (0.35)

Fig. 7 — BepiPred linear epitope prediction of S glycoprotein modified sequence. The desired epitope residue showed in yellow color. The red horizontal line indicates BepiPred Linear Epitope threshold (0.35)

Fig. 8 — Emini surface accessibility prediction of S glycoprotein modified sequence. The desired epitope residue showed in yellow color, while green color below the red horizontal line indicates surface accessibility threshold ≤ (1.000)

Fig. 9 — Kolaskar and Tongaonkar antigenicity prediction of S glycoprotein modified sequence. The desired epitope residue showed in yellow color. The red horizontal line indicates antigenicity threshold ≤ (1.045)

Fig. 10 — Parker hydrophilicity prediction of S glycoprotein modified sequence. The desired epitope residue showed in yellow color, while green color below the red horizontal line indicates hydrophilicity threshold ≤ (1.286)

Fig. 11 — Chou and Fasman beta turn prediction of S glycoprotein modified sequence. The desired epitope residue showed in yellow color. The red horizontal line indicates beta turn threshold (1.009)

Fig. 12 — Karplus and Schulz flexibility prediction of S glycoprotein modified sequence. The desired epitope residue showed in yellow color, while green color below the red horizontal line indicates flexibility threshold ≤ (0.992)

Fig. 13 — BePipred linear epitope prediction of E protein. The desired epitope residue showed in yellow color. The red horizontal line indicates Bepipred Linear Epitope threshold ≤ (0.35)

Fig. 14 — Emini surface accessibility prediction of E protein. The desired epitope residue showed in yellow color, while green color below the red horizontal line indicates surface accessibility threshold (1.000)

Fig. 15 — Kolaskar and Tongaonkar antigenicity prediction of E protein. The desired epitope residue showed in yellow color, while green color below the red horizontal line indicates antigenicity threshold (1.045)

Fig. 16 — Parker hydrophilicity prediction of E protein the desired epitope residue showed in yellow color. The red horizontal line indicates hydrophilicity threshold ≤ (1.286)

Fig. 17 — Chou and Fasman beta turn prediction of E protein. The desired epitope residue showed in yellow color. The red horizontal line indicates beta turn threshold ≤ (1.009)

Fig. 18 — Karplus and Schulz flexibility prediction of E protein. The desired epitope residue showed in yellow color, while green color below the red horizontal line indicated flexibility below threshold (0.992)

Fig. 19 — BepiPred linear epitope prediction of E protein modified sequence. The desired epitope residue showed in yellow color. The red horizontal line indicates BepiPred Linear Epitope threshold (0.35)

Fig. 20 — Emini surface accessibility prediction of E protein modified sequence. The desired epitope residue showed in yellow color, above the red horizontal line threshold (1.000)

Fig. 21 — Kolaskar and Tongaonkar Antigenicity prediction of E protein modified sequence. The desired epitope residue showed in yellow color, while green color indicates antigenicity below threshold (1.045)

Fig. 22 — Parker hydrophilicity prediction of E protein modified sequence. The desired epitope residues are shown in yellow. The red horizontal line marks the hydrophilicity threshold (1.286)

Fig. 23 — Chou and Fasman beta turn prediction of E protein modified sequence. The desired epitope residues are shown in yellow; green, below the red horizontal line, indicates values under the beta turn threshold (1.009)

Fig. 24 — Karplus and Schulz flexibility prediction of E protein modified sequence. The desired epitope residues are shown in yellow, plotted against the flexibility threshold (0.992)

BepiPred Linear Epitope Prediction Method

The average binder score of spike glycoprotein to B cells was 0.35; all values equal to or greater than the default threshold of 0.35 were predicted to be potential B-cell binders.

Emini Surface Accessibility Prediction

The average surface accessibility score of the protein was 1.000; all values equal to or greater than the default threshold of 1.000 were regarded as potentially on the surface. The positive peptides numbered 481 out of 1349 for S glycoprotein and 23 out of 77 for E protein; for the S and E modified sequences they numbered 485 out of 485 and 17 out of 77, respectively.
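As a quick arithmetic check, the reported counts can be turned into percentages of positive peptides. The short sketch below does this for three of the Emini counts quoted above; the function name `percent_positive` is a hypothetical helper introduced here for illustration only.

```python
def percent_positive(positive, total):
    """Share of peptides at or above the threshold, as a percentage."""
    return round(100 * positive / total, 1)

# Counts copied from the Emini surface accessibility results in the text.
emini_counts = {
    "S glycoprotein": (481, 1349),
    "E protein": (23, 77),
    "E protein modified": (17, 77),
}

for name, (positive, total) in emini_counts.items():
    print(f"{name}: {percent_positive(positive, total)}%")
```

On these counts, roughly a third of the S glycoprotein peptides and under a third of the E protein peptides pass the surface accessibility threshold.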

Kolaskar and Tongaonkar Antigenicity

The default antigenicity threshold for the protein was 1.045; all values greater than 1.045 were considered potential antigenic determinants. The positive peptides numbered 655 out of 1348 for S glycoprotein and 55 out of 76 for E protein; for the S and E modified sequences they numbered 668 out of 668 and 47 out of 76, respectively.

Parker Hydrophilicity Prediction

The average hydrophilicity score of the protein was 1.286; all values equal to or greater than the default threshold of 1.286 were considered potentially hydrophilic. The positive peptides numbered 693 out of 1348 for S glycoprotein and 18 out of 76 for E protein; for the S and E modified sequences they numbered 690 out of 695 and 20 out of 76, respectively.

Chou and Fasman Beta Turn Prediction

To determine the sites that contain beta turns, the default threshold was 1.009; all values equal to or greater than the default threshold were considered beta turn sites. The positive peptides numbered 668 out of 1348 for S glycoprotein and 19 out of 76 for E protein; for the S and E modified sequences they numbered 673 out of 673 and 21 out of 76, respectively.

Karplus and Schulz Flexibility Prediction

The default threshold of 0.992 was used to determine chain flexibility, so all values equal to or greater than the default threshold were considered flexible regions of the protein. The positive peptides numbered 679 out of 1347 for S glycoprotein and 24 out of 24 for E protein; for the S and E modified sequences they numbered 680 out of 681 and 24 out of 75, respectively.
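The thresholding step shared by the six prediction methods can be sketched as follows. This is only an illustration, not the IEDB implementation: the dictionary keys, the function name `positive_positions`, and the demo scores are assumptions made here, and only the threshold values come from the text. The sketch also follows the text in treating the Kolaskar and Tongaonkar cutoff as strict ("greater than") and the others as inclusive.

```python
# Default thresholds as quoted in the text; the keys are hypothetical labels.
THRESHOLDS = {
    "bepipred": 0.35,
    "emini": 1.000,
    "kolaskar_tongaonkar": 1.045,
    "parker": 1.286,
    "chou_fasman": 1.009,
    "karplus_schulz": 0.992,
}

# The text says "greater than" for this method and "equal or greater" elsewhere.
STRICT = {"kolaskar_tongaonkar"}


def positive_positions(scores, method):
    """Return 1-based indices of scores that pass the method's threshold."""
    threshold = THRESHOLDS[method]
    if method in STRICT:
        return [i for i, s in enumerate(scores, start=1) if s > threshold]
    return [i for i, s in enumerate(scores, start=1) if s >= threshold]


# Made-up scores for illustration only (not IEDB output).
demo_scores = [0.20, 0.35, 0.90, 0.10, 1.045]
print(positive_positions(demo_scores, "bepipred"))
print(positive_positions(demo_scores, "kolaskar_tongaonkar"))
```

The returned indices refer to positions in the score list; mapping them back to residue numbers depends on the window size and centering used by each tool, which this sketch does not model.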

The most common B-cell epitope for E protein is YVKFQDS at position 69, while for the E protein modified sequence they are VYVPQQD, YVPQQDS, and PPLPED/PPLPEDV at positions 68, 69, and 77, respectively.

The most common B-cell epitopes shared by S and modified S, with their positions in parentheses, are DVGPDSV (23), PDSVKSA (26), DSVKSAC (27), PRPIDVS (48), HTPATDC (211), AKPSGSV (371), KPSGSVV (372), SGTPPQV (393), GTPPQVY (394), TPPQVYN (395), QLSPLEG (547), YGPLQTP (707), PRSVRSV (750), RSVRSVP (751), SVKSSQS (855), VKSSQSS (856), SQSSPII (859, or 857 in modified S), and SLNTKYV (1202). QVDQLNS (772) and VDQLNSS (773) are found only in S glycoprotein, while LTPTSSY (15), TPTSSYV (16), PTSSYVD (17), TSSYVDV (18), DHGDYYV (83), YSQDVKQ (108), ANQYSPC (523), NQYSPCV (524), and YYRKQLS (543) are found only in the S glycoprotein modified sequence.
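Several of the heptamers listed above overlap, so they can be read as longer contiguous stretches. The sketch below merges overlapping peptides, assuming that each reported position is the 1-based start of its peptide (an assumption, although the listed sequences are consistent with it). The function name `merge_overlapping` is a hypothetical helper, not part of any IEDB tool.

```python
def merge_overlapping(epitopes):
    """Merge (start, peptide) pairs whose residue ranges overlap.

    Assumes each start is the 1-based position of the peptide's first residue.
    Raises ValueError if overlapping residues disagree.
    """
    merged = []
    for start, peptide in sorted(epitopes):
        if merged:
            m_start, m_seq = merged[-1]
            m_end = m_start + len(m_seq)  # one past the last merged residue
            if start < m_end:
                offset = start - m_start
                overlap = min(len(peptide), len(m_seq) - offset)
                if m_seq[offset:offset + overlap] != peptide[:overlap]:
                    raise ValueError(f"conflicting residues at {start}")
                merged[-1] = (m_start, m_seq + peptide[overlap:])
                continue
        merged.append((start, peptide))
    return merged


# A few of the shared S glycoprotein epitopes from the text.
s_epitopes = [
    (23, "DVGPDSV"), (26, "PDSVKSA"), (27, "DSVKSAC"), (48, "PRPIDVS"),
    (855, "SVKSSQS"), (856, "VKSSQSS"), (859, "SQSSPII"),
]
for start, seq in merge_overlapping(s_epitopes):
    print(start, seq)
```

Under this assumption, the three heptamers starting at positions 23, 26, and 27 collapse into the single stretch DVGPDSVKSAC, and those at 855, 856, and 859 into SVKSSQSSPII.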