Skip to main content
Elsevier - PMC COVID-19 Collection logoLink to Elsevier - PMC COVID-19 Collection
. 2020 Jul 11;85:104457. doi: 10.1016/j.meegid.2020.104457

Genome-wide analysis of Indian SARS-CoV-2 genomes for the identification of genetic mutation and SNP

Indrajit Saha a,⁎,1, Nimisha Ghosh b,1, Debasree Maity c, Nikhil Sharma d, Jnanendra Prasad Sarkar e,f, Kaushik Mitra g
PMCID: PMC7351671  PMID: 32659347

Abstract

The wave of COVID-19 is a big threat to the human population. Presently, the world is going through different phases of lock down in order to stop this wave of pandemic; India being no exception. We have also started the lock down on 23rd March 2020. In this current situation, apart from social distancing only a vaccine can be the proper solution to serve the population of human being. Thus it is important for all the nations to perform the genome-wide analysis in order to identify the genetic variation in Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) so that proper vaccine can be designed. This fast motivated us to analyze publicly available 566 Indian complete or near complete SARS-CoV-2 genomes to find the mutation points as substitution, deletion and insertion. In this regard, we have performed the multiple sequence alignment in presence of reference sequence from NCBI. After the alignment, a consensus sequence is built to analyze each genome in order to identify the mutation points. As a consequence, we have found 933 substitutions, 2449 deletions and 2 insertions, in total 3384 unique mutation points, in 566 genomes across 29.9 K bp. Further, it has been classified into three groups as 100 clusters of mutations (mostly deletions), 1609 point mutations as substitution, deletion and insertion and 64 SNPs. These outcomes are visualized using BioCircos and bar plots as well as plotting entropy value of each genomic location. Moreover, phylogenetic analysis has also been performed to see the evolution of SARS-CoV-2 virus in India. It also shows the wide variation in tree which indeed vivid in genomic analysis. Finally, these SNPs can be the useful target for virus classification, designing and defining the effective dose of vaccine for the heterogeneous population.

Keywords: Multiple sequence alignment, Point mutation, SNPs, SARS-CoV-2

Graphical abstract

Unlabelled Image

1. Introduction

Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) and its associated disease known as COVID-19, was originated in Wuhan, China (Zhu et al., 2020). It has wreaked havoc on human lives and declared as a pandemic by World Health Organisation on 11th March 2020. Among others, symptoms of COVID-19 include fever, cough and shortness of breath (Chen et al., 2020). In more severe cases, infection may lead to pneumonia (Zhou et al., 2020), kidney failure and eventual death. As of now, no vaccine or medicine has been invented or discovered and the only protective measures are being taken by different countries are through lock downs and social distancing. However, even these extreme measures have not been able to contain the spread of SARS-CoV-2. Everyday thousands of new cases are coming into light. According to the record of 27th June, globally more than 9.8 million people are affected by this deadly virus, with a total death count close to 500 thousand (Worldometer, 2020). India is also fighting hard to keep the citizens safe through lock down and other protective measures. Yet the virus has shown a surge in the country with more than 500 thousand affected cases and death cases of more than 15 thousand as of 27th June 2020 (Worldometer, 2020).

The metagenomic analysis using Next-Generation Sequencing (NGS) (Lu et al., 2020) reveals that the SARS-CoV-2 is a single-stranded enveloped RNA virus with a genome length of 29.9 kilobases (Cui et al., 2019; Su et al., 2016; Weiss and Navas-Martin, 2005; Zhou et al., 2020). It has 11 coding regions as reported in NCBI that can encode ORF1ab polyproteins, spike (S) glycoprotein, envelope (E) protein, membrane (M) glycoprotein, nucleocapsid (N) protein and accessory proteins such as ORF3a, ORF6, ORF7a, ORF7b, ORF8 and ORF10. It has also been reported that several non-structural proteins (nsp) are encoded from Open Reading Frame (ORF). The genomic orientation of SARS-CoV-2 virus is shown in Fig. 1 .

Fig. 1.

Fig. 1

Genomic orientation of SARS-CoV-2 virus.

The strain of this virus is novel and the understanding the genetic variability as mutation of this virus in different nations is still very limited, especially the coding region of Open Reading Frame (ORF). Generally, the mutation occurs when an error is incorporated in a viral genome (Fleischmann, 1996). It can also be considered to be a coping mechanism with genomic damage. As a consequence, the resultant mutated strain may cause an outbreak in human host like the case with SARS-CoV-2. The DNA mutation can be of three types: base substitution, deletion and insertion. Moreover, if the substitution occurs more than 1% of the population, it can be considered as Single Nucleotide Polymorphism (SNP). Such polymorphism usually different from the mutation as it creates a variant in the population while mutation keeps the population same (Pavlovic-Lazetic et al., 2004). On the other hand, RNA viruses have high mutation rates (Jenkins et al., 2002; Woo et al., 2009). Thus it is difficult to identify the proper string of the virus. Subsequently, designing and define the dose of the vaccine are also very challenging tasks (Paital et al., 2020). In this regard, Chothe et al. used SNP on sequences of Bovine herpesvirus-1 (BoHV-1) (Chothe et al., 2018), which was affecting cattle and causing respiratory illness, to cluster them into three groups with two different vaccine groups and one distinct cluster of field isolates. Based on this information, they developed an SNP-based PCR assay to show differentiation between vaccine and clinical strains. Analysis of SNPs for Varicella-zoster virus was carried out in (Jeon et al., 2016) which suggested 24 potential vaccine-specific sites. Taking cues from (Chothe et al., 2018) and (Jeon et al., 2016), we have analyzed 566 Indian SARS-CoV-2 genomes to find the genetic variability in terms of point mutation as substitution, deletion and insertion as well as SNP. It can be helpful for the classification of virus strain and accordingly designing of vaccine and defining the dose of the vaccine can be done effectively.

To address the above facts, we have analyzed publicly available 566 Indian complete or near complete SARS-CoV-2 genomes in order to find the mutation points as substitution, deletion and insertion. For this purpose, Multiple Sequence Alignment (Wallace et al., 2005) is performed in presence of reference sequence from NCBI. Thereafter, a consensus sequence is build to analyze each genome to identify the mutation points. As a result, we have found 933 substitutions, 2449 deletions and 2 insertions, in total 3384 unique mutation points, in 566 genomes across 29.9 kbp. Further, it has been classified into three groups (a) cluster of mutation points if the mutation appears more than two times in consecutive genomic positions (b) point mutations as substitution, deletion and insertion that are not present in clusters (c) Single Nucleotide Polymorphism (SNP) that appeared more than 1% of the population of SARS-CoV-2 used in our study. Finally, 100 clusters of mutation (mostly deletions), 1609 point mutations as substitution, deletion and insertion and 64 SNPs out of categories (a) and (b) have been identified as they appeared more than 1% of the population i.e. 6 times in Indian SARS-CoV-2 genomes. These outcomes are visualized using BioCircos and bar plots as well as plotting entropy value of each genomic location. Moreover, phylogenetic analysis (Stuessy, 2009) has also been performed to see the evolution of SARS-CoV-2 virus in India.

2. Material and methods

In this section, we have discussed the source of data or genomic sequence of virus and methods used in systemic way to accomplish this task of finding mutation points as substitution, deletion, insertion as well as SNP.

2.1. Data collection

The genomic sequences of Indian SARS-CoV-2 virus was collected from Global Initiative on Sharing All Influenza Data (GISAID)2 in fasta format on 11th June 2020. The dataset contains 566 genomes with sequence ID and sequence in fasta format. We have extracted the date from the sequence ID to show the number of sequences uploaded per month. This is shown in Fig. 2 . Moreover, the original and shorter sequence IDs (for better visualization) and the length of each genome are mentioned in the supplementary Table S1. The average length of the 566 genomes is 29,831 bp. It is also to be noted that we have considered complete or near complete virus genomes for our study. Further, we have downloaded the Reference Sequence (NC_045512.2)3 from National Center for Biotechnology Information (NCBI) to conduct the experiment with 566 Indian SARS-CoV-2 genomes. This reference genome is also used to map the coding regions as collected from NCBI. This is also reported in Table 1 and used while mentioning the mutation points in result section. Please note that for the data visualization and editing, BioEdit and MEGA-X have been used.

Fig. 2.

Fig. 2

Bar plot shows number of SARS-CoV-2 genomes per day and pie chart shows number of SARS-CoV-2 genomes per month uploaded during January to May 2020.

Table 1.

Coding Regions of SARS-CoV-2 genome after mapping the collected coding regions from NCBI with the Reference Sequence of SARS-CoV-2 genome.

Start coordinate of coding region End coordinate of coding region Length of coding region Name of the coding region
1 265 265 5′-UTR (Non-coding Region)
266 21,555 21,290 ORF1ab
21,563 25,385 3823 Spike (S)
25,393 26,221 829 ORF3a
26,245 26,473 229 Envelope (E)
26,523 27,192 670 Membrane (M)
27,202 27,388 187 ORF6
27,394 27,760 367 ORF7a
27,756 27,888 133 ORF7b
27,894 28,260 367 ORF8
28,274 29,534 1261 Nucleocapsid (N)
29,558 29,675 118 ORF10
29,676 29,905 230 3′-UTR (Non-coding Region)

2.2. Pipeline of the workflow

The pipeline of the workflow is shown in Fig. 3 (a). In order to find the mutations in 566 Indian SARS-CoV-2 genomes, the multiple sequencing alignment (MSA) technique called ClustalW Thompson et al. (1994) is used in presence of reference sequence from NCBI. The ClustalW uses the concept of Neighbor-Joining tree where bootstrap size is consider as 1000. It is a widely used MSA technique for aligning any number of homologous nucleotide or protein sequences like in our case. It uses progressive alignment method where the most similar sequences with the best alignment score are aligned first. After performing the alignment, a consensus sequence is built in order to extract the mutation points from each genome as substitution, deletion and insertion. The detection scheme of identifying substitution, deletion and insertion is shown in Fig. 3(b). This is applied for each virus genome. Once we have the list of substitution, deletion and insertion further (a) a list of clusters of mutation points are identified based on their appearance in consecutive genomic positions. Here a cluster is formed if the consecutive mutation in genomic positions is greater than 2. (b) a list of substitution, deletion and insertion that are not present in the clusters (c) Single Nucleotide Polymorphisms (SNPs) that occur as substitution in more than 1% of the population of virus genomes i.e. 566 × 0.01 = ceil(5.66) = 6 in our case. Similarly, if we consider 10% of the virus population, then such substitution occurs in 57 virus genomes. The substitution, deletion and insertion in nucleotides is also translated to change in amino acid using codon table.4 Moreover, to compute the change in nucleotides in virus sequences, entropy (ℰ) is computed as follows:

=ln5+VaplnVap (1)

where V a p represents the frequency of each residue a occurring at position p. 5 represents the number of possible residues for nucleic acid (in this case 4) plus gap. Further to verify the such genetic variation in Indian SARS-CoV-2 genome phylogenetic analyses is performed so that the evolution can be seen. The phylogenetic analysis is conducted using Maximum Likelihood technique Stuessy (2009) where Neighbor-Joining is used to construct the tree for the visualization of evolution.

Fig. 3.

Fig. 3

(a) Pipeline of the Workflow (b) Different types of mutation considered during identification of substitution, deletion and insertion in Indian SARS-CoV-2 genomes.

3. Results

The results of the experiment are discussed here. Our objective is to identify point mutation as substitution, deletion and insertion initially after performing the multiple sequence alignment. In this regard, we have obtained 933, 2449 and 2 numbers of substitutions, deletions and insertions, respectively. Further, it has been classified as (a) a list of clusters of mutation (b) a list of substitution, deletion and insertion that are not present in the clusters and (c) SNPs that occur more than 1% of the population. The results of the analysis is shown in Fig. 4 . As a result after this analysis, we have obtained 100 clusters that cover 92 substitutions and 1683 deletions. The 20 such clusters based on their size and occurrence in genomic sequences are reported in Table 2 . The rest of the 80 clusters are reported in supplementary Table S2. Moreover, from this clustering results, 6 deletions in different coding regions are shown in Fig. 5 using BioEdit software.

Fig. 4.

Fig. 4

Results of mutation analysis of 566 Indian SARS-CoV-2 genomes.

Table 2.

Mutation of 20 Clusters within Indian SARS-CoV-2 genomes.

Start coordinate of mutation End coordinate of mutation Cluster size of mutation Avg. occurrence of mutation in genomes Type of mutation Avg. entropy of cluster of mutation Mapped with coding region
2 83 82 147 Deletion 0.2757 5′-UTR (Non-coding Region)
3827 3877 51 1 Deletion 0.0150 ORF1ab
6508 6554 47 1 Deletion 0.0161 ORF1ab
6840 6877 38 3 Deletion 0.0363 ORF1ab
7020 7049 30 1 Deletion 0.0151 ORF1ab
7279 7334 56 3 Deletion 0.0341 ORF1ab
8345 8445 101 1 Deletion 0.0146 ORF1ab
19,306 19,354 49 1 Deletion 0.0145 ORF1ab
19,389 19,424 36 1 Deletion 0.0157 ORF1ab
19,497 19,550 54 2 Deletion 0.0267 ORF1ab
21,171 21,208 38 1 Deletion 0.0148 ORF1ab
21,459 21,502 44 2 Deletion 0.0196 ORF1ab
21,547 21,603 57 3 Deletion 0.0367 ORF1ab
23,403 23,405 3 112 Substitution 0.2485 Spike
23,966 24,018 53 3 Deletion 0.0362 Spike
26,340 26,399 60 6 Deletion 0.0540 Envelope
27,541 27,682 142 2 Deletion 0.0284 ORF7a
27,697 27,743 47 2 Deletion 0.0275 ORF7a
28,883 28,885 3 64 Substitution 0.3817 Nucleocapsid
29,742 29,904 163 142 Deletion 0.2406 3′-UTR (Non-coding Region)

Fig. 5.

Fig. 5

Some Mutations in coding regions as deletion in Indian SARS-CoV-2 genomes.

Once the clusters of mutation points are identified, we have found the mutation points as substitution, deletion and insertion that are not present in the clusters. Consequently 841 substitutions, 766 deletions and 2 insertions are identified and shown in Fig. 4. These 1609 mutation points are reported in supplementary Table S3. Thereafter, 933 substitutions are considered to identify SNPs that are present at least in 6 virus genomes as a clause of 1% of the virus population. As a result, we have found 64 SNPs, out of which 57 SNPs in 6 coding regions while 7 of them are present in 5′-UTR and 3′-UTR. However, in Table 3 and Fig. 6 , SNPs that are present in more than 10% of the virus population, i.e. more than 57 virus sequences, are reported and shown as percentage of population having SNP at each genomic location. It describes the size of the virus population for a given genomic position where SNP occurs. The biggest SNPs are identified at 241, 3037, 14,410 and 23,405 genomic coordinates which belongs to the coding regions of ORF1ab and Spike. In these four cases, almost 60% of the virus population are having the change in nucleotide. The rest of the SNPs are shown in supplementary Fig. S1 and reported in supplementary Table S4. Moreover, 6 SNPs in different coding regions are shown in Fig. 7 using BioEdit software. This is to be noted that for each lists of mutations as mentioned in Table 2, Table 3, we have provided genomic coordinates, number of occurrence of mutation in virus genome (frequency of mutation), change in nucleotide, change in amino acid, entropy to measure the change in nucleotide as information contains at that genomic location and mapping with coding region so that mutation point can be identified precisely in Table 2, Table 3. For example, in Table 3 the SNP at 18879 occurs in 117 virus sequences where the change in nucleotide is C>T, change in corresponding amino acid is S>F and the value of entropy is 0.5216. Higher the entropy value signifies that the change in nucleotide is more in sequences.

Table 3.

Mutation as SNPs in more than 10% of population of Indian SARS-CoV-2 genomes.

Coordinate of mutation Occurrence of mutation in genomes Type of mutation Change in nucleotide Change in amino acid Entropy Mapped with coding region
241 341 Substitution C>T S>L 0.7078 5′-UTR (Non-coding Region)
3037 339 Substitution C>T S>F 0.7020 ORF1ab
6310 65 Substitution C>A A>E,A>K 0.3972 ORF1ab
6312 160 Substitution C>A T>K 0.7357 ORF1ab
11,085 204 Substitution G>T,G>A C>F,C>-,C>Y 0.74748 ORF1ab
13,732 187 Substitution C>T A>V,A>- 0.6561 ORF1ab
14,410 332 Substitution C>T P>L 0.7277 ORF1ab
18,879 117 Substitution C>T S>F 0.5216 ORF1ab
22,446 69 Substitution C>T T>I 0.3703 Spike
23,405 334 Substitution A>G D>G 0.7198 Spike
23,931 165 Substitution C>T T>I 0.6856 Spike
25,565 122 Substitution G>T R>I 0.5207 ORF3a
26,737 112 Substitution C>T T>I 0.4969 Membrane
28,313 174 Substitution C>T P>L 0.6987 Nucleocapsid
28,856 71 Substitution C>T S>L 0.3899 Nucleocapsid
28,883 64 Substitution G>A R>K 0.3755 Nucleocapsid
28,884 64 Substitution G>A G>N 0.3849 Nucleocapsid
28,885 64 Substitution G>C G>T 0.3849 Nucleocapsid

Fig. 6.

Fig. 6

SNPs present in more than 10\% of population of Indian SARS-CoV-2 genomes.

Fig. 7.

Fig. 7

Mutations in coding regions as SNPs greater than 1% of population of Indian SARS-CoV-2 genomes.

This is important to mention that the overall results of the mutation as substitution, deletion, insertion, cluster and SNP are shown using BioCircos plots in Fig. 8 where each track shows the frequency of of mutation as histogram using bar and dot plots. Generally, it summarizes all the results visually. Moreover, the computed entropy at each genomic location to have the information of change of nucleotide for the whole population of virus genome is also shown in Fig. 9 . This is prepared using BioEdit software. Finally, phylogenetic analysis is shown in Fig. 10 for 566 and its bootstrap samples of 60 virus sequences in order to visualize the variation in trees clearly. It is evident from the trees that the Indian SARS-CoV-2 genomes are having a wide variation which we have also noticed in genomic analysis. These trees are generated using MEGA-X software. In addition to this, the aligned sequences are provided as supplementary5 for further use.

Fig. 8.

Fig. 8

BioCircos plots to illustrate the frequency of mutations across the Indian SARS-CoV-2 genomes through different tracks: (a) Substitutions (outer track 1), Deletions (track 2), Insertions (track 3), Clusters (track 4) and SNPs (inner track 5) (b) Substitutions (outer track 1) and SNPs (inner track 2).

Fig. 9.

Fig. 9

The change of entropy in Indian SARS-CoV-2 genomes for each genomic location after alignment with Reference Sequence from NCBI.

Fig. 10.

Fig. 10

Phylogenetic trees of (a) 566 and (b) 60 bootstrap sequences out of 566 Indian SARS-CoV-2 genomes with Reference Genome as marked in red color. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

4. Conclusion

In this paper, we have analyzed 566 Indian SARS-CoV-2 genomes in order to find the mutation as substitution, deletion and insertion as well as SNP. Our analysis has identified 100 clusters of mutations (mostly deletions), 1609 point mutations as substitution, deletion and insertion and 64 SNPs. Out of these 64 SNPs, 57 are present in the 6 coding regions. The purpose of finding SNPs is to identify the genomic locations that can be targeted to classify the virus strain in India. Apart from this, the major advantage is that for personalized vaccine these SNPs could be used to define the dose of the vaccine after identifying the proper strain of the virus. Moreover, for future research, these SNPs can be used to model the proteins and to see its conformational changes so that potential drag can be designed to target such proteins for Indian patients. We are currently working in this direction and also help the other researchers to conduct their research with the use of these SNPs.

Ethics approval and consent to participate

The ethical approval or individual consent was not applicable.

Consent for publication

Not applicable.

Funding

This work has been partially supported by CRG short term research grant on COVID-19 (CVD/2020/000991) from Science and Engineering Research Board (SERB), Department of Science and Technology, Govt. of India.

Author contributions

Indrajit Saha: Conceptualization; Data curation; Supervision; Funding acquisition; Formal analysis; Investigation; Methodology; Project administration; Resources; Validation; Visualization; Writing - original draft; Writing - review & editing, Nimisha Ghosh: Formal analysis; Software; Validation; Visualization; Writing - review & editing, Debasree Maity: Conceptualization; Data curation; Methodology; Writing - review & editing, Nikhil Sharma: Methodology; Writing - review & editing, Jnanendra Prasad Sarkar: Methodology; Writing - review & editing,  Kaushik Mitra: Conceptualization; Writing - review & editing

Declaration of Competing Interest

The authors declare that they have no conflict of interest.

Acknowledgment

We thank all those who have contributed sequences to GISAID database and reviewers for the valuable comments to improve the article.

Footnotes

Appendix A. Supplementary data

Supplementary material 1

mmc1.zip (20KB, zip)

Supplementary material 2

mmc2.zip (7.1KB, zip)

Supplementary material 3

mmc3.zip (1.2KB, zip)

Supplementary material 4

mmc4.zip (11.2KB, zip)

Supplementary material 5

mmc5.pdf (482.2KB, pdf)

References

  1. Chen N., Zhou M., Dong X., Qu J., Gong F., Han Y., Qiu Y., Wang J., Liu Y., Wei Y., Xia J., Yu T., Zhang X., Zhang L. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. Lancet. 2020;395:507–513. doi: 10.1016/S0140-6736(20)30211-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Chothe S.K., Sebastian A., Thomas A., Nissly R.H., Wolfgang D., Byukusenge M., Mor S.K., Goyal S.M., Alber I., Tewari D., Jayarao B.M., Kuchipudi S.V. Whole-genome sequence analysis reveals unique snp profiles to distinguish vaccine and wild-type strains of bovine herpesvirus-1 (bohv-1) Virology. 2018;522:27–36. doi: 10.1016/j.virol.2018.06.015. [DOI] [PubMed] [Google Scholar]
  3. Cui J., Li F., Shi Z. Origin and evolution of pathogenic coronaviruses. Nat. Rev. Microbiol. 2019;17 doi: 10.1038/s41579-018-0118-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Fleischmann W.R.J. University of Texas Medical Branch at Galveston; 1996. Viral Genetics. Galveston (TX) [PubMed] [Google Scholar]
  5. Jenkins G.M., Rambaut A., Pybus O.G., Holmes E.C. Rates of molecular evolution in rna viruses: a quantitative phylogenetic analysis. J. Mol. Evol. 2002;54:156–165. doi: 10.1007/s00239-001-0064-3. [DOI] [PubMed] [Google Scholar]
  6. Jeon J.S., Won Y.H., Kim I.K., Ahn J.H., Shin O.S., Kim J.H., Lee C.H. Analysis of single nucleotide polymorphism among varicella-zoster virus and identification of vaccine-specific sites. Virology. 2016;496:277–286. doi: 10.1016/j.virol.2016.06.017. [DOI] [PubMed] [Google Scholar]
  7. Lu I.-N., Muller C.P., He F.Q. Applying next-generation sequencing to unravel the mutational landscape in viral quasispecies. Virus Res. 2020;283 doi: 10.1016/j.virusres.2020.197963. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Paital B., Das K., Parida S.K. Inter nation social lockdown versus medical care against covid-19, a mild environmental insight with special reference to India. Sci. Total Environ. 2020;728 doi: 10.1016/j.scitotenv.2020.138914. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Pavlovic-Lazetic G.M., Mitic N.S., Beljanski M.V. Bioinformatics analysis of sars coronavirus genome polymorphism. BMC Bioinforma. 2004 doi: 10.1186/1471-2105-5-65. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Stuessy T.F. Columbia University Press; 2009. Plant Taxonomy: The Systematic Evaluation of Comparative Data. [Google Scholar]
  11. Su S., Wong G., Shi W., Liu J., Lai A.C.K., Zhou J., Liu W., Bi Y., Gao G.F. Epidemiology, genetic recombination, and pathogenesis of coronaviruses. Trends Microbiol. 2016;24:490–502. doi: 10.1016/j.tim.2016.03.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Thompson J.D., Higgins D.G., Gibson T.J. Clustal W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22:4673–4680. doi: 10.1093/nar/22.22.4673. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Wallace I.M., Blackshields G., Higgins D.G. Multiple sequence alignments. Curr. Opin. Struct. Biol. 2005;15 doi: 10.1016/j.sbi.2005.04.002. [DOI] [PubMed] [Google Scholar]
  14. Weiss S.R., Navas-Martin S. Coronavirus pathogenesis and the emerging pathogen severe acute respiratory syndrome coronavirus. Microbiol. Mol. Biol. Rev. 2005;(4) doi: 10.1128/MMBR.69.4.635-664.2005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Woo P.C.Y., Lau S.K.P., Huang Y., Yuen K. Coronavirus diversity, phylogeny and interspecies jumping. Exp. Biol. Med. 2009;234:1117–1127. doi: 10.3181/0903-MR-94. [DOI] [PubMed] [Google Scholar]
  16. Worldometer Coronavirus Disease 2019 (covid-19) Cases in India. 2020. https://www.worldometers.info/coronavirus/country/india/
  17. Zhou P., Yang X.L., Wang X.G., Hu B., Zhang L., Zhang W., Si H.R., Zhu Y., Li B., Huang C., Chen H., Chen J., Luo Y., Guo H., Jiang R., Liu M., Chen Y., Shen X., Wang X., Zheng X., Zhao K., Chen Q., Deng F., L L., Yan B., Zhan F., Wang Y., Xiao G., Shi Z. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. 2020;579:270–273. doi: 10.1038/s41586-020-2012-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Zhu N., Zhang D., Wang W., Li X., Yang B., Song J., Zhao X., Huang B., Shi W., Lu R., Niu P., Zhan F., Ma X., Wang D., Xu W., Wu G., Gao G.F., Tan W. A novel coronavirus from patients with pneumonia in China, 2019. N. Engl. J. Med. 2020;382:727–733. doi: 10.1056/NEJMoa2001017. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary material 1

mmc1.zip (20KB, zip)

Supplementary material 2

mmc2.zip (7.1KB, zip)

Supplementary material 3

mmc3.zip (1.2KB, zip)

Supplementary material 4

mmc4.zip (11.2KB, zip)

Supplementary material 5

mmc5.pdf (482.2KB, pdf)

Articles from Infection, Genetics and Evolution are provided here courtesy of Elsevier

RESOURCES