Scientific Reports. 2022 Jan 18;12:936. doi: 10.1038/s41598-022-04976-8

Two short low complexity regions (LCRs) are hallmark sequences of the Delta SARS-CoV-2 variant spike protein

Arturo Becerra 1, Israel Muñoz-Velasco 1, Abelardo Aguilar-Cámara 1, Wolfgang Cottom-Salas 1,2, Adrián Cruz-González 1, Alberto Vázquez-Salazar 3, Ricardo Hernández-Morales 1, Rodrigo Jácome 1, José Alberto Campillo-Balderas 1, Antonio Lazcano 1,4
PMCID: PMC8766472  PMID: 35042962

Abstract

Low complexity regions (LCRs) are protein sequences formed by a set of compositionally biased residues. LCRs are extremely abundant in cellular proteins and have also been reported in viruses, where they may partake in evasion of the host immune system. Analyses of 28,231 SARS-CoV-2 whole proteomes and of 261,051 spike protein sequences revealed the presence of four extremely conserved LCRs in the spike protein of several SARS-CoV-2 variants. With the exception of Iota, where it is absent, the Spike LCR-1 is present in the signal peptide of 80.57% of the Delta variant sequences, and in other variants of concern and interest. The Spike LCR-2 is highly prevalent (79.87%) in Iota. Two distinctive LCRs are present in the Delta spike protein. The Delta Spike LCR-3 is present in 99.19% of the analyzed sequences, and the Delta Spike LCR-4 in 98.3% of the same set of proteins. These two LCRs are located in the furin cleavage site and HR1 domain, respectively, and may be considered hallmark traits of the Delta variant. The presence of the medically-important point mutations P681R and D950N in these LCRs, combined with the ubiquity of these regions in the highly contagious Delta variant opens the possibility that they may play a role in its rapid spread.

Subject terms: Molecular evolution, SARS-CoV-2, Viral evolution

Introduction

Protein segments that exhibit a bias in their composition can be formed by (a) a small number of different amino acids, in which case they are called low complexity regions (LCRs); or (b) homopolymers or homorepeats, if they consist of a long repetition of a single amino acid [1,2]. LCRs tend to be more prevalent in proteins associated with polysaccharide-, ion-, and nucleic acid binding, as well as in phospholipid interaction, transcription, translation, and folding functions [3]. It is estimated that approximately 0.4% of eukaryotic proteomes are LCRs, which is up to 23 times higher than in prokaryotes [3].

The emergence of LCRs has been associated with replication slippage and the formation of microsatellites during genome replication or recombination events [4,5]. The regions of the proteins in which the LCRs are located evolve rapidly, but there is an ongoing debate whether they change neutrally or under selective pressures [6]. Given the immunological significance of pathogens’ surface proteins in which many LCRs are located [5,7–10], it is somewhat surprising that little attention has been given to their presence in viral proteomes. Sensu stricto, the presence and location of LCRs in viruses has only been reported in HIV-1 [9] and, more recently, in SARS-CoV-2 [11]. They are rather abundant in the HIV-1 gp120 protein, and over 30% of them are located in the hypervariable regions of the connecting loops present in the protein, where they may play a role in immune escape [9]. LCRs are scattered throughout the SARS-CoV-2 proteome, and are more prevalent in the non-structural protein 3, spike protein, and the nucleocapsid protein, where they may simultaneously enhance immune evasion and induce a strong immunogenic response [11]. However, they are conspicuously absent in several proteins of the replication-transcription complex (RdRp, helicase, and NSP14 exonuclease), and in the NSP1, 3CL protease, NSP9-11, NSP15, ORF3a, membrane (M) protein, ORF6, ORF8 and ORF10 proteins [11].

In addition to LCRs, the presence of nucleotide simple sequence repeats (SSRs) has also been reported in viral genomes. SSRs are DNA segments of tandemly repeated nucleotide motifs (e.g. di-, tri-, tetra-, or penta-nucleotides) found in prokaryotic and eukaryotic genomes. Like LCRs, SSRs have also been associated with increased adaptability, as well as with enhanced recombination rates and indel generation in both cells and viruses [12–15]. Viral SSRs are present in both RNA and DNA viruses, including DNA mycobacteriophages [14], economically relevant plant viruses such as potyviruses [16], tobamoviruses [17], and geminiviruses [18], as well as in medically important viruses like herpesviruses [19], HIV-1 [20] and filoviruses [21]. More recently, several SSRs rich in hexameric repeats have been identified in the SARS-CoV-2 genome [22,23], which appear to be more prevalent in the ORF1ab, S, ORF3a, N and ORF7a of the SARS-CoV-2 genic regions [23].

In this work, we report the conservation and variability of LCRs in several SARS-CoV-2 variants of concern (VOC) and interest (VOI) using comparative proteomics and protein structure analyses. We have identified three previously unreported LCRs that are present only in some VOIs and VOCs (Fig. 1). Quite significantly, these LCRs do not exhibit a random distribution in the proteins where they are located. Two of them are located in highly conserved positions of the spike S1 and S2 subunits of the extremely contagious Delta variant. Our results demonstrate that these two conserved (98–99%) short LCRs are hallmark sequences of the highly transmissible Delta SARS-CoV-2 variant, which suggests that they might play a significant role in the viral adaptation and rapid spread of this VOC.

Figure 1.

LCRs in VOCs (Alpha, Beta, Gamma and Delta variants), VOIs (Epsilon, Eta, Iota, Kappa and Lambda variants) and Other SARS-CoV-2 proteomes (Others). (a) LCRs present in ORF1ab, which includes nsp1-4, 3CL protease (3CL), nsp6-nsp11, RdRp polymerase (RdRp), Helicase (Hel) and nsp 14-16. (b) LCRs along spike, ORF3a (3a), envelope (E), membrane (M), ORFs 6, 7a, 7b, 8, nucleocapsid (N) and ORF10. The Spike LCR sequences reported here in the Delta, Iota and Kappa variants are represented with red lines. The width of the lines is not proportional to the number of sequences in each variant.

Results

In this work, we have analyzed a total of 28,231 SARS-CoV-2 whole proteomes (July 17, 2021) and 261,051 spike protein sequences (November 4, 2021) to search for LCRs. As summarized in Figs. 1, 2, and Figure S1, our results indicate that most of the LCRs are present in the viral reference genome and its variants. However, we have detected important differences in the prevalence of these LCRs between the SARS-CoV-2 VOCs and VOIs proteomes.

Figure 2.

Complexity of the spike proteins of VOIs, VOCs and other SARS-CoV-2. The x axis shows the number of amino acid residues and the y axis shows the complexity level. (a) The complexity level of each variant is in a different color: Iota, blue; Delta, dark red; Kappa, green. (b) Complexity of the Delta spike proteins. The complexity level of the subsets is in a different color: Delta sensu lato, dark red; Variants AY, salmon pink; lineage B.1.617.2, bright red.

As shown in Figs. 1 and S1, the Spike LCR-1, formed by the sequence FVFLVLLPLV, is present between residues 2 and 11 of all the spike proteins [11] except for the Iota variant. Here, we report three previously undescribed, highly prevalent, short specific LCRs in the spike proteins of the Delta, Iota, and Kappa variants (Spike LCR-2, Spike LCR-3, and Spike LCR-4) (Figs. 1, 2 and Figure S1). In this work we have named each LCR according to the following rules: the first word of the name corresponds to the protein in which the LCR is located, and the number corresponds to its position in each of the SARS-CoV-2 proteins (Table S3). The overall properties of the LCRs described here are summarized in Table 1. Figure 3a displays the actual location of these LCRs in a spike protein 3D structure (PDB ID: 7BNM). The LCR which we have named Spike LCR-2 (Fig. 3) is located between residues 252 and 264 of the N-terminal domain (NTD) of the Iota variant spike protein (Fig. 3c). The sequence of this LCR is GGSSSGWTAGAAA (Fig. 3b and Table 1), and it is present in 79.87% of the Iota variants from the proteome sample. In contrast, this LCR is absent in the Eta and Kappa variants (Fig. S1), and its prevalence in other VOCs, VOIs, and other SARS-CoV-2 samples is below 3% (Fig. S1). Analysis of the spike protein sequences database yielded similar results, indicating that this LCR is present in 99.02% of Iota variants and practically absent in others.

Table 1.

Spike proteins LCRs reported in this work for the Delta, Iota and Kappa variants.

| Name | Spike LCR sequence | ~Start position | ~End position | Variants | Nº of amino acids* | Molecular weight* | Theoretical pI* | Aliphatic index* | Grand average of hydropathicity* |
|---|---|---|---|---|---|---|---|---|---|
| Spike LCR-1 | FVFLVLLPLV | 2 | 11 | All (except for Iota) | 10 | 1159.52 | 5.52 | 243.00 | 3.180 |
| Spike LCR-2 | GGSSSGWTAGAAA | 252 | 264 | Iota | 13 | 1079.09 | 5.52 | 30.77 | 0.123 |
| Spike LCR-3 | SRRRARSVASQSIIA | 680 | 694 | Delta, Kappa | 15 | 1657.90 | 12.48 | 91.33 | −0.407 |
| Spike LCR-4 | LQNVVNQNAQALN | 948 | 960 | Delta | 13 | 1425.56 | 5.52 | 120.00 | −0.377 |

The position of each LCR in the spike sequence is indicated. The first word of the name corresponds to the protein in which the LCR is located, and the number corresponds to its position in each of the SARS-CoV-2 proteins. (*) This information was derived from https://www.expasy.org/resources/protparam.
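The aliphatic index and grand average of hydropathicity (GRAVY) columns of Table 1 can be recomputed directly from the four LCR sequences. The following is a minimal sketch, assuming the standard definitions (the Ikai aliphatic index with coefficients 2.9 and 3.9, and the Kyte-Doolittle hydropathy scale); the function names are illustrative and are not part of the authors' pipeline.

```python
# Kyte-Doolittle hydropathy scale (assumed here as the basis of GRAVY).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def aliphatic_index(seq: str) -> float:
    """Ala + 2.9 * Val + 3.9 * (Ile + Leu), each as a mole percentage."""
    n = len(seq)

    def pct(aa: str) -> float:
        return 100.0 * seq.count(aa) / n

    return pct("A") + 2.9 * pct("V") + 3.9 * (pct("I") + pct("L"))


def gravy(seq: str) -> float:
    """Mean Kyte-Doolittle hydropathy over all residues."""
    return sum(KD[aa] for aa in seq) / len(seq)


# Sequences as listed in Table 1.
TABLE1 = {
    "Spike LCR-1": "FVFLVLLPLV",
    "Spike LCR-2": "GGSSSGWTAGAAA",
    "Spike LCR-3": "SRRRARSVASQSIIA",
    "Spike LCR-4": "LQNVVNQNAQALN",
}

for name, seq in TABLE1.items():
    print(name, f"{aliphatic_index(seq):.2f}", f"{gravy(seq):.3f}")
```

Under these assumptions the printed values agree with the corresponding Table 1 entries (for example, 243.00 and 3.180 for Spike LCR-1).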

Figure 3.

(a) The SARS-CoV-2 spike protein three-dimensional structure (by cryo-electron microscopy [24], PDB code: 7BNM). The structure corresponds to a trimer, where each monomer is represented with a different color. The two subunits that make up each monomer (Subunit 1, also known as Head region, and Subunit 2, or the Stalk region) are indicated. (b) Domain organization of the spike protein. The position of each of the LCRs found in this work, together with the mutations present in each variant spike protein, are shown. The position of the signal peptide (SP) and Spike LCR-1 are indicated. The green arrow in the Spike LCR-3 box indicates the furin cleavage site. (c) Monomer of the spike protein. Close-ups of each of the structural regions corresponding to the different LCRs are shown in colored boxes. The sequences of each LCR are represented, with the mutations indicated with a red letter. Protein structures in panels a and c were rendered using PyMOL (The PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC.). Panel b was created with BioRender.com. Abbreviations: SP, signal peptide; NTD, N-terminal domain; RBD, receptor-binding domain; RBM, receptor-binding motif; SD1, subdomain 1; SD2, subdomain 2; FP, fusion peptide; HR1, heptad repeat 1; CH, central helix; CD, connector domain; HR2, heptad repeat 2; TM, transmembrane domain.

The Spike LCR-3 (Delta-Kappa prevalent) is positioned between residues 680 and 694 in the Delta and Kappa spike protein variants (Fig. 3b). Its sequence is the polybasic, conserved 15-amino acid segment SRRRARSVASQSIIA (Table 1), which is located precisely in the furin cleavage site in the S1 C-terminus, whose tertiary structure has not been visualized due to its inherent flexibility [24] (Fig. 3c). In the proteome sample we have analyzed, this LCR is found in 99.19% of the Delta variants (Fig. S1). As shown in Fig. 2, the complexity value of this region in the Delta and Kappa variants is significantly lower in comparison with the rest of the protein; however, a small number of the Delta sequences (39/4830) do not surpass our cutoff value due to the presence of amino acid substitutions that raise the complexity value of the region. Analysis of the spike protein sequences database shows that the Spike LCR-3 is present in 99.44% of Delta variants (B.1.617.2) but appears only in 0.52% of other variants.
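The prevalence reported for the proteome sample follows directly from the counts given above: 39 of the 4830 Delta sequences fall outside the cutoff. A short arithmetic sketch (the helper name is illustrative):

```python
def prevalence(n_total: int, n_missing: int) -> float:
    """Percentage of sequences in which the LCR is detected."""
    return 100.0 * (n_total - n_missing) / n_total


# Counts taken from the text: 39 of 4830 Delta proteomes lack Spike LCR-3.
print(f"{prevalence(4830, 39):.2f}%")  # prints 99.19%
```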

The other LCR, or Spike LCR-4, has the conserved 13-aa polar-rich sequence LQNVVNQNAQALN, and is located between residues 948 and 960 (Table 1) of the spike protein of the highly transmissible Delta variant. It is found in an alpha-helix rich domain (HR1) (Fig. 3b) that is part of the spike protein S2 stalk region (Fig. 3a). In the proteome dataset, this low complexity region is present in practically every Delta variant; only 1.7% do not surpass the LCR cutoff value defined here (Fig. 1, Fig. S1). In the Beta, Eta, and Kappa variants analyzed here, this LCR is completely absent (Fig. S1), whereas in the other SARS-CoV-2 categories, its prevalence is below 2% (Fig. S1). The analysis of the spike protein set shows that the Spike LCR-4 is present in 98.13% of Delta variants (B.1.617.2) and is missing in 99.88% of the other variants in our sample (Fig. 2b and Supplementary Table S1).

In the Alpha variant sequences analyzed here, the NSP3 LCR-3 (Fig. S1) is missing in 98.95% of the proteomes. The Lambda NSP3 LCR-4 (Fig. S1) is absent in 98.84% of the analyzed proteomes.

The available information does not allow any inference on the possible geographical distribution of the different SARS-CoV-2 spike proteins where the LCRs reported here are located (Table S2).

Discussion and conclusions

LCRs are found in a broad spectrum of proteins and appear to contribute to the antigenic variability in both viral and cellular pathogen populations. Although polymerase slippage events may be involved [25–27], the mechanisms that produce viral LCRs are poorly understood. The processes that lead to LCR preservation in highly streamlined genomes, such as those of most RNA viruses, are not well understood, and their tempo and mode of evolution remain open issues. However, the conservation of the two small LCRs (Spike LCR-3, Spike LCR-4) reported here in the rapidly spreading Delta variant suggests that, together with mutations found in the nucleocapsid [28], they may be part of its hallmark traits. Accordingly, a detailed analysis of their frequency and phenotypic significance may contribute to the understanding of the origin of this variant’s increased transmissibility. Dozens of Delta subvariants have been reported throughout the world since the original submission of this paper. All these subvariants have different defining mutations [29] and their properties are still being investigated. Our analyses of the spike proteins of these variants show that a highly significant percentage (Table S1) are endowed with the same LCRs described in the original SARS-CoV-2 Delta spike protein itself [30,31].

The Spike LCR-1 (FVFLVLLPLV) is a highly hydrophobic region that consists of helix-forming residues, including phenylalanine, valine and leucine, and it is the major component of the signal peptide (amino acids 1–13) located upstream of the N-terminal domain [32,33] (Fig. 3). In the lumen of the endoplasmic reticulum, this signal peptide plays a key role in guiding the spike protein to its membrane location by cellular signal peptidases [34].

As noted above, the Kappa/Delta Spike LCR-3 and the Delta Spike LCR-4 regions are located in the spike S1 and S2 subunits, respectively. The mutation P681R detected in the Spike LCR-3 (SRRRARSVASQSIIA) (Fig. 3) at the furin cleavage site increases the polybasic nature of this region, which could augment its affinity for the furin protease [35]. In vitro experiments and SARS-CoV-2 infections in animal models have demonstrated that the P681R mutation enhances both the fusogenicity and pathogenicity of the virus [36]. The phylogenetic relation between the Kappa and Delta variants, both of which are part of the lineage B.1.617 [37,38], very likely explains the presence of these two mutations in both the Delta and the Kappa Spike LCR-3 (Figs. 1 and S1).

The ectodomain of the SARS-CoV-2 spike protein is endowed with two heptad repeat motifs (HR1 and HR2) that are involved in cell fusion, a key step in viral entry [39,40]. The Spike LCR-4 (LQNVVNQNAQALN) includes charged-neutral, polar (asparagine and glutamine) and hydrophobic amino acids (leucine, valine, and alanine), which are typical of heptad repeat motifs. The interaction of HR1 and HR2 leads to the formation of a six-helical bundle that mediates cell fusion [39]. Accordingly, it is possible that the asparagine (N) of the mutation D950N (Fig. 3) of the Spike LCR-4 may enhance the stabilization of the post-fusion hairpin conformation, since the conservation of the N and Q residues of HR1 is known to play an important role in the arrangement of hydrogen-bonding zippers that force HR2 to adopt its final conformation in SARS-CoV [40]. The structural relevance of this region has been demonstrated by studies with other RNA viruses, in which fusion inhibitors that disrupt HR1-HR2 conformational changes are known to limit viral entry [41,42].

Although there may be minuscule variations in LCR length and/or amino acid composition, the segments described in this work fall well within the low complexity category and open the possibility that their biased composition may confer adaptive advantages to the Delta variant. For instance, the polybasic Spike LCR-3, which includes several arginines in its N-terminus, is a highly conserved sequence located precisely in the furin cleavage site at spike S1/S2, which is essential for membrane fusion, and plays a key role in viral infection and transmission [42–44].

The stringent cut-off values used here (W = 12, K1 = 1.9, K2 = 2.1) show that, except for a limited number of sequences of the Spike LCR-3 and the Spike LCR-4, these two LCRs are extremely prevalent (99.19% and 98.3% of all proteomes, and 99.44% and 98.13% of the subset of spike protein sequences). Although they display the biological traits of typical low complexity regions (Fig. 2), the multiple sequence alignments (Supplementary files 1 and 2) of the sequences that escape our cutoff values show single point mutations within these LCRs. These single-amino acid substitutions increase the complexity of the fragments and prevent their detection by the methodology employed here.

The SARS-CoV-2 Delta variant was detected in late 2020 [37], and the proteomic traits described here may contribute, together with other features, to explain in part its rapid worldwide expansion. The role of LCRs in enhancing sequence variability in surface proteins of viral and cellular pathogens has been postulated [5,9,11]. The conservation of the position and the sequence of two LCRs (Spike LCR-3 and Spike LCR-4) in the Delta variant we have described here highlights the importance of LCRs, which might lead to the evolution and development of new functions or the improvement of existing ones.

Simple repeats have been shown to lead to variations in genome size in cellular systems [45]. However, although compositionally biased sequences in SARS-CoV-2 are quite ubiquitous in most of the coronaviral proteins (Figs. 1 and S1), they do not contribute significantly to the increase of its genome size. In contrast, we hypothesize that the high conservation of the two LCRs in the Delta spike protein suggests that, together with the seven mutations present in this variant, they are part of the phenotypic traits associated with its high infectivity. Laboratory studies are required to confirm the possibility that the presence of compositionally biased segments in the Delta variant spike protein may be related to increased transmission, which is part of the defining features of VOCs and VOIs [46–48].

Methods

SARS-CoV-2 proteome sequences

To retrieve a list of proteomes meeting the requirements to be considered as input to the pipeline (https://github.com/abelardoacm/SARS-COV2_LCRs), we downloaded metadata of all the sequences available on the China National Center for Bioinformation web portal (https://ngdc.cncb.ac.cn/news/85) on July 17, 2021. The entries were filtered, keeping only those that corresponded to complete proteomes (Nuc. Completeness = Complete), with high sequence quality (Sequence Quality = High) available in NCBI GenBank (Data Source = GenBank). The proteome sample size per variant was limited to a maximum of 4,000 sequences, a figure comparable to the numbers of the Alpha- and Delta samples analyzed here and included multiple geographical regions (217 locations from 64 countries) that were sampled between January 20, 2020 and July 17, 2021. A subset was made for each variant classified either as a VOC (Alpha n = 3903; Beta n = 384; Gamma n = 4000; and Delta n = 4830) or as a VOI (Eta n = 363; Iota n = 4000; Kappa n = 115; and Lambda n = 259). We have also included proteomes from a random sampling using the R sample{base} function, of 10,377 non-VOC/VOI that met the same quality criteria and were classified as "Others SARS-CoV-2" (Others SARS-CoV-2 n = 10,377). Proteomes were downloaded using NCBI batch entrez. Accessions with empty fields in their metadata were discarded, leaving a total of 28,231 proteome files (Supplementary file 3).
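The metadata filtering and per-variant subsampling described above can be sketched as follows. This is only an illustration under stated assumptions: the three filter fields are named as in the text, whereas the `Variant` column name, the `select_proteomes` function, and the use of Python's `random.sample` (in place of the R `sample{base}` function used by the authors) are hypothetical choices made for this example.

```python
import random


def select_proteomes(rows, cap=4000, seed=0):
    """Filter metadata rows and cap the number kept per variant.

    rows: list of dicts, one per accession (hypothetical layout).
    cap:  maximum number of proteomes retained per variant.
    """
    rng = random.Random(seed)
    kept = [
        r for r in rows
        if r.get("Nuc. Completeness") == "Complete"
        and r.get("Sequence Quality") == "High"
        and r.get("Data Source") == "GenBank"
        # Accessions with empty metadata fields are discarded.
        and all(v not in ("", None) for v in r.values())
    ]
    by_variant = {}
    for r in kept:
        by_variant.setdefault(r["Variant"], []).append(r)
    # Random subsample only when a variant exceeds the cap.
    return {
        v: (rng.sample(rs, cap) if len(rs) > cap else rs)
        for v, rs in by_variant.items()
    }
```

The cap is exposed as a parameter because the sample sizes reported above are not identical across variants.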

SARS-CoV-2 spike protein database

To broaden our analyses, we also included 261,051 spike protein sequences downloaded from the NCBI Virus database (www.ncbi.nlm.nih.gov/labs/virus/vssi/#/ surface glycoprotein) available up to November 4, 2021. In this database 6,514 sequences correspond to Delta variants (3,269 B.1.617.2 and 3,235 subvariant AY), and 254,537 correspond to other variants (Table S1).

Detection of low complexity regions (LCRs)

To search for the LCRs in the sample, the SEG [49] algorithm was used with W = 12, K1 = 1.9, K2 = 2.1 parameters, which are slightly stricter than the default values (W = 12, K1 = 2.2, K2 = 2.5). The pipeline "SARS-COV2_LCRs" was built to couple annotation data from genomic GenBank files with SEG output files and locate and identify LCRs within each genome.
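To illustrate the role of the three parameters, the sketch below implements a simplified two-threshold window scan: windows of length W whose Shannon entropy (in bits) is at most K1 act as triggers, and each trigger is extended over neighbouring windows whose entropy is at most K2. This is an approximation written for illustration only. It omits the final segment-optimization stage of SEG, it is not the code used in the pipeline, and its output should not be expected to match SEG on real spike sequences. All function names are hypothetical.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of a window."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


def find_low_complexity(seq: str, w: int = 12, k1: float = 1.9, k2: float = 2.1):
    """Return 0-based, end-exclusive (start, end) low-complexity segments."""
    if len(seq) < w:
        return []
    ent = [window_entropy(seq[i:i + w]) for i in range(len(seq) - w + 1)]
    segments = []
    i = 0
    while i < len(ent):
        if ent[i] <= k1:                      # trigger window
            lo = i
            while lo > 0 and ent[lo - 1] <= k2:
                lo -= 1                       # extend to the left
            hi = i
            while hi + 1 < len(ent) and ent[hi + 1] <= k2:
                hi += 1                       # extend to the right
            start, end = lo, hi + w
            if segments and start <= segments[-1][1]:
                # Merge with the previous segment when they overlap.
                segments[-1] = (segments[-1][0], max(end, segments[-1][1]))
            else:
                segments.append((start, end))
            i = hi + 1
        else:
            i += 1
    return segments


# Toy sequence (not a viral protein): a serine run flanked by diverse residues.
toy = "ACDEFGHIKL" + "S" * 12 + "MNPQRTVWYACDE"
print(find_low_complexity(toy))
```

Lowering K1 and K2, as done here relative to the SEG defaults, makes the scan stricter: fewer windows qualify as triggers and extensions stop earlier.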

A "genomic features" csv-file containing coordinates for both genes and proteins was prepared, which served as a template to create a proteomic fasta enriched with location information. All the PERL and R scripts we have employed are available at https://github.com/abelardoacm/SARS-COV2_LCRs.git.

Once all LCRs were identified within all proteomes and spike protein sequences in our sample, their frequency was calculated using an R script (Fig. S1). From this analysis, LCRs of interest were selected based on their high prevalence in each variant proteome dataset (Table S1). Subsequently, a presence matrix of the LCRs of interest was calculated by an R script and used as input to plot the total counts per variant and number of versions per low complexity region (Fig. 2). The amino acid composition of the 4830 Delta spike sequences was analyzed with a multiple sequence alignment built with MUSCLE [50] v3.8.1551, followed by an amino acid Logo representation (Fig. S2) made with the WebLogo 3 program (http://weblogo.threeplusone.com/create.cgi) [51].
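The presence matrix and the per-variant frequencies can be sketched in a few lines. The authors performed this step with R scripts; the Python version below is only an illustrative equivalent, and the input layout (one tuple of accession, variant, and the set of detected LCR names per sequence) as well as the function names are assumptions made for this example.

```python
from collections import defaultdict

LCRS = ["Spike LCR-1", "Spike LCR-2", "Spike LCR-3", "Spike LCR-4"]


def presence_matrix(detections):
    """Build one 0/1 row per sequence, with one column per LCR of interest.

    detections: iterable of (accession, variant, set_of_detected_lcr_names).
    """
    return [
        (acc, variant, [int(name in found) for name in LCRS])
        for acc, variant, found in detections
    ]


def prevalence_by_variant(matrix):
    """Percentage of sequences of each variant that carry each LCR."""
    totals = defaultdict(int)
    hits = defaultdict(lambda: [0] * len(LCRS))
    for _, variant, row in matrix:
        totals[variant] += 1
        for j, present in enumerate(row):
            hits[variant][j] += present
    return {
        variant: {
            name: 100.0 * hits[variant][j] / totals[variant]
            for j, name in enumerate(LCRS)
        }
        for variant in totals
    }


# Toy input with invented accessions, for illustration only.
toy = [
    ("seq1", "Delta", {"Spike LCR-1", "Spike LCR-3", "Spike LCR-4"}),
    ("seq2", "Delta", {"Spike LCR-3", "Spike LCR-4"}),
    ("seq3", "Iota", {"Spike LCR-2"}),
]
print(prevalence_by_variant(presence_matrix(toy)))
```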

Supplementary Information

Supplementary Figure S2. (24.9MB, tiff)
Supplementary Tables. (28.4KB, docx)

Acknowledgements

WC-S is a PhD student from the Posgrado en Ciencias Biológicas, Universidad Nacional Autónoma de México (UNAM) and received fellowship CVU-815057 from CONACyT. AA-C is a MSc student from the Posgrado en Ciencias Biológicas, Universidad Nacional Autónoma de México (UNAM) and received fellowship CVU-1034340 from CONACyT. AC-G received a fellowship from CONACyT CVU-1002377. Support from DGAPA-PAPIIT (IN214421), DGAPA-PAPIME (PE204921) and SRE-AMEXCID (CH.06.UNAM) is gratefully acknowledged.

Author contributions

All authors contributed equally to the results and analyses presented here. All authors reviewed the manuscript.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-022-04976-8.

References

  1. Haerty W, Golding GB. Low-complexity sequences and single amino acid repeats: Not just “junk” peptide sequences. Genome. 2010;53:753–762. doi: 10.1139/g10-063.
  2. Mier P, et al. Disentangling the complexity of low complexity proteins. Brief Bioinform. 2020;21:458–472. doi: 10.1093/bib/bbz007.
  3. Ntountoumi C, et al. Low complexity regions in the proteins of prokaryotes perform important functional roles and are highly conserved. Nucleic Acids Res. 2019;47:9998–10009. doi: 10.1093/nar/gkz730.
  4. Jorda J, Kajava AV. Protein homorepeats sequences, structures, evolution, and functions. Adv. Protein Chem. Str. 2010;79:59–88. doi: 10.1016/S1876-1623(10)79002-7.
  5. Kajava AV. Tandem repeats in proteins: From sequence to structure. J. Struct. Biol. 2012;179:279–288. doi: 10.1016/j.jsb.2011.08.009.
  6. Haerty W, Golding GB. Genome-wide evidence for selection acting on single amino acid repeats. Genome Res. 2010;20:755–760. doi: 10.1101/gr.101246.109.
  7. Fankhauser N, Nguyen-Ha T-M, Adler J, Mäser P. Surface antigens and potential virulence factors from parasites detected by comparative genomics of perfect amino acid repeats. Proteome Sci. 2007;5:20. doi: 10.1186/1477-5956-5-20.
  8. Mendes TAO, et al. Repeat-enriched proteins are related to host cell invasion and immune evasion in parasitic protozoa. Mol. Biol. Evol. 2013;30:951–963. doi: 10.1093/molbev/mst001.
  9. Velasco MA, et al. Low complexity regions (LCRs) contribute to the hypervariability of the HIV-1 gp120 protein. J. Theor. Biol. 2013;338:80–86. doi: 10.1016/j.jtbi.2013.08.039.
  10. Mier P, Andrade-Navarro MA. The conservation of low complexity regions in bacterial proteins depends on the pathogenicity of the strain and subcellular location of the protein. Genes (Basel) 2021;12(3):451. doi: 10.3390/genes12030451.
  11. Gruca A, et al. Common low complexity regions for SARS-CoV-2 and human proteomes as potential multidirectional risk factor in vaccine development. BMC Bioinform. 2021;22:182. doi: 10.1186/s12859-021-04017-7.
  12. Li YC, Korol AB, Fahima T, Nevo E. Microsatellites within genes: structure, function, and evolution. Mol. Biol. Evol. 2004;21:991–1007. doi: 10.1093/molbev/msh073.
  13. Lin WH, Kussell E. Evolutionary pressures on simple sequence repeats in prokaryotic coding regions. Nucleic Acids Res. 2012;40(6):2399–2413. doi: 10.1093/nar/gkr1078.
  14. Alam CM, Iqbal A, Sharma A, Schulman AH, Ali S. Microsatellite diversity, complexity, and host range of mycobacteriophage genomes of the siphoviridae family. Front. Genet. 2019;10:207. doi: 10.3389/fgene.2019.00207.
  15. Laskar, R., Jilani, M. G. & Ali, S. Implications of genome simple sequence repeats signature in 98 Polyomaviridae species. 3 Biotech. 11, 35; 10.1007/s13205-020-02583-w (2021).
  16. Zhao X, et al. Microsatellites in different Potyvirus genomes: Survey and analysis. Gene. 2011;488:52–56. doi: 10.1016/j.gene.2011.08.016.
  17. Alam CM, Singh AK, Sharfuddin C, Ali S. In-silico analysis of simple and imperfect microsatellites in diverse tobamovirus genomes. Gene. 2013;530:193–200. doi: 10.1016/j.gene.2013.08.046.
  18. George B, Mashhood Alam C, Jain SK, Sharfuddin C, Chakraborty S. Differential distribution and occurrence of simple sequence repeats in diverse geminivirus genomes. Virus Genes. 2012;45:556–566. doi: 10.1007/s11262-012-0802-1.
  19. Wu X, Zhou L, Zhao X, Tan Z. The analysis of microsatellites and compound microsatellites in 56 complete genomes of Herpesvirales. Gene. 2014;551:103–109. doi: 10.1016/j.gene.2014.08.054.
  20. Chen M, et al. Similar distribution of simple sequence repeats in diverse completed Human Immunodeficiency Virus Type 1 genomes. FEBS Lett. 2009;583:2959–2963. doi: 10.1016/j.febslet.2009.08.004.
  21. Alam C, Sharfuddin C, Ali S. Analysis of simple and imperfect microsatellites in Ebolavirus species and other genomes of Filoviridae family. Gene Cell Tissue. 2015;2(2):e26204. doi: 10.17795/gct-26204.
  22. Satyam R, et al. Deciphering the SSR incidences across viral members of Coronaviridae family. Chem Biol Interact. 2020;331:109226. doi: 10.1016/j.cbi.2020.109226.
  23. Siddiqe R, Ghosh A. Genome-wide in silico identification and characterization of Simple Sequence Repeats in diverse completed SARS-CoV-2 genomes. Gene Rep. 2021;23:101020. doi: 10.1016/j.genrep.2021.101020.
  24. Benton DJ, et al. The effect of the D614G substitution on the structure of the spike glycoprotein of SARS-CoV-2. Proc. Natl. Acad. Sci. U. S. A. 2021;118:e2022586118. doi: 10.1073/pnas.2022586118.
  25. Hancock JM, Chaleeprom W, Chaleeprom W, Dale J, Gibbs A. Replication slippage in the evolution of potyviruses. J. Gen. Virol. 1995;76(Pt 12):3229–3232. doi: 10.1099/0022-1317-76-12-3229.
  26. Rodamilans B, et al. RNA polymerase slippage as a mechanism for the production of frameshift gene products in plant viruses of the potyviridae family. J. Virol. 2015;89(13):6965–6967. doi: 10.1128/JVI.00337-15.
  27. Stewart H, Olspert A, Butt BG, Firth AE. Propensity of a picornavirus polymerase to slip on potyvirus-derived transcriptional slippage sites. J. Gen. Virol. 2019;100(2):199–205. doi: 10.1099/jgv.0.001189.
  28. Syed AM, et al. Rapid assessment of SARS-CoV-2 evolved variants using virus-like particles. Science. 2021. doi: 10.1126/science.abl6184.
  29. O’Toole Á, et al. Tracking the international spread of SARS-CoV-2 lineages. Wellcome Open Res. 2021. doi: 10.12688/wellcomeopenres.16661.2.
  30. Roy B, Roy H. The Delta Plus variant of COVID-19: Will it be the worst nightmare in the SARS-CoV-2 pandemic? J. Biomed. Sci. 2021;8:1–2. doi: 10.3126/jbs.v8i1.38449.
  31. Alexandar S, Ravinsakar M, Senthil Kumar R, Jakkan K. A Comprehensive review on Covid-19 Delta variant. Int. J. Clin. Pharmacol. Res. 2021;5:83–85.
  32. Xia X. Domains and functions of spike protein in SARS-CoV-2 in the context of vaccine design. Viruses. 2021;13:109. doi: 10.3390/v13010109.
  33. Duan L, et al. The SARS-CoV-2 Spike glycoprotein biosynthesis, structure, function, and antigenicity: Implications for the design of Spike-based vaccine immunogens. Front. Immunol. 2020. doi: 10.3389/fimmu.2020.576622.
  34. Peacock TP, et al. The furin cleavage site in the SARS-CoV-2 spike protein is required for transmission in ferrets. Nat. Microbiol. 2021;6:899–909. doi: 10.1038/s41564-021-00908-w.
  35. Frazier, L. E. et al. Spike protein cleavage-activation mediated by the SARS-CoV-2 P681R mutation: A case-study from its first appearance in variant of interest (VOI) A.23.1 identified in Uganda. Preprint at https://www.biorxiv.org/content/10.1101/2021.06.30.450632v5 (2021).
  36. Saito, A. et al. SARS-CoV-2 spike P681R mutation, a hallmark of the Delta variant, enhances viral fusogenicity and pathogenicity. Preprint at https://www.biorxiv.org/content/10.1101/2021.08.12.456173v3 (2021).
  37. GISAID. Tracking of Variants. Retrieved on July 14, from https://www.gisaid.org/hcov19-variants/ (2021).
  38. Centers for Disease Control and Prevention. COVID Data Tracker. Retrieved on July 14, from https://covid.cdc.gov/covid-data-tracker/#variant-proportions (2021).
  39. Xia S, et al. Fusion mechanism of 2019-nCoV and fusion inhibitors targeting HR1 domain in spike protein. Cell. Mol. Immunol. 2020;17:765–767. doi: 10.1038/s41423-020-0374-2.
  40. Duquerroy S, Vigouroux A, Rottier PJM, Rey FA, Jan Bosch B. Central ions and lateral asparagine/glutamine zippers stabilize the post-fusion hairpin conformation of the SARS coronavirus spike glycoprotein. Virology. 2005;335:276–285. doi: 10.1016/j.virol.2005.02.022.
  41. Feng M, Bell DR, Kang H, Shao Q, Zhou R. Exploration of HIV-1 fusion peptide–antibody VRC34.01 binding reveals fundamental neutralization sites. Phys. Chem. Chem. Phys. 2019;21:18569–18576. doi: 10.1039/C9CP02909E.
  42. Ispas G, et al. Antiviral activity of TMC353121, a Respiratory Syncytial Virus (RSV) fusion inhibitor, in a non-human primate model. PLoS ONE. 2015;10:e0126959. doi: 10.1371/journal.pone.0126959.
  43. Wu Y, Zhao S. Furin cleavage sites naturally occur in coronaviruses. Stem Cell Res. 2021;50:102115. doi: 10.1016/j.scr.2020.102115.
  44. Scudellari M. How the coronavirus infects cells—and why Delta is so dangerous. Nature. 2021;595:640–644. doi: 10.1038/d41586-021-02039-y.
  45. Hancock JM. Genome size and the accumulation of simple sequence repeats: Implications of new data from genome sequencing projects. Genetica. 2002;115:93–103. doi: 10.1023/A:1016028332006.
  46. Weisblum Y, et al. Escape from neutralizing antibodies by SARS-CoV-2 spike protein variants. Elife. 2020. doi: 10.7554/eLife.61312.
  47. Davies NG, et al. Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England. Science. 2021;372:eabg3055. doi: 10.1126/science.abg3055.
  48. Zhou W, Wang W. Fast-spreading SARS-CoV-2 variants: challenges to and new design strategies of COVID-19 vaccines. Signal Transduct. Target Ther. 2021;6:226. doi: 10.1038/s41392-021-00644-x.
  49. Wootton J, Federhen S. Statistics of local complexity in amino acid sequence and sequences database. Comput. Chem. 1993;17:149–163. doi: 10.1016/0097-8485(93)85006-X.
  50. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–1797. doi: 10.1093/nar/gkh340.
  51. Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: A sequence logo generator. Genome Res. 2004;14(6):1188–1190. doi: 10.1101/gr.849004.
