Abstract
SARS-CoV-2 is the virus responsible for COVID-19 and has afflicted the world since the end of 2019. Different lineages have been discovered, and the Gamma lineage, which started the second wave of infections, was first described in Brazil, one of the countries most affected by the pandemic. This study therefore analyzed SARS-CoV-2 genomes sequenced from Esteio city in Rio Grande do Sul, Southern Brazil. We also comparatively analyzed genomes from the first two years of the pandemic in Rio Grande do Sul state to understand their genomic and evolutionary patterns. The phylogenomic analysis showed monophyletic groups for Alpha, Gamma, Delta and Omicron, as well as for other lineages circulating in the state. Molecular evolutionary analysis identified several sites under adaptive selection in the membrane and nucleocapsid proteins, which could be related to a prevalent stabilizing effect on the membrane protein structure and to predominantly destabilizing effects on the C-terminal nucleocapsid domain.
Keywords: SARS-CoV-2, Genomics, COVID-19, Molecular evolution, Phylogenomics
1. Introduction
After the first outbreak of COVID-19 (Coronavirus disease 2019) in Wuhan, Hubei Province, China in December 2019 (Zhu et al., 2020), the new severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spread around the world. The resulting pandemic, declared by the World Health Organization (WHO) in March 2020, had exceeded 649 million cases and 6.65 million deaths by December 2022 (Dong et al., 2020, https://coronavirus.jhu.edu/map.html accessed on December 13, 2022). As of December 2022, Brazil is the fifth country most affected by SARS-CoV-2, with 35 million confirmed cases and more than 690 thousand deaths (Dong et al., 2020, https://coronavirus.jhu.edu/map.html accessed on December 13, 2022). Of these, 7.9% of the cases and 6% of the deaths are from Rio Grande do Sul (RS), the southernmost state of Brazil and the fourth state in the ranking of COVID-19 cases (https://covid.saude.gov.br/, accessed on December 13, 2022).
Over these two years of the pandemic, several lineages circulated in RS state, including various Variants of Concern (VOCs) such as Alpha, Beta, Gamma, Delta, and Omicron, which carry signature amino acid substitutions, especially in the spike protein (Gularte et al., 2022; Wink et al., 2022; Gräf et al., 2022). The sublineage P.1.2, a Gamma-like variant also considered a Variant of Interest, was initially identified in RS state (Franceschi et al., 2021a), although other studies estimate its divergence between late 2020 and early 2021 in São Paulo state (Junior et al., 2021). As in the rest of Brazil, an increasing number of cases and deaths caused by Gamma (P.1 lineage) became evident in RS at the beginning of 2021, causing a second COVID-19 wave. The P.1 lineage harbors mutations in the Spike's receptor-binding domain (RBD) such as E484K, K417T, and N501Y, which promote evasion from antibody neutralization elicited by infection or vaccination (R. E. Chen et al., 2021; Chakraborty, 2022).
In December 2020, the Delta variant was first described in India. This variant is highly transmissible and spreads easily, and it caused new waves of infection around the world by mid-2021 (Shiehzadegan et al., 2021). However, although this lineage rapidly became dominant in Brazil (including RS state) in July/August 2021, there was no concurrent increase in reported cases or deaths (Giovanetti et al., 2022). By the end of 2021, the number of COVID-19 cases was progressively decreasing, until the emergence of the Omicron variant, confirmed in November 2021 in South Africa (Wang and Powell, 2021). This new variant is of concern because its mutations in the RBD and cleavage sites suggest higher transmissibility, up to three times more contagious than the Delta variant (J. Chen et al., 2022).
Although VOC characterization has mostly focused on spike mutations, the structural proteins E (Envelope), M (Membrane), and N (Nucleocapsid) are functionally important for virus assembly and pathogenesis (Yadav et al., 2021). The N protein has been associated with the promotion of inflammatory processes through activation of cyclooxygenase-2 (COX-2), with interaction with the p42 proteasome to avoid the degradation of viral proteins, and with inhibition of IFN-I in the immune response (Satarker and Nampoothiri, 2020). Moreover, the M protein is known to inhibit NFκB, to reduce levels of COX-2, to activate IFN-β, and to interact with PDK1/PKB proteins, leading to cell death or apoptosis.
This study therefore aims to sequence and characterize SARS-CoV-2 genomes from RS state and to identify selection traits in E, M, and N protein sites, elucidating the molecular evolution processes that drive the diversification or conservation of the SARS-CoV-2 structural proteins in Rio Grande do Sul state. Since most studies focus on Spike protein evolution, we provide data that help to understand mutations potentially related to immunologic responses mediated by structural proteins in the fourth state of Brazil most affected by COVID-19.
2. Materials and methods
2.1. Sample collection and clinical testing
Nasopharyngeal swab samples were analyzed by Laboratório Central de Saúde Pública do Estado do Rio Grande do Sul (LACEN) (Porto Alegre, Rio Grande do Sul, Brazil) using RT-qPCR AllPlex SARS-CoV-2 assays (Seegene Inc., Seoul, Republic of Korea) with primers and probes targeting the RNA-dependent RNA polymerase (RdRP), Nucleocapsid (N), and Envelope (E) genes, as recommended by the World Health Organization; remnant samples were stored at −20 °C. For the sequencing protocol, samples that were positive in the first RT-qPCR between April 9, 2021 and June 29, 2021 were selected and submitted to a second RT-qPCR, performed by BiomeHub (Florianópolis, Santa Catarina, Brazil) with the Charité-Berlin protocol. Samples with a quantification cycle (Cq) of up to 30 for at least one primer were selected for SARS-CoV-2 genome sequencing and assembly by the BiomeHub laboratory. In total, 12 samples that tested positive for SARS-CoV-2 by RT-qPCR were included in the study.
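The selection rule described above (Cq up to 30 for at least one primer) can be sketched as a simple filter. This is a minimal illustration, not the laboratory's actual procedure: the function name, the data layout, and the example Cq values are hypothetical, and treating the cutoff as inclusive (Cq ≤ 30) is an assumption.

```python
from typing import Dict, List


def select_for_sequencing(
    samples: Dict[str, Dict[str, float]], max_cq: float = 30.0
) -> List[str]:
    """Return IDs of samples with Cq <= max_cq for at least one target."""
    selected = []
    for sample_id, cq_by_target in samples.items():
        # A sample qualifies as soon as any single target meets the cutoff.
        if any(cq <= max_cq for cq in cq_by_target.values()):
            selected.append(sample_id)
    return selected


# Hypothetical Cq values, for illustration only (not data from this study).
example_samples = {
    "S1": {"RdRP": 24.1, "E": 23.5},
    "S2": {"RdRP": 33.0, "E": 29.8},
    "S3": {"RdRP": 35.2, "E": 34.0},
}
selected_ids = select_for_sequencing(example_samples)
```

In this toy input, S2 is retained because one of its two targets is below the cutoff, while S3 is excluded because neither is.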
2.2. SARS-CoV-2 genome sequencing and assembly
Total RNA was prepared according to a reference protocol (Eden and Sim, 2020), with cDNA synthesized with SuperScript IV (Invitrogen) and DNA amplified with Platinum Taq High Fidelity (Invitrogen). Library preparation was performed with Nextera Flex (Illumina), and quantification was performed with Picogreen and the Collibri Library Quantification Kit (Invitrogen). Genome sequencing was performed on the Illumina MiSeq platform with 150 × 150 runs at 500× SARS-CoV-2 coverage (50–100 thousand reads per sample).
For the genome assembly (BiomeHub in-house script), adapter removal and read trimming for 150 nt read sequences were performed with fastqtools.py. The sequenced reads were aligned to the reference SARS-CoV-2 genome (GenBank ID: NC_045512.2) with Bowtie2 v2.4.2 (Langmead and Salzberg, 2012) using the end-to-end and very-sensitive parameters. Sequencing coverage and depth were computed with samtools v1.11 (Li et al., 2009) with a minimum base quality (Q) ≥ 30. Finally, the consensus sequence for each SARS-CoV-2 genome was generated with a bcftools pipeline (Li, 2011), including the commands mpileup (parameters: Q ≥ 30; q ≥ 40, depth (d) ≤ 2500), filter (parameters: DP > 50), call and consensus.
2.3. SARS-CoV-2 genomes and data retrieval
In order to perform the phylogenetic analysis of Rio Grande do Sul state genomes including the 12 SARS-CoV-2 genomes from Esteio, we gathered 2227 complete and high-coverage sequences from the GISAID database (Elbe and Buckland-Merrett, 2017) with a collection date between March 1, 2020 and May 27, 2022 (submission up to May 27, 2022). The sequences were selected according to the following filters: (i) Location: South America/Brazil/Rio Grande do Sul; (ii) Clade: all; (iii) Complete genomes; (iv) High Coverage Selected.
The analysis of sequencing efforts, lineage frequencies and genomic characterization in Rio Grande do Sul state was subsequently performed with GISAID data by retrieving 4706 sequences matching the following parameters: (i) Location: South America/Brazil/Rio Grande do Sul, (ii) Collection date between March 1, 2020 and May 31, 2022, and (iii) Submission date up to September 30, 2022.
2.4. SARS-CoV-2 mutations and lineages
SNPs and insertions/deletions in each sample were identified with the snippy variant calling pipeline (https://github.com/tseemann/snippy), which uses FreeBayes and snpEff to call, annotate and predict variant effects on genes and proteins. The genomes were aligned with MAFFT, and SNPs and gaps relative to the reference were extracted from the sequences with the msastats.py script. The reference sequence is the GenBank RefSeq entry NC_045512.2, isolated and sequenced from an early case in Wuhan, China, in 2019. Lineages were assigned using the dynamic nomenclature implemented in Pangolin (Rambaut, 2020) (https://github.com/cov-lineages/pangolin), and global clades and mutations were identified using Nextclade from Nextstrain (https://clades.nextstrain.org/).
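To make the idea of extracting SNPs and gaps relative to the reference concrete, the sketch below compares one aligned genome against the aligned reference and reports differences in reference coordinates. It is only an illustration of the principle and is not the msastats.py script used in this study; the function name and the choices to skip insertion columns and undefined bases (N) are assumptions.

```python
from typing import List, Tuple


def list_differences(aligned_ref: str, aligned_query: str) -> List[Tuple[int, str, str]]:
    """Return (reference_position, ref_base, query_base) for substitutions and deletions.

    Positions are 1-based coordinates on the ungapped reference.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    differences = []
    ref_pos = 0
    for ref_base, query_base in zip(aligned_ref.upper(), aligned_query.upper()):
        if ref_base == "-":
            # Insertion relative to the reference: skipped in this sketch.
            continue
        ref_pos += 1
        if query_base == "N":
            # Undefined base: no call can be made at this position.
            continue
        if query_base != ref_base:
            differences.append((ref_pos, ref_base, query_base))
    return differences


# Toy alignment, for illustration only.
toy_ref = "ACGT-ACGTAC"
toy_query = "ACTTGAC-NAT"
toy_differences = list_differences(toy_ref, toy_query)
```

Here a query gap (`-`) opposite a reference base is reported as a deletion, so the toy example yields two substitutions and one deletion.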
2.5. Phylogenomic analyses
For the global phylogenomics, a search was performed with AudacityInstant on the GISAID database (Elbe and Buckland-Merrett, 2017) to find sequences closely related to the genomes sequenced in this study (up to June 23, 2022). The resulting genome set was aligned with the MAFFT v.7 web server (Katoh et al., 2017). The 5′ and 3′ UTRs were trimmed with UGENE (Okonechnikov et al., 2012). The evolutionary model and phylogenomic tree inferences were performed with the IQ-TREE software (Nguyen et al., 2014), adding a Shimodaira-Hasegawa-like approximate likelihood ratio test with 1000 replicates (Guindon et al., 2010) and an approximate Bayes test (Anisimova et al., 2011). FigTree software (http://tree.bio.ed.ac.uk/software/figtree/) was used to inspect and visualize the phylogenomic tree.
For the local phylogenomic analyses, 2227 genome sequences from Rio Grande do Sul state, previously downloaded from the GISAID database, were aligned using the MAFFT v.7 web server. The 5′ and 3′ UTRs were trimmed with UGENE. The best evolutionary model was identified and the phylogenomic inference was performed with IQ-TREE using the same parameters as for the global phylogenomics described above, and FigTree was used to inspect and visualize the phylogenomic tree.
2.6. Phylogenetics and molecular evolution of SARS-CoV-2 structural proteins
In order to infer the phylogenetic patterns of structural proteins E, M and N, the genomic alignment coordinates corresponding to these sequences were exported according to the SARS-CoV-2 reference genome (NC_045512.2). Sequences with nucleotide insertions altering the reading frame were excluded from the analysis. For each gene sequence alignment, the evolutionary model and phylogenetic tree were inferred with the parameters previously described for the phylogenomic analysis.
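Exporting a gene from a genome alignment requires translating reference coordinates into alignment columns, because gaps in the aligned reference shift the positions. The sketch below shows one way this could be done; it is not the procedure used by the authors, the function names are hypothetical, and the gene coordinates listed in the comment are those commonly annotated for NC_045512.2 and should be checked against the GenBank record before use.

```python
from typing import Dict

# Gene coordinates (1-based, inclusive) as commonly annotated for NC_045512.2.
# Listed for illustration; verify against the GenBank record before use.
GENE_COORDINATES = {"E": (26245, 26472), "M": (26523, 27191), "N": (28274, 29533)}


def reference_to_alignment_slice(aligned_ref: str, ref_start: int, ref_end: int) -> slice:
    """Map 1-based inclusive reference coordinates to a 0-based alignment slice."""
    ref_pos = 0
    col_start = None
    col_end = None
    for col, base in enumerate(aligned_ref):
        if base == "-":
            continue  # gap columns do not advance the reference coordinate
        ref_pos += 1
        if ref_pos == ref_start:
            col_start = col
        if ref_pos == ref_end:
            col_end = col + 1
            break
    if col_start is None or col_end is None:
        raise ValueError("coordinates fall outside the reference sequence")
    return slice(col_start, col_end)


def extract_region(alignment: Dict[str, str], ref_id: str, ref_start: int, ref_end: int) -> Dict[str, str]:
    """Cut the same alignment columns out of every sequence."""
    region = reference_to_alignment_slice(alignment[ref_id], ref_start, ref_end)
    return {name: seq[region] for name, seq in alignment.items()}


# Toy alignment, for illustration only: reference positions 3..5 are "CGT".
toy_alignment = {"ref": "AA-CGTTT", "s1": "AAGCGATT"}
toy_region = extract_region(toy_alignment, "ref", 3, 5)
```

With a real alignment, a call such as `extract_region(alignment, reference_id, *GENE_COORDINATES["N"])` would return the N gene columns for every genome, assuming the reference is present in the alignment under `reference_id`.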
Molecular evolution tests were performed with the HyPhy package (Pond et al., 2004). The methods FUBAR (Murrell et al., 2013), FEL (Kosakovsky Pond and Frost, 2005), and SLAC (Kosakovsky Pond and Frost, 2005) were implemented to evaluate potential sites under adaptive (pervasive) and purifying selection.
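As an aid to reading the outputs of these tests, the sketch below classifies a site from its synonymous rate (alpha, or dS in SLAC), its non-synonymous rate (beta, or dN in SLAC), and the reported support value. The thresholds are assumptions: a posterior probability of at least 0.9 for FUBAR and a p-value of at most 0.1 for FEL and SLAC, which are the usual HyPhy defaults and are consistent with the values reported in Tables 3 to 5, but the source does not state them explicitly. The function name is hypothetical.

```python
def classify_site(alpha: float, beta: float, support: float, method: str) -> str:
    """Classify a codon site as under positive or negative selection.

    alpha   -- synonymous substitution rate (dS for SLAC)
    beta    -- non-synonymous substitution rate (dN for SLAC)
    support -- posterior probability (FUBAR) or p-value (FEL, SLAC)
    """
    method = method.upper()
    if method == "FUBAR":
        significant = support >= 0.9  # assumed posterior probability cutoff
    elif method in ("FEL", "SLAC"):
        significant = support <= 0.1  # assumed p-value cutoff
    else:
        raise ValueError(f"unknown method: {method}")
    if not significant or alpha == beta:
        return "not significant"
    # beta > alpha: excess of non-synonymous change (adaptive selection);
    # beta < alpha: deficit of non-synonymous change (purifying selection).
    return "positive" if beta > alpha else "negative"


# Values taken from Table 5 (N protein): codon 34 (FUBAR) and codon 110 (FEL).
codon_34 = classify_site(0.627, 6.935, 0.9841, "FUBAR")
codon_110 = classify_site(6.426, 0.000, 0.0005, "FEL")
```

Under these assumed cutoffs, codon 34 of the N protein is classified as positively selected and codon 110 as negatively selected.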
2.7. Molecular stability of structural proteins M and N
The estimation of molecular stability of structural proteins was performed by DynaMut2 web server (Rodrigues et al., 2020) using the experimentally resolved structures by Electron Microscopy: (a) 8CTK (3.52 Å) - relative to protein M; and X-Ray Diffraction: (b) 7VNU (1.95 Å) - relative to N-terminal domain of protein N; and (c) 6ZCO (1.36 Å) - relative to C-terminal domain of protein N, from Protein Data Bank (https://www.rcsb.org/). The selection of tested amino acid mutations was defined according to the sites detected under positive selection by the molecular evolution tests.
3. Results
3.1. Sample characterization
Twelve nasopharyngeal swab samples used for COVID-19 diagnostic purposes (RT-qPCR) were collected from patients from the municipality of Esteio, Rio Grande do Sul (RS) state, between April 9th and June 29th, 2021. The mean cycle threshold (Ct) value for the first RT-qPCR conducted at Laboratório Central de Saúde Pública do Estado do Rio Grande do Sul (LACEN) was 23.83 (median: 23.00; IQR: 4.5).
The sequence coverage for the twelve sequenced genomes ranged between 84.92 and 99.76% (mean: 98.26%) of the 29,903 bp of the NC_045512.2 reference genome. The mean sequencing depth was 292.23x, varying between 53.15 and 542.28x. At least 48.81% of each sequence reached a coverage depth ≥51x (max: 98.27%, mean: 87.68%) (Supplementary File 1). According to the Pango lineage assignment, 10 sequenced samples belong to the P.1 lineage and 2 to the P.1.17 sublineage, both from the Gamma variant clade.
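The three quantities reported above (coverage of the reference, mean depth, and the fraction of positions with depth ≥51x) can be derived from a per-position depth profile. The following is a minimal sketch under the assumption that such a profile is available as a list with one depth value per reference position; the function name and the toy values are hypothetical, and this is not the samtools-based computation used in the study.

```python
from statistics import mean
from typing import Dict, List


def coverage_summary(depths: List[int], min_depth: int = 51) -> Dict[str, float]:
    """Summarize a per-position depth profile of a genome."""
    genome_length = len(depths)
    covered = sum(1 for d in depths if d > 0)          # positions with any read
    deep = sum(1 for d in depths if d >= min_depth)    # positions at or above the cutoff
    return {
        "breadth_pct": 100 * covered / genome_length,
        "mean_depth": mean(depths),
        "pct_at_min_depth": 100 * deep / genome_length,
    }


# Toy depth profile for a 10-position "genome", for illustration only.
toy_depths = [0, 10, 60, 100, 51, 0, 75, 200, 30, 51]
toy_summary = coverage_summary(toy_depths)
```

For a real genome, `depths` would hold 29,903 values, one per position of NC_045512.2.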
3.2. Comparative genomics of Esteio SARS-CoV-2 sequences
Sixty-eight different nucleotide substitutions were found in the twelve genomes from this study, 36 of which were non-synonymous. The mutations NSP13:E341D (ORF1b:E1264D), S:D614G and S:V1176F were predominantly found in these sequences. The most common missense substitutions (in at least 75% of the sequences) were NSP12:P323L (ORF1b:P314L), S:K417T, S:T1027I, ORF8:E92K, and N:P80R (in 11 genomes); 5′UTR:C241T and NSP3:K977Q (in 10 genomes); and S:H655Y (in 9 genomes). Only two samples carry the substitutions S:E484K and S:N501Y, with both mutations occurring in the same samples. The mutations S:E661D and S:S689I were present in one sample each.
A few mutations found in our samples are described for the first time in RS (Table 1). Most of them had already been identified elsewhere in the world and occurred in sequences from different VOCs such as Alpha, Beta, Gamma, Delta, and Omicron. Substitution D1208A (NSP3) was also found for the first time in Brazil. The non-synonymous mutation I136V, in the NSP12 gene, had not been described on GISAID, making this the first report of this mutation (Table 1).
Table 1.
Mutation | Occurrences in the world | First occurrence in world | Occurrences in Brazil | First occurrence in Brazil | Date of our sample | VOCs |
---|---|---|---|---|---|---|
ORF3a:A72S | 7462 | 2020 | 85 | 2021-02-03 | 2021-05-01 | A, B, G, D, O |
NSP3:P1103L | 17,219 | 2020 | 428 | 2020-06-16 | 2021-06-14 | A, B, G, D, O |
NSP3:D1208A | 271 | 2020-03-13 | not described | – | 2021-06-14 | A, G, D, O |
NSP4:D144G | 198 | 2020 | 2 | 2021-05-31 | 2021-06-14 | A, G, D, O |
NSP13:L325F | 1255 | 2020 | 15 | 2021-01-27 | 2021-06-14 | A, B, G, D, O |
NSP12:I136V | not described | – | not described | – | 2021-05-08 | G |
The table describes the mutations and the number of occurrences in the world and Brazil, as well as their first occurrences in these locations. The table also indicates the VOC lineages where each mutation can be found (Access on GISAID: June 7, 2022). A: Alpha; B: Beta; G: Gamma; D: Delta; O: Omicron.
3.3. Comparative genomics of Rio Grande do Sul SARS-CoV-2 sequences
A total of 4706 sequences, collected from March 2020 to May 2022 (26 months) and submitted up to September 30, 2022, were downloaded from the GISAID platform. The genome sequencing count per month in RS state is shown in Fig. 1. Most sequencing efforts are associated with COVID-19 waves caused by the introduction of new lineages in RS state. This can be observed during the Gamma/Delta waves (February to October 2021) and, more recently, the Omicron wave, which started in January 2022. The total number of sequences in the state is low compared to the number of cases, representing around 0.19% of them (number of cases = 2,435,883 on May 31, 2022).
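The sequenced fraction quoted above follows directly from the two counts given in the text, as this short check shows (variable names are illustrative):

```python
# Counts taken from the text above.
sequenced_genomes = 4706
reported_cases = 2_435_883  # cases in RS state on May 31, 2022

# Percentage of reported cases that have a sequenced genome on GISAID.
sequenced_pct = 100 * sequenced_genomes / reported_cases
print(f"{sequenced_pct:.2f}% of reported cases were sequenced")
```

In other words, roughly one genome was sequenced for every 518 reported cases.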
As shown in Fig. 2, lineage B.1.1.33 predominated in 2020. Between November 2020 and January 2021, the Zeta (P.2) lineage became more frequent, followed by the B.1.1.28 and P.7 lineages. From February until July 2021, P.1 and derivative lineages (mostly P.1 and P.1.2) became prevalent in the RS genomes, in accordance with the lineages of our samples, which were collected between April and June 2021, during the Gamma-related second wave of COVID-19 in the state. Approximately 27.6% of the SARS-CoV-2 genomes obtained from the GISAID database between March 2020 and May 2022 belong to the Gamma clade. Considering only 2021, 57.1% of the sequenced SARS-CoV-2 genomes were classified as Gamma. The Delta lineages (mostly AY.99.2 and AY.101) were first identified in the state by sequencing in June 2021 and became prevalent from August until the end of the year. Delta lineages accounted for 32.30% of the sequenced genomes from RS state in 2021. The Omicron lineages (mostly BA.1.1 and BA.2) arose in RS state in December 2021 and accounted for more than 90% of sequenced genomes between January and April 2022.
In the genome set from RS state, the missense variant NSP12:P323L (ORF1b:P314L) was the most prevalent, found in 98.70% (n = 4645) of samples (Fig. 3). The highly frequent mutations (present in at least 90% of the genomes) also include the extragenic substitution C241T (n = 4484), the synonymous mutation NSP3:F106F (n = 4509), and the non-synonymous mutation S:D614G (n = 4504). Other non-synonymous mutations such as N:R203K/G204R (n = 3856/3854), S:N501Y (n = 2762), and S:H655Y (n = 2842) were found in > 50% of samples.
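Lineage frequencies of this kind can be computed by counting lineage assignments within a chosen period. The sketch below assumes a plain list of Pango lineage labels, one per genome; the function name and the toy input are hypothetical and do not reproduce the study's data.

```python
from collections import Counter
from typing import Dict, List


def lineage_frequencies(lineages: List[str]) -> Dict[str, float]:
    """Return the percentage of genomes assigned to each lineage, most common first."""
    counts = Counter(lineages)
    total = sum(counts.values())
    return {lineage: 100 * n / total for lineage, n in counts.most_common()}


# Toy lineage assignments, for illustration only.
toy_lineages = ["P.1"] * 5 + ["P.1.2"] * 3 + ["AY.99.2"] * 2
toy_frequencies = lineage_frequencies(toy_lineages)
```

Filtering the input list by collection date before calling the function would give per-month or per-year frequencies such as those summarized in Fig. 2.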
3.4. Global phylogenomics
In order to establish the evolutionary relationships of the genomes sequenced from Esteio with the global SARS-CoV-2 dataset, the AudacityInstant tool from the GISAID database was used to identify genetically related genomes. Four sequences could not be related to other sequences in the database, probably due to low sequencing quality. Considering the most closely related genomes (Table 2), four sequences were associated with samples from Brazil (one of them from Rio Grande do Sul state) and the other four were more closely related to genomes from Chile, Mexico, USA, and Canada.
Table 2.
Columns Distance through Lineage describe the closest related genome.
Sequence | Distance | Match quality | Location | Collection date | Lineage |
---|---|---|---|---|---|
RS-44473 | No related genomes found | – | – | – | – |
RS-44474 | No related genomes found | – | – | – | – |
RS-44475 | 4 | 0.910 | Brazil/São Paulo | 2021-11-22 | P.1 |
RS-44476 | 3 | 0.920 | Canada | 2021-05-26 | P.1.17 |
RS-44477 | No related genomes found | – | – | – | – |
RS-44478 | 4 | 0.900 | Chile | 2021-08-13 | B.1.1 |
RS-44479 | 8 | 0.937 | Brazil/Rio de Janeiro | 2021-02-10 | P.1 |
RS-44480 | 2 | 0.905 | Brazil/São Paulo | 2021-06-07 | P.1 |
RS-44481 | 1 | 0.971 | Brazil/Rio Grande do Sul/Guaíba | 2021-06-24 | P.1 |
RS-44482 | 5 | 0.901 | USA | 2021-03-17 | P.1.13 |
RS-44483 | No related genomes found | – | – | – | – |
RS-44484 | 4 | 0.903 | Mexico | 2021-06-05 | P.1.17 |
Besides the most closely related genomes, 424 other unique sequences (638 genomes in total) were recovered as being related to the genomes sequenced in this study, with a genetic distance of 9 or less according to the AudacityInstant parameters (Fig. 4). Sequences RS-44475, RS-44478, RS-44479, and RS-44481 were mostly associated with Brazilian sequences (47.4 to 89% of the retrieved genomes). RS-44476, RS-44480, and RS-44484 were predominantly related to genomes from Mexico (50%) and the USA (30.3 and 27.5% of the genomes), respectively.
3.5. Phylogenomic analysis of SARS-CoV-2 from Rio Grande do Sul state
The phylogenomic analysis of the SARS-CoV-2 genomes from Rio Grande do Sul state showed the formation of multiple monophyletic groups for the main VOCs (Fig. 5 and Supplementary File 2). The Alpha (B.1.1.7), Gamma (P.1 and derivative lineages), Delta (B.1.617.2 and derivative lineages), and Omicron (BA.2) clades were validated by the SH-aLRT and aBayes tests with at least 97% branch support (100/1, 99.9/1, 97.1/1, and 100/1 for Alpha, Gamma, Delta, and Omicron, respectively). Clustering in monophyletic groups was also observed for other lineages and for the former VOIs P.7 and Zeta (P.2) (97.1/1 and 99/1 of statistical support for P.7 and P.2, respectively), as well as for a larger clade including the P.1 (and its derivatives), P.2, and P.7 clades with B.1.1.28 sequences at the basal branch (85.3/0.995 of statistical support for the SH-aLRT and aBayes tests). Interestingly, a clade with Alpha and Omicron genomes was formed with 86.2/0.996 branch support.
As expected, all 12 genomes sequenced by this study clustered in the Gamma clade, which also presented subclades related to P.1 sublineages. P.1.2 (94/1 for SH-aLRT/aBayes), P.1.17 (85.3/0.997 for SH-aLRT/aBayes, including two genomes from this study at the basal branch), and P.1.7 (88.7/1 for SH-aLRT/aBayes) were found to form monophyletic groups. In the Delta group, sublineages AY.101 and AY.9.2 were supported as subclades by the statistical tests (96.3/1 and 100/1 for SH-aLRT/aBayes, respectively) (Fig. 5 and Supplementary File 2).
3.6. Phylogenetics and molecular evolution of SARS-CoV-2 structural proteins
The molecular evolutionary analysis aimed to identify positively and negatively selected sites in the SARS-CoV-2 structural proteins in the genome dataset from Rio Grande do Sul state. The E, M, and N proteins were tested with the HyPhy FUBAR, FEL, and SLAC methods (Table 3, Table 4, Table 5).
Table 3.
Codon | FUBAR Alpha | FUBAR Beta | FUBAR Post. prob. | FEL Alpha | FEL Beta | FEL LRT | FEL Prob. | SLAC dS | SLAC dN | SLAC Prob. |
---|---|---|---|---|---|---|---|---|---|---|
25 | – | – | – | 9.477 | 0.000 | 2.747 | 0.0974 | – | – | – |
63 | – | – | – | 11.265 | 0.000 | 2.756 | 0.0969 | – | – | – |
Post. prob.: Posterior probability. Prob: Probability.
Table 4.
Codon | FUBAR Alpha | FUBAR Beta | FUBAR Post. prob. | FEL Alpha | FEL Beta | FEL LRT | FEL Prob. | SLAC dS | SLAC dN | SLAC Prob. |
---|---|---|---|---|---|---|---|---|---|---|
53 | – | – | – | 7.466 | 0.000 | 6.794 | 0.0091 | 3.447 | 0.000 | 0.024 |
94 | 0.764 | 6.559 | 0.9003 | – | – | – | – | – | – | – |
112 | – | – | – | 5.455 | 0.000 | 4.538 | 0.0332 | 2.306 | 0.000 | 0.084 |
114 | – | – | – | 15.269 | 0.000 | 7.834 | 0.0051 | – | – | – |
121 | – | – | – | 5.333 | 0.000 | 2.799 | 0.0943 | 2.294 | 0.000 | 0.084 |
125 | 1.006 | 10.637 | 0.9555 | – | – | – | – | – | – | – |
135 | – | – | – | 11.754 | 0.000 | 3.455 | 0.0631 | – | – | – |
138 | – | – | – | 4.521 | 0.000 | 3.761 | 0.0524 | – | – | – |
139 | – | – | – | 7.642 | 0.000 | 3.612 | 0.0574 | – | – | – |
147 | – | – | – | 7.642 | 0.000 | 3.292 | 0.0696 | – | – | – |
166 | – | – | – | 12.036 | 0.000 | 3.279 | 0.0702 | – | – | – |
195 | – | – | – | 7.642 | 0.000 | 3.873 | 0.0491 | – | – | – |
203 | – | – | – | 8.065 | 0.000 | 4.217 | 0.0400 | 3.443 | 0.000 | 0.024 |
208 | – | – | – | 7.764 | 0.000 | 3.503 | 0.0613 | – | – | – |
Post. prob.: Posterior probability. Prob: Probability. Bold rows indicate positive selection test results.
Table 5.
Codon | FUBAR Alpha | FUBAR Beta | FUBAR Post. prob. | FEL Alpha | FEL Beta | FEL LRT | FEL Prob. | SLAC dS | SLAC dN | SLAC Prob. |
---|---|---|---|---|---|---|---|---|---|---|
9 | 1.163 | 6.688 | 0.9170 | – | – | – | – | – | – | – |
21 | – | – | – | 5.593 | 0.000 | 4.290 | 0.0383 | – | – | – |
30 | – | – | – | 5.530 | 0.000 | 3.301 | 0.0692 | – | – | – |
34 | 0.627 | 6.935 | 0.9841 | 0.000 | 4.547 | 5.308 | 0.0212 | – | – | – |
35 | – | – | – | 6.487 | 1.007 | 4.013 | 0.0452 | 5.000 | 0.500 | 0.018 |
40 | – | – | – | 2.586 | 0.000 | 3.399 | 0.0652 | – | – | – |
60 | – | – | – | 2.810 | 0.000 | 3.736 | 0.0532 | – | – | – |
63 | 0.626 | 6.645 | 0.9772 | 0.000 | 4.538 | 4.348 | 0.0371 | – | – | – |
70 | – | – | – | 11.034 | 0.000 | 3.401 | 0.0652 | – | – | – |
78 | – | – | – | 3.206 | 0.000 | 3.486 | 0.0619 | 2.379 | 0.000 | 0.079 |
100 | – | – | – | 11.034 | 0.000 | 3.763 | 0.0524 | – | – | – |
107 | – | – | – | 7.579 | 0.000 | 3.904 | 0.0482 | – | – | – |
110 | – | – | – | 6.426 | 0.000 | 11.948 | 0.0005 | 4.779 | 0.000 | 0.006 |
118 | – | – | – | 11.034 | 0.000 | 3.814 | 0.0508 | – | – | – |
149 | – | – | – | 3.211 | 0.000 | 3.613 | 0.0573 | – | – | – |
151 | 0.595 | 7.118 | 0.9879 | 0.000 | 5.412 | 6.086 | 0.0136 | – | – | – |
157 | – | – | – | 4.417 | 0.780 | 2.727 | 0.0987 | 3.213 | 0.484 | 0.092 |
170 | – | – | – | – | – | – | – | 3.000 | 0.000 | 0.037 |
172 | – | – | – | 5.021 | 0.000 | 6.599 | 0.0102 | 3.607 | 0.000 | 0.025 |
182 | 0.609 | 4.020 | 0.9256 | 0.000 | 3.128 | 3.258 | 0.0711 | – | – | – |
192 | – | – | – | 8.387 | 0.000 | 6.223 | 0.0126 | 5.968 | 0.000 | 0.002 |
208 | 0.604 | 4.018 | 0.9266 | 0.000 | 3.130 | 3.478 | 0.0622 | – | – | – |
210 | – | – | – | – | – | – | – | 58.466 | 1.127 | 0.008 |
215 | 0.620 | 3.320 | 0.9035 | 0.000 | 2.534 | 2.776 | 0.0957 | – | – | – |
221 | – | – | – | 1.500 | 0.000 | 2.827 | 0.0927 | – | – | – |
226 | – | – | – | 15.769 | 0.000 | 7.811 | 0.0052 | 2.425 | 0.000 | 0.085 |
227 | – | – | – | 3.098 | 0.000 | 5.676 | 0.0172 | 2.172 | 0.000 | 0.098 |
228 | – | – | – | – | – | – | – | 2.381 | 0.000 | 0.078 |
238 | 0.613 | 5.716 | 0.9707 | 0.000 | 3.731 | 3.992 | 0.0457 | – | – | – |
265 | 0.733 | 5.368 | 0.9120 | 0.000 | 4.711 | 3.278 | 0.0702 | – | – | – |
268 | – | – | – | 5.021 | 0.000 | 6.596 | 0.0102 | 3.573 | 0.000 | 0.026 |
274 | – | – | – | 10.000 | 0.000 | 17.898 | 0.000 | 7.165 | 0.000 | 0.000 |
289 | 1.191 | 6.872 | 0.9165 | – | – | – | – | – | – | – |
291 | – | – | – | 5.474 | 0.000 | 6.537 | 0.0106 | – | – | – |
292 | – | – | – | – | – | – | – | 5.327 | 1.455 | 0.069 |
296 | – | – | – | 0.000 | 3.085 | 3.082 | 0.0791 | – | – | – |
298 | – | – | – | 4.982 | 0.000 | 6.561 | 0.0104 | 3.572 | 0.000 | 0.026 |
302 | – | – | – | 6.788 | 0.000 | 7.863 | 0.0050 | 5.000 | 0.000 | 0.004 |
309 | – | – | – | 2.810 | 0.000 | 3.332 | 0.0680 | – | – | – |
312 | – | – | – | 5.721 | 0.000 | 4.285 | 0.0384 | – | – | – |
313 | – | – | – | 2.810 | 0.000 | 3.283 | 0.0700 | – | – | – |
315 | – | – | – | 3.333 | 0.000 | 5.946 | 0.0148 | 2.379 | 0.000 | 0.079 |
318 | – | – | – | 4.025 | 0.000 | 6.183 | 0.0129 | 3.000 | 0.000 | 0.041 |
327 | – | – | – | 3.999 | 0.000 | 6.181 | 0.0129 | 3.000 | 0.000 | 0.041 |
329 | – | – | – | 4.000 | 0.000 | 3.733 | 0.0533 | 3.000 | 0.000 | 0.037 |
330 | – | – | – | – | – | – | – | 0.000 | 0.428 | 1.000 |
333 | – | – | – | 5.021 | 0.000 | 6.565 | 0.0104 | 3.576 | 0.000 | 0.026 |
337 | – | – | – | 3.000 | 0.000 | 4.235 | 0.0396 | 2.142 | 0.000 | 0.097 |
341 | – | – | – | – | – | – | – | 3.573 | 0.463 | 0.069 |
344 | – | – | – | 5.675 | 0.000 | 3.621 | 0.0571 | – | – | – |
346 | – | – | – | 4.977 | 0.000 | 8.931 | 0.0028 | 3.573 | 0.000 | 0.022 |
362 | 1.235 | 7.840 | 0.9224 | – | – | – | – | – | – | – |
363 | – | – | – | 3.301 | 0.000 | 5.949 | 0.0147 | 2.381 | 0.000 | 0.078 |
366 | 1.235 | 7.819 | 0.9223 | – | – | – | – | – | – | – |
372 | – | – | – | 11.754 | 0.000 | 3.770 | 0.0522 | – | – | – |
382 | – | – | – | 3.649 | 0.000 | 4.552 | 0.0329 | – | – | – |
391 | 0.602 | 5.676 | 0.9571 | 0.000 | 4.605 | 4.619 | 0.0316 | – | – | – |
398 | – | – | – | 11.102 | 1.047 | 3.909 | 0.0480 | – | – | – |
403 | – | – | – | 1.667 | 0.000 | 2.970 | 0.0848 | – | – | – |
404 | – | – | – | 2.810 | 0.000 | 4.459 | 0.0347 | – | – | – |
Post. prob.: Posterior probability. Prob: Probability. Bold rows indicate positive selection test results.
3.6.1. E protein
No sites were identified by FUBAR, FEL or SLAC methods to be under positive selective pressure in the Envelope protein. Two sites were identified by a negative selection test from the FEL method.
3.6.2. M protein
Two sites were identified as being under adaptive selection by the FUBAR method in the Membrane protein (Table 4). Twelve sites were found to be under purifying selection by the FEL and/or SLAC methods, four of them by both methods.
3.6.3. N protein
Fourteen sites were identified as being under positive selective pressure by FUBAR and/or FEL in the Nucleocapsid protein, of which nine were identified by both methods (Table 5). Forty-six sites were found to be under negative selective pressure, twenty of them identified by both the FEL and SLAC methods.
3.7. Molecular stability of the structural proteins M and N
The program DynaMut2 was used to estimate the molecular stability of the SARS-CoV-2 structural proteins M and N with mutated residues at sites previously identified as being under positive selection (Table 6). Sites of the M and N proteins recognized by the HyPhy tests had their amino acid substitutions evaluated at the molecular level using publicly available structures from the PDB database (Fig. 6). Unlike the spike protein, proteins M and N are less represented among the experimentally resolved structures in PDB. Thus, the following structures were selected to perform these analyses: (a) 8CTK, for protein M; (b) 7VNU, for the N-terminal domain of protein N; and (c) 6ZCO, for the C-terminal domain of protein N.
Table 6.
Protein | Site | Mutation | Frequency (a) | Predicted stability change (ΔΔG Stability) |
---|---|---|---|---|
M | 94 | S → G | 0.45% | −0.19 kcal/mol (Destabilizing) |
M | 94 | S → I | 0.09% | 0.65 kcal/mol (Stabilizing) |
M | 125 | H → L | 0.04% | 1.83 kcal/mol (Stabilizing) |
M | 125 | H → Q | 0.04% | 0.52 kcal/mol (Stabilizing) |
M | 125 | H → Y | 0.18% | 2.04 kcal/mol (Stabilizing) |
N | 63 | D → G | 18.53% | 0.08 kcal/mol (Stabilizing) |
N | 63 | D → Y | 0.04% | 1.0 kcal/mol (Stabilizing) |
N | 151 | P → L | 0.22% | −0.43 kcal/mol (Destabilizing) |
N | 151 | P → S | 0.09% | −0.17 kcal/mol (Destabilizing) |
N | 265 | T → I | 0.04% | −0.32 kcal/mol (Destabilizing) |
N | 289 | Q → L | 0.04% | 0.42 kcal/mol (Stabilizing) |
N | 289 | Q → H | 0.22% | −0.38 kcal/mol (Destabilizing) |
N | 296 | T → I | 0.09% | −0.75 kcal/mol (Destabilizing) |
N | 362 | T → I | 0.18% | −0.12 kcal/mol (Destabilizing) |
N | 362 | T → K | 0.09% | −0.09 kcal/mol (Destabilizing) |
(a) Mutation frequency in the multiple sequence alignment (n = 2,240 sequences).
In the analysis of the M protein, most alterations promote a stabilizing effect on the protein structure, except for S94G (ΔΔG = −0.19 kcal/mol). In the N-terminal domain of the N protein, alterations at site 63 were associated with a stabilizing effect, while mutations at site 151 seem to destabilize the protein structure. Mutations observed in the C-terminal domain mostly suggest a destabilizing effect, except for mutation Q289L (ΔΔG = 0.42 kcal/mol). Some mutations, such as M:S94G/I and N:Q289H/L, showed variable stabilizing/destabilizing patterns for the same site. Mutations M:H125L/Y and N:D63Y not only presented a stabilizing effect but are also associated with a larger predicted stability change, ranging from 1.0 to 2.04 kcal/mol in Gibbs free energy.
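The summary above can be reproduced from the values in Table 6 with a few lines of code. The sketch below follows the sign convention used in Table 6 (positive ΔΔG labelled stabilizing, negative labelled destabilizing); the 1.0 kcal/mol cutoff for a "larger" change is taken from the range mentioned in the text and is otherwise an arbitrary illustration, and the function and variable names are hypothetical.

```python
from typing import Dict


def stability_label(ddg: float) -> str:
    """Label a predicted stability change using the sign convention of Table 6."""
    if ddg > 0:
        return "Stabilizing"
    if ddg < 0:
        return "Destabilizing"
    return "Neutral"


# Predicted stability changes (kcal/mol) copied from Table 6.
TABLE_6_DDG: Dict[str, float] = {
    "M:S94G": -0.19, "M:S94I": 0.65, "M:H125L": 1.83, "M:H125Q": 0.52,
    "M:H125Y": 2.04, "N:D63G": 0.08, "N:D63Y": 1.0, "N:P151L": -0.43,
    "N:P151S": -0.17, "N:T265I": -0.32, "N:Q289L": 0.42, "N:Q289H": -0.38,
    "N:T296I": -0.75, "N:T362I": -0.12, "N:T362K": -0.09,
}

labels = {mutation: stability_label(ddg) for mutation, ddg in TABLE_6_DDG.items()}
stabilizing = [m for m, label in labels.items() if label == "Stabilizing"]
# Mutations with a stabilizing change of at least 1.0 kcal/mol (illustrative cutoff).
large_stabilizing = [m for m, ddg in TABLE_6_DDG.items() if ddg >= 1.0]
```

Applied to Table 6, this gives seven stabilizing and eight destabilizing mutations, with M:H125L, M:H125Y and N:D63Y as the three largest stabilizing changes.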
4. Discussion
Rio Grande do Sul (RS) is currently the fourth state most affected by COVID-19 in Brazil (https://covid.saude.gov.br/accessed on December 13, 2022). The municipality of Esteio, located in the metropolitan region of RS capital, as a commuter town, presents a large flow of inhabitants to nearby cities and had its population massively tested for COVID-19 between May 2020 and December 2021 by the project GPS COVID conducted by our research group. As a result, we detected in this city one of the firsts occurrences of E484K mutation in RS state and Brazil, as well as the importation of P.2 lineage in Rio Grande do Sul (Franceschi et al., 2021b). The P.1 lineage initiated a new wave of infections in Brazil around November 2020, starting in Manaus (northern Brazil) and spreading across the country (Faria et al., 2021). In RS, P.1 arrived in mid of January 2021, with a high transmission rate until April 2021, characterizing the second COVID-19 wave in the state (Varela et al., 2021; Salvato et al., 2021).
According to the phylogenomic analysis, the genomes from RS state formed monophyletic groups for most of the lineages, with specific clades to lineages and sublineages belonging to Alpha, Gamma, Delta, and Omicron VOCs. This data may suggest some intra-lineage genetic conservation in the SARS-CoV-2 genomes from RS state. However, as seen in AudacityInstant data search, four of our sequenced samples are most closely related to genomes from Chile, Mexico, Canada, and USA. It is not possible to guarantee if it is not an artifact from sequencing related to low quality and covering, sampling or if it is real evidence of possible migration events, since our genomes present older collection dates than their matches. The sequencing reads assembly for these four samples achieved between 98.12 and 99.73% of SARS-CoV-2 reference sequence covering and the occurrence of undefined nucleotides (Ns) ranged from 4.47 up to 17.78%, indicating low coverage sequences.
For the sequences more closely related to Brazilian samples, all except genome RS-44479 (collected in June 2021, with its closest related sequence dated to February 2021, in Rio de Janeiro) also have older collection dates than their matches. The read assemblies for these four “Brazilian” samples covered between 98.82% and 99.76% of the SARS-CoV-2 reference sequence, and the proportion of undefined nucleotides (Ns) ranged from 2.00% to 13.47%, indicating three low-coverage sequences and one high-coverage sequence in this set. The remaining four genomes, with no identifiable related sequences, presented very low sequencing quality and coverage, covering 84.92% to 98.26% of the SARS-CoV-2 reference sequence and containing 49.55% to 58.43% undefined nucleotides.
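The percentage of undefined nucleotides discussed above can be computed directly from a consensus sequence. The sketch below is an illustration only: the function name `percent_undefined` and the toy sequence are hypothetical, and it is not the pipeline used to produce the values reported here.

```python
def percent_undefined(sequence: str) -> float:
    """Return the percentage of positions called as N in a consensus sequence."""
    # Normalize case and drop whitespace so wrapped FASTA bodies can be passed as-is.
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * seq.count("N") / len(seq)


# Toy 20-base consensus with 6 undefined positions (hypothetical data).
toy_consensus = "ACGTNNACGTACGTNNNNAC"
toy_percent_n = percent_undefined(toy_consensus)
print(f"{toy_percent_n:.2f}% Ns")
```

Applied to a full-length consensus genome, the same calculation gives the kind of N percentage reported for each sample.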
RNA viruses have a higher mutation rate than DNA viruses and organisms (Duffy, 2018). Selective pressure acts so that the virus keeps its transmission and immune evasion mechanisms adapted to the host characteristics (Zarai et al., 2020). The E protein is the smallest structural protein of SARS-CoV-2, and its structure is highly conserved across diverse genera of β-coronaviruses (Yadav et al., 2021). It comprises three main domains: the N-terminal (NT), C-terminal (CT), and transmembrane (TMD) domains. Modifications in the TMD could indicate a differential interaction with membrane lipids, as well as an altered capacity for membrane attachment and ER targeting by the E protein (Timmers et al., 2021). Similarly, changes at sites located in the D-L-L-V motif, which binds the host protein PALS1, could facilitate infection (Timmers et al., 2021). However, no sites were found under positive selective pressure in the E protein. The presence of sites 25 and 63 under negative selective pressure suggests their importance for the conservation of protein function.
The M protein is very important for the assembly of the virion and of the other structural proteins in coronaviruses (Neuman et al., 2011). In SARS-CoV-2, this protein can be related to antigenic reactions, along with the S and N proteins (Lopandić et al., 2021), and can even reduce type I interferon responses (Sui et al., 2021). Therefore, modifications in its genomic sequence can directly impact virus survival, which is probably the reason why few diversifying selection events were identified in this protein.
The N protein structure is composed of three main domains: an N-terminal domain (NTD), a linker domain rich in serine and arginine residues (SR-rich linker), and a C-terminal domain (CTD) (Timmers et al., 2021). The NTD and CTD comprise major antigenic sites of the N protein in SARS-CoV (Surjit and Lal, 2009). This protein has a role in the packaging of the viral genetic material and is also related to immune escape, blocking interferons and other host defense mechanisms (Bai et al., 2021). According to Rahman et al. (2020), the several alterations in this protein make it difficult to create vaccines and medications that target it. The co-occurring amino acid mutations R203K and G204R, for example, are known to enhance replication, fitness, and pathogenesis of SARS-CoV-2 (Johnson et al., 2022).
Changes in nucleotides can result in modifications of the protein structure, increasing or decreasing its stability (Jaenicke, 1996). The flexibility of a protein is related to its function and conformation (Zhao, 2010). For this reason, DynaMut2, a tool based on supervised machine learning, was selected to predict the effect of missense variations at positively selected sites on protein stability. Previous work by our group showed that DynaMut findings can be similar to those provided by Molecular Dynamics simulations: the simulations identified P.1 spike structures with higher structural stability, while DynaMut inferred a stabilizing effect for the P.1-related RBD mutations E484K and N501Y (Gröhs Ferrareze et al., 2022). According to the DynaMut2 results for the M protein, mutations at site 94 could stabilize or destabilize the protein structure depending on whether the native serine is replaced by an isoleucine or a glycine, respectively. Importantly, site 94 takes part in polar contacts between the M protein monomer and membrane lipids. Studies suggest that mutations at neighboring sites, such as 92 (W92Q), 93 (L93S), and 97 (I97T/S), result in increased stabilization of the homodimer structure (Marques-Pereira et al., 2022). Site 125 showed a stabilizing effect for all tested mutations. The mutation H125Y is the most frequent at this site on GISAID, with 14,355 occurrences in SARS-CoV-2 genomes worldwide (accessed on October 31, 2022). Present in all variants of concern, this mutation was predominantly associated with the Delta and Omicron clades (35.10% and 24.46% of the occurrences on GISAID, respectively), while the variant H125L is less widespread, occurring in ≅0.001% of world genomes, including those from the Alpha, Delta, and Omicron groups. Another minor variant, H125Q (≅0.0009%), is also related to the Alpha, Delta, and Omicron lineages.
Two sites in the N-terminal domain of the N protein were analyzed, with two different mutations each. Mutations at site 63 (D63G/Y) lead to a stabilizing effect, and mutations at site 151 (P151L/S) seem to destabilize the protein structure. The D63G mutation, which replaces an aspartic acid with a glycine, is widespread, occurring in 32.74% of world genomes on GISAID, mostly in Delta sequences (99.8% of D63G occurrences). D63Y is found in approximately 2800 genomes, mostly from the Omicron and Alpha lineages. The P151S substitution (≅1.28% of SARS-CoV-2 genomes in the world) destabilizes the N-terminal domain of the nucleocapsid protein by replacing a non-polar proline with a polar serine. Prevalent in Omicron (95.64% of P151S occurrences on GISAID), P151S is more frequent than P151L (non-polar proline to non-polar leucine), which also destabilizes the protein structure and has more than 26,000 occurrences in SARS-CoV-2 genomes worldwide, including all VOCs. The analysis of nucleocapsid mutations with different computational approaches, such as DynaMut2, MaestroWeb, mCSM, SDM, and CUPSAT, by Mohammad et al. (2021) also identified a destabilizing effect for P151L.
In the C-terminal domain of the N protein, four sites were considered with their respective amino acid alterations, and most of them seem to destabilize the protein. Only mutation Q289L (≅0.009% of GISAID genomes) shows a stabilizing effect on the C-terminal domain structure. Surprisingly, as observed for the N-terminal domain mutations, Q289H (≅0.08% of GISAID genomes in the world), which leads to a destabilizing effect, is about nine times more frequent than the substitution by a leucine residue and is mostly found in Delta and Omicron genomes.
Similar results were found by Rahman et al. (2020) as well as by Mohammad et al. (2021) in the analysis of the structural effects of mutation Q289H. Likewise, mutation T265I was found to be destabilizing (Mohammad et al., 2021). For T362I/K, the results of Mohammad et al. (2021) indicate a stabilizing effect, in contrast with our findings. Nevertheless, Azad (2021) identified the mutation T362I as destabilizing in the analysis with mCSM and DynaMut. Finally, more studies are necessary to completely understand how structural changes may lead to advantages for SARS-CoV-2 in host-pathogen interactions.
5. Conclusion
Although most studies focus on the spike protein, other structural proteins, such as the nucleocapsid protein, present non-synonymous mutations that alter SARS-CoV-2 pathogenesis. This work combined phylogenetic analysis with the identification of positively selected sites in these structural proteins, and inferred their stabilizing or destabilizing effects on protein structures. The inferred prevalence of stabilizing effects on the membrane protein structure and of mostly destabilizing effects on the C-terminal nucleocapsid domain, together with the sites under positive selection in the nucleocapsid protein, may be related to the selective pressures that allow its diversification, potentially improving SARS-CoV-2 fitness. The identification of these genetic traits in samples from Rio Grande do Sul state helps to understand the evolutionary pattern of SARS-CoV-2 in the Brazilian territory.
Ethics statement
The research protocol was approved with exemption of written informed consent for viral genome sequencing and bioinformatic analyses by Comitê de Ética em Pesquisa em Seres Humanos of Universidade Federal de Ciências da Saúde de Porto Alegre (CEP - UFCSPA) under process number CAAE 39247920.0.0000.5345.
CRediT authorship contribution statement
Amanda de Menezes Mayer: Formal analysis, Investigation, Methodology, Writing – original draft, Writing – review & editing. Patrícia Aline Gröhs Ferrareze: Conceptualization, Formal analysis, Investigation, Methodology, Writing – original draft, Writing – review & editing. Luiz Felipe Valter de Oliveira: Methodology, Resources, Funding acquisition, Writing – review & editing. Tatiana Schäffer Gregianini: Methodology, Resources, Writing – review & editing. Carla Lucia Andretta Moreira Neves: Resources, Writing – review & editing. Gabriel Dickin Caldana: Investigation, Writing – original draft, Writing – review & editing. Lívia Kmetzsch: Supervision, Writing – review & editing. Claudia Elizabeth Thompson: Conceptualization, Formal analysis, Investigation, Methodology, Resources, Supervision, Funding acquisition, Writing – original draft, Writing – review & editing. All authors have read and approved the manuscript.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
We thank the administrators of the GISAID database and research groups across the world for supporting the rapid and transparent sharing of genomic data during the COVID-19 pandemic and the Governo do Estado do Rio Grande do Sul and Ministério da Saúde for supplies and equipment used in the SARS-CoV-2 diagnosis routine. We also thank the Prefeitura Municipal de Esteio (Esteio Mayor’s Office). Scholarships and Fellowships were supplied by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance code 001 - and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq). The genome sequencing was performed by BiomeHub laboratory.
Handling Editor: Ms. J Jasmine Tomar
Footnotes
Supplementary data to this article can be found online at https://doi.org/10.1016/j.virol.2023.03.005.
Data availability
Data will be made available on request.
References
- Anisimova M., Gil M., Dufayard J.-F., Dessimoz C., Gascuel O. Survey of branch support methods demonstrates accuracy, power, and robustness of fast likelihood-based approximation schemes. Syst. Biol. 2011;60(5):685–699. doi: 10.1093/sysbio/syr041.
- Azad G.K. The molecular assessment of SARS-CoV-2 Nucleocapsid Phosphoprotein variants among Indian isolates. Heliyon. 2021;7(2) doi: 10.1016/j.heliyon.2021.e06167.
- Bai Z., Cao Y., Liu W., Li J. The sars-cov-2 nucleocapsid protein and its role in viral structure, biological functions, and a potential target for drug or vaccine mitigation. Viruses. 2021;13(6) doi: 10.3390/v13061115.
- Chakraborty S. E484K and N501Y SARS-CoV 2 spike mutants Increase ACE2 recognition but reduce affinity for neutralizing antibody. Int. Immunopharm. 2022;102 doi: 10.1016/j.intimp.2021.108424.
- Chen J., Wang R., Gilby N.B., Wei G.-W. Omicron variant (B.1.1.529): infectivity, vaccine breakthrough, and antibody resistance. J. Chem. Inf. Model. 2022;62(2):412–422. doi: 10.1021/acs.jcim.1c01451.
- Chen R.E., Zhang X., Case J.B., Winkler E.S., Liu Y., VanBlargan L.A., Liu J., Errico J.M., Xie X., Suryadevara N., Gilchuk P., Zost S.J., Tahan S., Droit L., Turner J.S., Kim W., Schmitz A.J., Thapa M., Wang D., et al. Resistance of SARS-CoV-2 variants to neutralization by monoclonal and serum-derived polyclonal antibodies. Nat. Med. 2021;27(4):717–726. doi: 10.1038/s41591-021-01294-w.
- Dong E., Du H., Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect. Dis. 2020;20(5):533–534. doi: 10.1016/s1473-3099(20)30120-1.
- Duffy S. Why are RNA virus mutation rates so damn high? PLoS Biol. 2018;16(8) doi: 10.1371/journal.pbio.3000003.
- Eden J.-S., Sim E. ZappyLab, Inc; 2020. SARS-CoV-2 Genome Sequencing Using Long Pooled Amplicons on Illumina Platforms V1.
- Elbe S., Buckland-Merrett G. Data, disease and diplomacy: GISAID's innovative contribution to global health. Global Challenges. 2017;1(1):33–46. doi: 10.1002/gch2.1018.
- Faria N.R., Mellan T.A., Whittaker C., Claro I.M., Candido D. da S., Mishra S., Crispim M.A.E., Sales F.C.S., Hawryluk I., McCrone J.T., Hulswit R.J.G., Franco L.A.M., Ramundo M.S., de Jesus J.G., Andrade P.S., Coletti T.M., Ferreira G.M., Silva C.A.M., Manuli E.R., Sabino E.C. Genomics and epidemiology of the P.1 SARS-CoV-2 lineage in Manaus, Brazil. Science. 2021;372(6544) doi: 10.1126/science.abh2644.
- Franceschi V.B., Caldana G.D., Perin C., Horn A., Peter C., Cybis G.B., Ferrareze P.A.G., Rotta L.N., Cadegiani F.A., Zimerman R.A., Thompson C.E. Predominance of the sars-cov-2 lineage P.1 and its sublineage P.1.2 in patients from the metropolitan region of Porto Alegre, southern Brazil in march 2021. Pathogens. 2021;10(8):988. doi: 10.3390/pathogens10080988.
- Franceschi V.B., Caldana G.D., de Menezes Mayer A., Cybis G.B., Neves C.A.M., Ferrareze P.A.G., Demoliner M., de Almeida P.R., Gularte J.S., Hansen A.W., Weber M.N., Fleck J.D., Zimerman R.A., Kmetzsch L., Spilki F.R., Thompson C.E. Genomic epidemiology of SARS-CoV-2 in Esteio, Rio Grande do Sul, Brazil. BMC Genom. 2021;22(1) doi: 10.1186/s12864-021-07708-w.
- Giovanetti M., Fonseca V., Wilkinson E., Tegally H., San E.J., Althaus C.L., Xavier J., Nanev Slavov S., Viala V.L., Ranieri Jerônimo Lima A., Ribeiro G., Souza-Neto J.A., Fukumasu H., Lehmann Coutinho L., Venancio da Cunha R., Freitas C., Campelo de A e Melo C.F., Navegantes de Araújo W., Do Carmo Said R.F., et al. Replacement of the Gamma by the Delta variant in Brazil: impact of lineage displacement on the ongoing pandemic. Virus Evol. 2022;8(1) doi: 10.1093/ve/veac024.
- Gröhs Ferrareze P.A., Zimerman R.A., Franceschi V.B., Caldana G.D., Netz P.A., Thompson C.E. Molecular evolution and structural analyses of the spike glycoprotein from Brazilian SARS-CoV-2 genomes: the impact of selected mutations. J. Biomol. Struct. Dyn. 2022:1–19. doi: 10.1080/07391102.2022.2076154.
- Guindon S., Dufayard J.-F., Lefort V., Anisimova M., Hordijk W., Gascuel O. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of phyml 3.0. Syst. Biol. 2010;59(3):307–321. doi: 10.1093/sysbio/syq010.
- Gräf T., Bello G., Naveca F.G., Gomes M., Cardoso V.L.O., da Silva A.F., Dezordi F.Z., dos Santos M.C., Santos K.C. de O., Batista É.L.R., Magalhães A.L.Á., Vinhal F., Miyajima F., Faoro H., Khouri R., Wallau G.L., Delatorre E., Siqueira M.M., Resende P.C., et al. Phylogenetic-based inference reveals distinct transmission dynamics of SARS-CoV-2 lineages Gamma and P.2 in Brazil. iScience. 2022;25(4) doi: 10.1016/j.isci.2022.104156.
- Gularte J.S., da Silva M.S., Mosena A.C.S., Demoliner M., Hansen A.W., Filippi M., Pereira V.M. de A.G., Heldt F.H., Weber M.N., de Almeida P.R., Hoffmann A.T., Valim A.R. de M., Possuelo L.G., Fleck J.D., Spilki F.R. Early introduction, dispersal and evolution of Delta SARS-CoV-2 in Southern Brazil, late predominance of AY.99.2 and AY.101 related lineages. Virus Res. 2022;311 doi: 10.1016/j.virusres.2022.198702.
- Jaenicke R. How do proteins acquire their three-dimensional structure and stability? Naturwissenschaften. 1996;83(12):544–554. doi: 10.1007/bf01141979.
- Johnson B.A., Zhou Y., Lokugamage K.G., Vu M.N., Bopp N., Crocquet-Valdes P.A., Kalveram B., Schindewolf C., Liu Y., Scharton D., Plante J.A., Xie X., Aguilar P., Weaver S.C., Shi P.-Y., Walker D.H., Routh A.L., Plante K.S., Menachery V.D. Nucleocapsid mutations in SARS-CoV-2 augment replication and pathogenesis. PLoS Pathog. 2022;18(6) doi: 10.1371/journal.ppat.1010627.
- Junior R. da S.F., Lamarca A.P., de Almeida L.G.P., Cavalcante L., Machado D.T., Martins Y., Brustolini O., Gerber A.L., Guimarães A.P. de C., Gonçalves R.B., Alves C., Mariani D., Cruz T.F., de Souza I.V., de Carvalho E.M., Ribeiro M.S., Carvalho S., Silva F. D. da, Garcia M.H. de O., de Vasconcelos A.T.R. Turnover of sars-cov-2 lineages shaped the pandemic and enabled the emergence of new variants in the state of Rio de Janeiro, Brazil. Viruses. 2021;13(10) doi: 10.3390/v13102013.
- Katoh K., Rozewicki J., Yamada K.D. MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization. Briefings Bioinf. 2017;20(4):1160–1166. doi: 10.1093/bib/bbx108.
- Kosakovsky Pond S.L., Frost S.D.W. Not so different after all: a comparison of methods for detecting amino acid sites under selection. Mol. Biol. Evol. 2005;22(5):1208–1222. doi: 10.1093/molbev/msi105.
- Langmead B., Salzberg S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 2012;9(4):357–359. doi: 10.1038/nmeth.1923.
- Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27(21):2987–2993. doi: 10.1093/bioinformatics/btr509.
- Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–2079. doi: 10.1093/bioinformatics/btp352.
- Lopandić Z., Protić-Rosić I., Todorović A., Glamočlija S., Gnjatović M., Ćujic D., Gavrović-Jankulović M. IgM and igg immunoreactivity of sars-cov-2 recombinant M protein. Int. J. Mol. Sci. 2021;22(9) doi: 10.3390/ijms22094951.
- Marques-Pereira C., Pires M.N., Gouveia R.P., Pereira N.N., Caniceiro A.B., Rosário-Ferreira N., Moreira I.S. SARS-CoV-2 membrane protein: from genomic data to structural new insights. Int. J. Mol. Sci. 2022;23(6):2986. doi: 10.3390/ijms23062986.
- Mohammad T., Choudhury A., Habib I., Asrani P., Mathur Y., Umair M., Anjum F., Shafie A., Yadav D.K., Hassan Md I. Genomic variations in the structural proteins of sars-cov-2 and their deleterious impact on pathogenesis: a comparative genomics approach. Front. Cell. Infect. Microbiol. 2021;11 doi: 10.3389/fcimb.2021.765039.
- Murrell B., Moola S., Mabona A., Weighill T., Sheward D., Kosakovsky Pond S.L., Scheffler K. FUBAR: a fast, unconstrained bayesian approximation for inferring selection. Mol. Biol. Evol. 2013;30(5):1196–1205. doi: 10.1093/molbev/mst030.
- Neuman B.W., Kiss G., Kunding A.H., Bhella D., Baksh M.F., Connelly S., Droese B., Klaus J.P., Makino S., Sawicki S.G., Siddell S.G., Stamou D.G., Wilson I.A., Kuhn P., Buchmeier M.J. A structural analysis of M protein in coronavirus assembly and morphology. J. Struct. Biol. 2011;174(1) doi: 10.1016/j.jsb.2010.11.021.
- Nguyen L.-T., Schmidt H.A., von Haeseler A., Minh B.Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 2014;32(1):268–274. doi: 10.1093/molbev/msu300.
- Okonechnikov K., Golosova O., Fursov M. Unipro UGENE: a unified bioinformatics toolkit. Bioinformatics. 2012;28(8):1166–1167. doi: 10.1093/bioinformatics/bts091.
- Pond S.L.K., Frost S.D.W., Muse S.V. HyPhy: hypothesis testing using phylogenies. Bioinformatics. 2004;21(5):676–679. doi: 10.1093/bioinformatics/bti079.
- Rahman M.S., Islam M.R., Alam A.S.M.R.U., Islam I., Hoque M.N., Akter S., Rahaman Md M., Sultana M., Hossain M.A. Evolutionary dynamics of SARS‐CoV‐2 nucleocapsid protein and its consequences. J. Med. Virol. 2020;93(4):2177–2195. doi: 10.1002/jmv.26626.
- Rodrigues C.H.M., Pires D.E.V., Ascher D.B. DynaMut2: assessing changes in stability and flexibility upon single and multiple point missense mutations. Protein Sci. 2020;30(1):60–69. doi: 10.1002/pro.3942.
- Salvato R.S., Gregianini T.S., Campos A.A.S., Crescente L.V., Vallandro M.J., Ranieri T.M.S., Vizeu S., Martins L.G., Da Silva E.V., Pedroso E.R., Burille A., Baethgen L.F., Schiefelbein S.H., Machado T.R.M., Becker I.M., Ramos R., Piazza C.F., Nunes Z.M.A., Bastos C.G.M.B. Epidemiological investigation reveals local transmission of SARS-CoV-2 lineage P.1 in Southern Brazil. Revista de Epidemiologia e Controle de Infecção. 2021;11(1) doi: 10.17058/reci.v1i1.16335.
- Satarker S., Nampoothiri M. Structural proteins in severe acute respiratory syndrome coronavirus-2. Arch. Med. Res. 2020;51(6):482–491. doi: 10.1016/j.arcmed.2020.05.012.
- Shiehzadegan S., Alaghemand N., Fox M., Venketaraman V. Analysis of the delta variant B.1.617.2 COVID-19. Clin. Pract. (Wash. D C) 2021;11(4) doi: 10.3390/clinpract11040093.
- Sui L., Zhao Y., Wang W., Wu P., Wang Z., Yu Y., Hou Z., Tan G., Liu Q. SARS-CoV-2 membrane protein inhibits type I interferon production through ubiquitin-mediated degradation of TBK1. Front. Immunol. 2021;12 doi: 10.3389/fimmu.2021.662989.
- Surjit M., Lal S.K. Molecular Biology of the SARS-Coronavirus. Springer Berlin Heidelberg; 2009. The nucleocapsid protein of the SARS coronavirus: structure, function and therapeutic potential; pp. 129–151.
- Timmers L.F.S.M., Peixoto J.V., Ducati R.G., Bachega J.F.R., de Mattos Pereira L., Caceres R.A., Majolo F., da Silva G.L., Anton D.B., Dellagostin O.A., Henriques J.A.P., Xavier L.L., Goettert M.I., Laufer S. SARS-CoV-2 mutations in Brazil: from genomics to putative clinical conditions. Sci. Rep. 2021;11(1) doi: 10.1038/s41598-021-91585-6.
- Varela A.P.M., Prichula J., Mayer F.Q., Salvato R.S., Sant'Anna F.H., Gregianini T.S., Martins L.G., Seixas A., Veiga A. B. G. da. SARS-CoV-2 introduction and lineage dynamics across three epidemic peaks in Southern Brazil: massive spread of P.1. Infect. Genet. Evol. 2021;96 doi: 10.1016/j.meegid.2021.105144.
- Wang X., Powell C.A. How to translate the knowledge of COVID‐19 into the prevention of Omicron variants. Clin. Transl. Med. 2021;11(12) doi: 10.1002/ctm2.680.
- Wink P.L., Ramalho R., Monteiro F.L., Volpato F.C.Z., Willig J.B., Lovison O. von A., Zavascki A.P., Barth A.L., Martins A.F. Genomic surveillance of sars-cov-2 lineages indicates early circulation of P.1 (Gamma) variant of concern in southern Brazil. Microbiol. Spectr. 2022;10(1) doi: 10.1128/spectrum.01511-21.
- Yadav R., Chaudhary J.K., Jain N., Chaudhary P.K., Khanra S., Dhamija P., Sharma A., Kumar A., Handu S. Role of structural and non-structural proteins and therapeutic targets of sars-cov-2 for COVID-19. Cells. 2021;10(4):821. doi: 10.3390/cells10040821.
- Zarai Y., Zafrir Z., Siridechadilok B., Suphatrakul A., Roopin M., Julander J., Tuller T. Evolutionary selection against short nucleotide sequences in viruses and their related hosts. DNA Res.: Int. J. Rapid Pub. Rep. Genes Genomes. 2020;27(2) doi: 10.1093/dnares/dsaa008.
- Zhao Q. Protein flexibility as a biosignal. Crit. Rev. Eukaryot. Gene Expr. 2010;20(2):157–170. doi: 10.1615/critreveukargeneexpr.v20.i2.60.
- Zhu N., Zhang D., Wang W., Li X., Yang B., Song J., Zhao X., Huang B., Shi W., Lu R., Niu P., Zhan F., Ma X., Wang D., Xu W., Wu G., Gao G.F., Tan W. A novel coronavirus from patients with pneumonia in China, 2019. N. Engl. J. Med. 2020;382(8):727–733. doi: 10.1056/nejmoa2001017.