Abstract
With the coexistence of multiple lineages and increased international travel, recombination and gene flow are likely to become increasingly important in the adaptive evolution of SARS-CoV-2. These processes could result in genetic introgression and the incipient parallel evolution of multiple recombinant lineages. However, identifying recombinant lineages is challenging, and the true extent of recombinant evolution in SARS-CoV-2 may be underestimated. This study describes the first SARS-CoV-2 Deltacron recombinant case identified in Brazil. We demonstrate that the recombination breakpoint is at the beginning of the Spike gene. The 5′ genome portion (circa 22 kb) resembles the AY.101 (Delta), and the 3′ genome portion (circa 8 kb nucleotides) is most similar to the BA.1.1 (Omicron). Furthermore, evolutionary genomic analyses indicate that the new strain emerged after a single recombination event between lineages of diverse geographical locations in December 2021 in South Brazil. This Deltacron, AYBA-RS, is one of the dozens of recombinants described in 2022. The submission of only four sequences in the GISAID database suggests that this lineage had a minor epidemiological impact. However, the recent emergence of this and other Deltacron recombinant lineages (XD, XF, and XS) suggests that gene flow and recombination may play an increasingly important role in the COVID-19 pandemic. We explain the evolutionary and population genetic theory that supports this assertion, concluding that this stresses the need for continued genomic surveillance. This monitoring is vital for countries where multiple variants are present, as well as for countries that receive significant inbound international travel.
Keywords: COVID-19, SARS-CoV-2 genomes, gene flow, recombination, genetic introgression, adaptive landscape, severe acute respiratory syndrome coronavirus 2, Brazil, Deltacron, recombinant, AYBA-RS
1. Introduction
During the first two years of the COVID-19 outbreak, most genetic variation in SARS-CoV-2 was generated by mutations, some of which improved the fitness of the virus in its new host and epidemiological environment. More recently, the adaptive evolution of SARS-CoV-2 has also involved recombination. In SARS-CoV-2, recombination can occur when distinct variants co-infect the same host cell and exchange genetic material [1,2]. This process is called genetic introgression, and it plays an essential role in the virulence evolution of parasites and pathogens [3]. Gene flow occurs when a genotype of a given variant is moved from one population into another. If this gene flow also results in a co-infection in a host already infected with another variant, it might lead to genetic introgression. This phenomenon is known to have resulted in the evolution of novel subspecies in other human parasites, such as Cryptosporidium spp. (e.g., [4]). The exchange of genetic information between distinct lineages underpins the virulence evolution of these parasites [5]. In the case of SARS-CoV-2, international travel is likely to contribute considerably to the gene flow of different variants across the globe, thereby increasing the probability of genetic introgression [3].
Genetic introgression offers three potential advantages over mutation: (1) it can insert multiple substitutions all at once; (2) these substitutions have been previously selected and are functional in the genomic background of the parental lineage; and (3) this enables the recombinant genotype to bridge the fitness valleys in the adaptive fitness landscape and find higher fitness peaks [6]. In addition, increased international travel enables gene flow and recombinant exchange between distinct lineages that have evolved worldwide. Consequently, recombination and gene flow may play an increasingly important role in the transmissibility, severity, and resistance to vaccines and treatments of SARS-CoV-2, and the evolutionary epidemiology of the COVID-19 pandemic. Here, we examine the evidence for the incipient parallel evolution of recombinant lineages, studying SARS-CoV-2 genomes in Brazil. This country has seen less intensive genomic and epidemiological surveillance than other parts of the world. Hence, by studying the SARS-CoV-2 genome sequence variation in Brazil, we may better understand the extent of cryptic recombinant lineages.
Up to September 2022, the WHO reported five SARS-CoV-2 variants of concern (VOCs): Alpha, Beta, Gamma, Delta, and Omicron [7]. The variant Delta (B.1.617.2) emerged in India at the end of 2020 and spread to at least 185 nations [8,9]. The WHO then classified this variant as a VOC due to its high transmissibility and potential to cause severe COVID-19. In November 2021, the Omicron (BA.1) variant emerged in South Africa [10] and was declared a VOC. According to GISAID data from August 2022, the sub-variants of BA.1 had spread to at least 193 countries [11,12]. The widespread and simultaneous circulation of both VOCs, Omicron and Delta, resulted in recombinants known as “Deltacron”. Genomic analysis of SARS-CoV-2 samples revealed a novel Deltacron lineage in France in February 2022. This lineage presented two recombination breakpoints, one at the beginning of the Spike region and another at the beginning of ORF3a [13]. The genomic segment within these limits displayed Omicron signature mutations, whereas the rest of the genome presented Delta signature mutations. This recombinant variant was then designated XD [13] and was mainly found in Denmark and the Netherlands [14,15]. Besides XD, two other Omicron and Delta hybrids, XF and XS, circulated in the United Kingdom and the USA, respectively. Both recombinants presented a minor Delta portion at the 5′ end of an Omicron genomic backbone, although with distinct breakpoint locations [16]. According to Cov-Lineages [17], fewer than 40 sequences have been assigned to each of the XD, XF, and XS recombinants.
Most recombinants in the GISAID database were recovered from European countries and the USA. To fill the gap of studies in other regions, we investigated four putative Brazilian recombinants recovered from South and Southeast Brazil. We analysed the mutation profile, identified the recombination breakpoints, and performed a phylogenetic reconstruction to trace the origins of the novel recombinant lineages. This analysis confirmed that the Deltacron recombinant in the country had evolved de novo, and that it could be considered a case of incipient parallel evolution. In other words, this recombinant variant acquired similar characteristics to those of other Deltacron variants (i.e., it shows a high sequence similarity), and the new variant AYBA-RS evolved independently from other circulating variants. Our results highlighted the importance of genomic surveillance for monitoring the viral evolution caused by co-infections with different SARS-CoV-2 lineages and for identifying putative recombinants. Such surveillance is especially pressing during periods of high viral circulation and in countries with multiple variants, as well as in regions that are a hub for international air travel.
2. Materials and Methods
2.1. Bioethics, Sample Collection and Processing
The clinical samples were retrieved from three different institutions performing COVID-19 diagnosis and SARS-CoV-2 genomic surveillance in Rio Grande do Sul, Brazil: Centro Estadual de Vigilância em Saúde-CEVS (The State Centre for Health Surveillance of the State Department of Health); the Genetics and Molecular Biology Laboratory from Hospital Moinhos de Vento; and Laboratório de Bioinformática Aplicada a Microbiologia Clínica from Federal University of Santa Maria (UFSM). In all cases, the SARS-CoV-2 infection was first detected by real-time RT-PCR, and the samples were submitted to a genomic sequencing routine in each institution.
2.2. Whole-Genome Sequencing, Assembly, and Quality Control
We observed an S gene dropout (i.e., gene not detected) in the sample SC2-9898 in May 2022 and then selected this sample for genome sequencing with the SARS-CoV-2 FLEX NGS panel (Paragon Genomics, Fremont, CA, USA) on the Illumina MiSeq platform. The library preparation was conducted according to the manufacturer’s protocol, and the sequencing was performed using the MiSeq Reagent Micro Kit v2 (Illumina Inc., San Diego, CA, USA). The FASTQ files were obtained using the Local Run Manager Generate FASTQ Analysis Module v3.0 (Illumina Inc., San Diego, CA, USA) and submitted to the SOPHiA DDM v.5 platform. These files were analysed using the CleanPlex SARS hCoV2 pipeline for sequence alignment. Finally, the sequence was deposited in the GISAID database with the entry EPI_ISL_14381991.
For the EPI_ISL_12110384 and EPI_ISL_14284846 sequences, whole-genome sequencing was performed using the Illumina COVIDSeq protocol (Illumina Inc., San Diego, CA, USA) on the Illumina sequencing platform. The pipeline ViralFlow was used to perform genome assembly, variant calling, and consensus generation [18]. To evaluate the quality and determine the lineage of the genome sequences, we analysed them on Nextclade Web version 2.3.1 [19].
2.3. Identification of Lineage Counterparts
To identify the candidate parental genomes that may have recombined to form the new Brazilian recombinant, we performed BLAST searches using the sequence Brazil/RS-FIOCRUZ-8390/2022 (EPI_ISL_12110384) against the “Unassigned” dataset from GISAID (accessed on 25 July 2022). We then visually inspected the mutation patterns of the top hits in Nextclade Web, using the putative Brazilian recombinant sequences as references.
2.4. Parental Lineages Determination
Genome sequences were first aligned using Nextalign version 1.11.0 with default parameters and sequence MN908947 as reference. We evaluated the recombinant genomes using Sc2rf [20]. Subsequently, we manually segmented the genome of the oldest sequence of each recombinant lineage into its Delta and Omicron portions using AliView version 1.27 [21]. Next, we assessed the Pango lineage of each of the 5′ Delta and 3′ Omicron segments in Nextclade Web and the Pangolin COVID-19 Lineage Assigner version 4.1.1 [22].
We also performed a blastn search (BLAST version 2.10.1+, [23,24]) for each of the Delta and Omicron segments against the sequences of Nextstrain’s global analysis of GISAID data [25] (accessed on 23 June 2022). We then checked the lineage of the oldest top-hit strain in the GISAID metadata.
We built lineage-specific databases (GISAID sequences) considering the Pango lineages determined in the previous analyses. We again utilised each segment as a query to find the top 20 best hits of the reference databases (Delta or Omicron). Once we had identified the best parental candidates, we downloaded their sequences from GISAID with the following filters: “low coverage excluded”, “collection date complete”, and “complete sequence”. Finally, we utilised these top hit sequences to compute each lineage’s frequency of mutations using a Python script (Pandas library version 1.4.2).
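The per-lineage mutation frequencies mentioned above can be illustrated with a minimal sketch. It uses only the Python standard library rather than Pandas, and the function name, the input structure (one set of mutation labels per genome), and the placeholder labels `mut_1` to `mut_3` are assumptions made for illustration, not data or code from this study.

```python
from collections import Counter


def mutation_frequencies(genomes):
    """Return {mutation: fraction of genomes carrying it} for one lineage.

    `genomes` is a list of sets, each holding the mutation labels of one genome.
    """
    if not genomes:
        return {}
    # Each genome is a set, so a mutation is counted at most once per genome.
    counts = Counter(m for mutations in genomes for m in mutations)
    n_genomes = len(genomes)
    return {m: c / n_genomes for m, c in counts.items()}


# Hypothetical toy input: three genomes of a single lineage (placeholder labels).
toy_lineage = [{"mut_1", "mut_2"}, {"mut_1"}, {"mut_1", "mut_3"}]
freqs = mutation_frequencies(toy_lineage)
for mutation in sorted(freqs):
    print(mutation, round(freqs[mutation], 2))
```

A mutation present in nearly all top-hit genomes of a parental lineage can then be treated as a signature of that lineage when compared with the recombinant.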
2.5. Network Analyses
To investigate the evolutionary history of the recombinants, we constructed a dataset that included the Brazilian recombinant, XD, and XS lineages and their respective putative parental sequences. For XF, we added only the recombinant sequences themselves, since the Sc2rf analysis did not identify recombination in this lineage (i.e., there were no candidate parental sequences to include).
Subsequently, we aligned the sequences using Nextalign and submitted the dataset to a network analysis in SplitsTree version 4.18.2 [26]. For this purpose, we used the NeighborNet method and drew the network with the RootedEqualAngle method, using the Wuhan/WH01/2019 (EPI_ISL_406798) sequence as the root.
We also carried out a network analysis using the library pegas 1.1 from R version 4.1.3 [27]. We randomly sampled five sequences per lineage (AY.101, AY.4, B.1.617.2, BA.1, BA.1.1, XD, XS) from the original aligned dataset to improve the resolution of the network. As before, we included all four sequences of the Brazilian recombinant in the sampled dataset. Finally, we determined the haplotypes using the function haplotype and carried out the network modelling using the haploNet method (default parameters).
2.6. Recombination Analyses
For recombination detection, we carried out two additional analyses. Firstly, we utilised the sampled dataset in the software RDP4 version 4.101 [28], using a “full exploratory recombination scan” (all methods with default parameters). Secondly, we performed an analysis with the HybridCheck R library version 1.0.1 [29]. For this analysis, we considered each segment’s oldest top hit to be the parental sequence.
Concerning the XF lineage, we used South Africa/NICD-N28358/2022 (Omicron) and South Africa/NHLS-UCT-GS-AF27/2021 (Delta) as the parental sequences, as described in Wang et al. (2022) [16]. Regarding the recombinant sequences described in this study, we annotated the genome mutations using Coronapp [30]. We then drew the genome maps using the Python libraries Seaborn and DNA Features Viewer version 3.1.1 [31].
2.7. Phylogenetic Analyses
To investigate the phylogenetic history of the Brazilian recombinant segments, we concatenated the four identified sequences with their respective parental sequences (top hits of lineages AY.101 and BA.1.1). We then split the aligned sequences into two segments, considering the recombination breakpoint inferred in the HybridCheck analysis: the 5′ portion encompassed nucleotide positions 1–21,769 and the 3′ portion encompassed positions 21,770–29,903.
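The segmentation step can be sketched as follows. This is an illustrative example only: it assumes that every aligned sequence is in reference coordinates with the reference length of 29,903 positions, and the function name and the toy sequence are hypothetical.

```python
BREAKPOINT = 21_769      # last position of the 5' (Delta) portion, 1-based
GENOME_LENGTH = 29_903   # length of the reference genome


def split_at_breakpoint(aligned_seq, breakpoint=BREAKPOINT):
    """Split an aligned genome into its 5' and 3' portions.

    With 1-based inclusive coordinates, the 5' portion covers positions
    1..breakpoint and the 3' portion covers breakpoint+1..end.
    """
    if len(aligned_seq) != GENOME_LENGTH:
        raise ValueError("sequence is not in reference coordinates")
    return aligned_seq[:breakpoint], aligned_seq[breakpoint:]


# Hypothetical toy sequence: 'A' marks the 5' portion, 'C' the 3' portion.
toy = "A" * BREAKPOINT + "C" * (GENOME_LENGTH - BREAKPOINT)
five_prime, three_prime = split_at_breakpoint(toy)
print(len(five_prime), len(three_prime))
```

Each portion can then be written to its own alignment file and passed to the tree-building step separately.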
Next, we built a phylogenetic tree using IQ-Tree version 1.6.12 [32] with an automatically detected substitution model (option -m MFP) and 1000 ultrafast bootstrap replicates. We then conducted a timetree inference and a “mugration” model using discrete PANGO lineages with Treetime version 0.8.6 [33]. Subsequently, we drew a chronogram tree using a script written in R with the ggtree library version 3.2.1 [34], colouring the branches according to the PANGO lineages.
2.8. Estimating the Age of Introgression
To estimate the date of the recombination event, we extracted the SNPs of the Brazilian recombinant with snp-sites version 2.5.1 [35], only outputting columns containing ACGT (option -c). We then calculated the coalescence time based on the formula described in Ward & Oosterhout (2015) [29], considering a mutation rate of 1.83 × 10⁻⁶ substitutions per site per day [36] and a genome size of 29,903 nucleotides, based on the reference genome (Wuhan/2019).
To evaluate the context of the co-circulating lineages in Brazil, we plotted a kernel density of the absolute frequencies of Brazilian sequences collected between June 2021 and June 2022 (accessed in GISAID on 29 July 2022). We generated the density plots for each Brazilian region with a script written in Python (Seaborn library), employing the kdeplot method with a smoothing parameter equal to 2 (bw_adjust = 2). Then, we assessed the association between the Brazilian regions (South and non-South) and Pango lineages (AY.101, BA.1.1, and other lineages) using the Chi-square test (SciPy version 1.8.1). p-values < 0.05 were considered statistically significant.
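As a rough illustration of how a SNP count translates into a divergence time, the sketch below applies a simple strict-clock point estimate, t = d / (2 μ L), where d is the number of SNPs between two genomes, μ the mutation rate, and L the genome size. This simplified formula is an assumption made here for illustration; it is not the formula of [29], which also yields confidence bounds, so its output is close to but not identical to the values in Table A4.

```python
MUTATION_RATE = 1.83e-6   # substitutions per site per day, as stated in the text
GENOME_LENGTH = 29_903    # nucleotides, reference genome


def divergence_time_days(snp_count, mu=MUTATION_RATE, length=GENOME_LENGTH):
    """Strict-clock point estimate of the time since two genomes diverged.

    The factor 2 reflects that both lineages accumulate substitutions
    independently after their common ancestor.
    """
    return snp_count / (2 * mu * length)


# SNP divergences between pairs of AYBA-RS genomes, as listed in Table A4.
for snps in (19, 16, 14, 13, 11, 4):
    print(snps, round(divergence_time_days(snps), 1))
```

For the most divergent pair (19 SNPs), this gives roughly 174 days, compared with the mean of 180 days reported in Table A4.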
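The association test can be illustrated with the absolute frequencies from Table A6. The sketch below computes Pearson's Chi-square statistic by hand with the standard library instead of calling SciPy; the function name is hypothetical, and no continuity correction is applied. For a table with three lineage categories and two regions, the degrees of freedom are (3 − 1)(2 − 1) = 2.

```python
def chi_square_independence(table):
    """Pearson's Chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Rows: AY.101, BA.1.1, other lineages; columns: South, non-South (Table A6).
COUNTS = [
    [2580, 1590],
    [2646, 9344],
    [11228, 99645],
]
stat, dof = chi_square_independence(COUNTS)
print(f"chi-square = {stat:.2f}, d.f. = {dof}")
```

With these counts, the statistic is approximately 10,519, in agreement with the value reported in the Results.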
3. Results
3.1. Sampling, Data Acquisition, and Genome Assembly
The SARS-CoV-2 recombinant samples were independently identified and processed by each institution according to their routine sequencing testing. The clinical data available and the assembly metrics for the three sequenced genomes are summarised in Table A1.
3.2. Identification of the Brazilian Deltacron, AYBA-RS
Preliminary analyses assigned the genome sequence from Cruz Alta (Brazil/RS-FIOCRUZ-8390/2022) to the recombinant lineage XS. However, the first 20 kb of the genome presented a mutational pattern distinct from that of an XS archetype (Figure A1 in Appendix A).
Through the genomic surveillance routine of the State of Rio Grande do Sul, we identified two more sequences similar to the Cruz Alta sequence: one from Porto Alegre (Brazil/SC2-9898/2022) and another from Santa Maria (Brazil/RS-315-66266-219/2022) (Table A1 and Figure A2). Additionally, we searched the GISAID database and found a sequence from Rio de Janeiro (Brazil/RJ-NVBS19517GENOV829190059793/2022) that was very similar to the recombinants of South Brazil.
Once we identified our sequences as putative recombinants, we detected possible recombination signals in their genomes with Sc2rf. This analysis indicated that the 5′ region (positions 1–21,845) came from a Delta lineage, and the 3′ region (positions 21,846–29,903) was from an Omicron lineage (Figure A3). Further analyses indicated that the 5′ genomic region resembled AY.101, and that the 3′ region was most similar to BA.1 or BA.1.1 (Table A2). Next, we built lineage-specific sequence databases and searched for the sequences most similar to each segment (5′ Delta and 3′ Omicron). We considered the oldest top hit for each segment to be the parental sequence, and we compared their mutational signatures to those of the Brazilian recombinant sequences (Figure 1). In this analysis, all the Brazilian recombinant sequences presented similar patterns: their 5′ segment matched AY.101, and their 3′ region matched the BA.1.1 lineage (Figure 1 and Figure A4). The substitution C10604T was found exclusively in all four sequences of the Brazilian recombinant (Figure A4). Since the recombinant found in this study does not meet the requirements of the Pango nomenclature [37], we named it AYBA-RS, considering its parental lineages (AY.101 and BA.1.1) and the location of its origin (RS, Brazil).
3.3. Comparison between AYBA-RS and the Other Deltacrons
We compared the AYBA-RS sequences to those from other Deltacrons described in Cov-Lineages [17], namely XD, XF, and XS. Identification of the recombination blocks using HybridCheck [29] supported the above result obtained with Sc2rf, revealing a breakpoint at the beginning of the S gene (Figure 2, position 21,769). Furthermore, the HybridCheck analysis showed that the recombination pattern differed from those of XD, XF, and XS (Figure 2). XD and AYBA-RS were mainly composed of a Delta scaffold, while XF and XS had an Omicron scaffold. Analysis with the RDP4 software [28] confirmed that AYBA-RS arose from a single recombination event, separate from those that led to the other Deltacrons (Table A3). This analysis also indicated a breakpoint close to the S gene (position 22,675) (Table A3). The RDP4 analysis revealed recombination events for the XD and XS sequences but not for the XF sequences.
3.4. Evolutionary History of Recombinants of VOC Delta and VOC Omicron
The phylogenetic network reconstruction and haplotype network analysis were congruent since all four Brazilian recombinant sequences formed a group distinct from the other Deltacrons. Furthermore, both models showed that the Deltacrons were distributed between the Delta and Omicron groups, having additional portions of each lineage (Figure 3A,B).
Phylogenetic analyses for each 5′ (Delta) and 3′ (Omicron) block of the AYBA-RS assigned the 5′ segments of the Brazilian Deltacron to the AY.101 clade. This clade was formed only by sequences from Brazil, notably from Santa Catarina (SC), a state from the South region that borders the Rio Grande do Sul (RS) (Figure A5). On the other hand, the 3′ segments of the AYBA-RS formed a clade with BA.1.1 sequences from diverse geographical locations. However, in this tree, the AYBA-RS did not form a group with sufficient bootstrap support (Figure A5).
Considering the number of SNPs between the AYBA-RS sequences (Figure 2B, Table A4 and Table A5), we estimated that the recombination event that gave origin to the first AYBA-RS genotype was likely to have occurred 180 days before the collection date of the first sample, i.e., December 2021. Inspection of the lineage density plots revealed an overlap of AY.101 and BA.1.1, mainly in December 2021, across the country’s regions (Figure 4). In addition, AY.101 and BA.1.1 presented higher relative frequencies in the South region than in the rest of Brazil (Table A6, Chi-square test: χ² = 10,519.21, d.f. = 2, p < 0.00001).
4. Discussion
Here we describe the first Deltacron lineage identified in Brazil, AYBA-RS. Our analysis shows that this recombinant strain arose from a single recombination event between the AY.101 and the BA.1.1 lineages in Southern Brazil. The genetic exchange between both variants most likely happened in December 2021, when the Omicron lineage started to overtake Delta around the country [38]. Furthermore, we show that this recombinant differs from the previously described SARS-CoV-2 Deltacrons XD, XF, and XS, supporting a new recombination event and evidence of incipient parallel evolution. We employed a robust approach to identify and describe the recombinant SARS-CoV-2 lineages, combining methods involving phylogenetic and population genetic techniques incorporated in HybridCheck [29], RDP4 [28], SplitsTree [26] and other approaches. This combined approach enabled us to determine the parental lineages, identify the recombination breakpoint around the 22 kb position near the Spike gene (S), and estimate the date when the new recombinant lineage evolved.
In genomic studies of hybridisation, an apparent signature of genetic introgression can also be caused by mixed infections that result in chimeric sequences. Those chimeras may be erroneously interpreted as recombinants or hybrids. However, we observed four (nearly) identical recombinant genotypes that were collected at separate times and in different locations. Furthermore, these isolates were sequenced in different laboratories. Hence, we can confidently rule out the possibility of mixed infections resulting in chimeric sequences and conclude that the samples described here are genuine recombinants.
Based on the phylogenetic reconstruction, we were able to ascertain that the sequences found in the Rio Grande do Sul and Rio de Janeiro States coalesced and had a single origin. Since genomic deposits in GISAID are recent, the absence of more sequences in the database suggests that the variant had a minor epidemiological impact. Alternatively, a lack of genomic surveillance may also have contributed. Indeed, the sequencing effort in Brazil is still lower than those in Europe and the USA [39], and this could have resulted in an underestimation of the true prevalence of this variant. The more comprehensive sequencing in Europe and the USA might explain why most of the recombinants are found in these regions. On the other hand, this could also reflect a genuine pattern, given that these regions experience considerable international air travel, enabling viral gene flow and between-variant recombination.
Co-infections with different variants of SARS-CoV-2 are necessary to trigger recombination events. Spatiotemporal variation in selection pressures can maintain a balanced polymorphism and multiple variants. In addition, multiple variants can also be maintained in substructured environments, as well as through a time lag in coevolution [40]. International travel can mediate gene flow and connect these distinct variants, facilitating inter-variant recombination and genetic introgression [3]. Indeed, over the course of the pandemic, there have been reports of patients having Omicron and Delta co-infections [41,42,43]. Such events provide an opportunity for the emergence of new lineages with distinct phenotypes [2,44]. These phenotypes can occupy different peaks in the fitness landscape separated by fitness valleys [45,46]. Such valleys can be a consequence of epistasis, a phenomenon wherein nucleotide substitutions influence each other’s impact on fitness, resulting in a fitness landscape with many small and large peaks, ridges, and valleys. In such a rugged landscape, populations evolve slowly because they can become stuck once they have reached a local optimum, i.e., the highest fitness peak in the nearby landscape [47]. In that case, several mutations are required to reach the next, even higher peak [48]. Recombination events could help the virus to bridge such valleys because recombination (and genetic introgression) offers three theoretical advantages over mutations (see Introduction). Given the large amount of nucleotide divergence that has evolved in multiple extant lineages, we argue that recombinant evolution is likely to play an increasingly important role in SARS-CoV-2 evolution and the COVID-19 pandemic. The potential for recombination to produce better-adapted SARS-CoV-2 variants is increased by international travel, which can bring together allopatric lineages and variants from different continents.
An analysis proposed by Turakhia and co-workers [49] suggested that approximately 2.7% of sequenced SARS-CoV-2 genomes have detectable recombinant ancestry. However, the authors also highlight that hybrid strains of genetically similar viral lineages are challenging to detect and that the overall recombination frequency could be underestimated [49,50]. The current study corroborates this assertion, showing that distinct recombinant lineages can be challenging to differentiate and that advanced evolutionary genomic analyses are required to identify and trace the origin of recombinant lineages. In addition, future studies with more Deltacron lineages would extend our analysis, allowing us to verify the breakpoint site’s impact on the recombinants’ fitness.
The genomic bulletin from June 2022, which included only 83 samples collected in Rio Grande do Sul, revealed that even with the predominance of the Omicron lineage, Delta (AY.99.2) and Gamma (P.2) lineages were still circulating. Taking into account the relaxation of prevention measures, the non-adherence to the vaccine booster dose, and the simultaneous circulation of multiple lineages in the same region, we might be creating a perfect storm for the emergence of new SARS-CoV-2 VOCs. Our study supports the assertion that SARS-CoV-2 genetic introgression events might be more common than initially expected. This observation has implications for disease control measures, emphasising the need for more intensive genomic and epidemiological surveillance worldwide.
Acknowledgments
We thank Luana Giongo Pedrotti for statistical support and Edmund Willis for commenting on an earlier draft of the MS. We thank all the authors who have shared genome data on GISAID utilised in this study.
Appendix A
Table A1.
Sample | Brazil/RS-315-66266-219/2022 | Brazil/SC2-9898/2022 | Brazil/RS-FIOCRUZ-8390/2022 |
---|---|---|---|
Accession ID | EPI_ISL_14284846 | EPI_ISL_14381991 | EPI_ISL_13523515 |
Sample date | 6 June 2022 | 2 May 2022 | 12 February 2022 |
Location | Santa Maria | Porto Alegre | Cruz Alta |
Mapped reads | 514,768 | 602,459 | 3,039,386 |
Coverage breadth (>30×) | 98.79% | 98.70% | 99% |
Coverage depth | 798× | 532× | 10,000× |
Library construction method | COVIDSeq Assay Illumina | PARAGON CleanPlex SARS-CoV-2 FLEX Panel | COVIDSeq Assay Illumina |
Sequencing technology | Illumina iSeq 100 | Illumina MiSeq | Illumina MiSeq |
Assembly method | ViralFlow | SOPHiA DDM v.5 | ViralFlow |
Table A2.
Sample (Lineage) | Segment Start | Segment End | Nextclade * | Pangolin # | Lineage of the Top-Hit (Strain Name) $ |
---|---|---|---|---|---|
Brazil/RS-FIOCRUZ-8390/2022 (AYBA-RS) | 37 | 21,845 | AY.101 | AY.101 | AY.100 (Guatemala/INC-LNS-127/2021) |
Brazil/RS-FIOCRUZ-8390/2022 (AYBA-RS) | 21,846 | 29,857 | BA.1 | - | BA.1.1 (Taiwan/TSGH-52/2021) |
France/HDF-IPP54794/2022 (XD) | 55 | 21,845 | XD | XD | BA.1 (Brazil/BA-FIOCRUZ-PVM99977/2022) |
France/HDF-IPP54794/2022 (XD) | 21,846 | 25,469 | XD | - | AY.4 (Belgium/ULG-17464/2021) |
USA/CO-CDC-FG-248528/2022 (XS) | 38 | 10,029 | XS | - | B.1.617.2 (Pakistan/UHSPK3-61/2021) |
USA/CO-CDC-FG-248528/2022 (XS) | 10,030 | 29,792 | XS | - | BA.1.1 (Paraguay/454211/2022) |
*—Lineage identification in the Nextclade Web; #—Lineage identification in the Pangolin COVID-19 Lineage Assigner; $—Blast on Nextstrain’s global analysis—GISAID data, lineage taken from the GISAID metadata.
Table A3.
Recombinant | Breakpoint Start | Breakpoint End | Minor Parental Lineages | Major Parental Lineages | RDP (p-Value) | GENECONV (p-Value) | Maxchi (p-Value) | Chimaera (p-Value) | SiSscan (p-Value) |
---|---|---|---|---|---|---|---|---|---|
AYBA-RS | 22,675 | 29,392 | BA.1.1, BA.1, XF, XS | AY.4, AY.101, B.1.617.2 | 8.21 × 10⁻⁶ | 3.89 × 10⁻⁵ | 6.05 × 10⁻¹⁰ | 1.20 × 10⁻⁹ | 1.33 × 10⁻⁸ |
XD | 21,804 | 25,526 | BA.1.1, BA.1, XF, XS | AY.4, AY.101, B.1.617.2 | 1.76 × 10⁻⁸ | 5.11 × 10⁻⁹ | 1.33 × 10⁻⁹ | 6.12 × 10⁻¹⁰ | 2.45 × 10⁻⁷ |
XS | 29,652 | 9,751 | AY.4, AY.101, B.1.617.2, XD | BA.1.1, BA.1 | 0.012 | 2.04 × 10⁻⁴ | 0.001 | 0.002 | NS |
NS—non-significant.
Table A4.
Sequences | SNP Divergence | Mean Time (in Days) | Min Time (5% CI) | Max Time (95% CI) |
---|---|---|---|---|
CA–RJ | 19 | 180 | 120 | 255 |
SM–CA | 16 | 152 | 100 | 222 |
PA–CA | 14 | 134 | 84 | 200 |
SM–RJ | 13 | 125 | 78 | 190 |
PA–RJ | 11 | 107 | 63 | 168 |
PA–SM | 4 | 42 | 18 | 83 |
CA—Brazil/RS-FIOCRUZ-8390/2022 (collection date, 2022-02-11); PA—Brazil/SC2-9898/2022 (collection date, 2022-05-02); RJ—Brazil/RJ-NVBS19517GENOV829190059793/2022 (collection date, 2022-05-06); SM—Brazil/RS-315-66266-219/2022 (collection date, 2022-06-06).
Table A5.
Position | CA | PA | RJ | SM |
---|---|---|---|---|
245 | C | C | C | T |
647 | A | G | A | A |
1348 | C | C | T | C |
3464 | T | C | C | C |
4057 | T | C | C | C |
7075 | T | C | C | C |
7081 | C | T | T | T |
14,183 | C | T | T | T |
16,238 | G | C | C | C |
17,407 | T | C | C | C |
20,062 | G | T | T | T |
21,752 | T | T | C | T |
21,846 | T | T | C | T |
22,419 | C | C | C | T |
22,599 | G | A | G | A |
22,673 | C | C | T | C |
22,688 | A | A | G | A |
22,775 | G | G | A | G |
22,786 | A | A | C | A |
22,792 | C | C | T | C |
22,882 | T | G | T | G |
25,000 | T | T | T | C |
25,482 | C | A | A | A |
25,704 | T | C | C | C |
27,864 | C | T | T | T |
CA—Brazil/RS-FIOCRUZ-8390/2022 (collection date, 2022-02-11); PA—Brazil/SC2-9898/2022 (collection date, 2022-05-02); RJ—Brazil/RJ-NVBS19517GENOV829190059793/2022 (collection date, 2022-05-06); SM—Brazil/RS-315-66266-219/2022 (collection date, 2022-06-06).
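The pairwise SNP divergences in Table A4 can be recovered directly from the variable positions in Table A5. The following sketch is illustrative only (the function and variable names are hypothetical); it counts, for each pair of genomes, the positions at which the two bases differ.

```python
from itertools import combinations

SAMPLES = ("CA", "PA", "RJ", "SM")

# Variable positions transcribed from Table A5: (position, CA, PA, RJ, SM).
ROWS = [
    (245, "C", "C", "C", "T"), (647, "A", "G", "A", "A"),
    (1348, "C", "C", "T", "C"), (3464, "T", "C", "C", "C"),
    (4057, "T", "C", "C", "C"), (7075, "T", "C", "C", "C"),
    (7081, "C", "T", "T", "T"), (14183, "C", "T", "T", "T"),
    (16238, "G", "C", "C", "C"), (17407, "T", "C", "C", "C"),
    (20062, "G", "T", "T", "T"), (21752, "T", "T", "C", "T"),
    (21846, "T", "T", "C", "T"), (22419, "C", "C", "C", "T"),
    (22599, "G", "A", "G", "A"), (22673, "C", "C", "T", "C"),
    (22688, "A", "A", "G", "A"), (22775, "G", "G", "A", "G"),
    (22786, "A", "A", "C", "A"), (22792, "C", "C", "T", "C"),
    (22882, "T", "G", "T", "G"), (25000, "T", "T", "T", "C"),
    (25482, "C", "A", "A", "A"), (25704, "T", "C", "C", "C"),
    (27864, "C", "T", "T", "T"),
]


def pairwise_snp_counts(rows, samples):
    """Count differing positions for every pair of samples."""
    counts = {}
    for (i, a), (j, b) in combinations(enumerate(samples), 2):
        # Column 0 holds the position, so sample k sits in column k + 1.
        counts[(a, b)] = sum(1 for row in rows if row[i + 1] != row[j + 1])
    return counts


snp_counts = pairwise_snp_counts(ROWS, SAMPLES)
for (a, b), n in sorted(snp_counts.items(), key=lambda item: -item[1]):
    print(f"{a}-{b}: {n}")
```

The resulting counts match the SNP divergence column of Table A4, with CA and RJ being the most divergent pair and PA and SM the least.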
Table A6.
Lineage | South Brazil | Non-South Brazil |
---|---|---|
AY.101 | 15.68% (2580) | 1.44% (1590) |
BA.1.1 | 16.08% (2646) | 8.45% (9344) |
Other lineages | 68.24% (11,228) | 90.11% (99,645) |
Absolute frequencies are in parentheses.
Author Contributions
Conceptualization, C.v.O. and E.W.; methodology, F.H.S., T.R.y.C., B.C.C., A.d.A.V., T.F.A., R.S.S., G.d.L.W., L.F., G.B., C.S.B., B.d.O.R., R.B.B., F.M.d.S.G., G.L.T.G. and P.B.M.; software, F.H.S.; formal analysis, F.H.S., T.F.A., R.S.S., A.P.M.V., J.C. and C.v.O.; investigation, T.F.A., R.S.S. and T.S.G.; resources, T.S.G., P.C.R., G.d.L.W., A.V.S., P.d.A.T. and E.W.; data curation, F.H.S. and T.F.A.; writing—original draft preparation, F.H.S., T.F.A., R.S.S., A.P.M.V., J.C., P.d.A.T. and C.v.O.; writing—review and editing, F.H.S., T.F.A., P.C.R., G.d.L.W., F.H.d.O. and C.v.O.; supervision, F.H.d.O., E.W. and C.v.O.; funding acquisition, A.V.S., F.H.d.O. and C.v.O. All authors have read and agreed to the published version of the manuscript.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
All genome sequences and associated metadata are published in GISAID’s EpiCoV database (EPI_SET_220829tz). To view the contributors of each sequence with details such as accession number, virus name, collection date, originating lab and submitting lab, and the list of authors, visit 10.55876/gis8.220829tz.
Conflicts of Interest
The authors declare no conflict of interest.
Funding Statement
C.v.O. is funded by the University of East Anglia (UEA) and the Earth and Life Systems Alliance (ELSA), Norwich Research Park, Norwich, UK.
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
References
- 1.Bentley K., Evans D.J. Mechanisms and Consequences of Positive-Strand RNA Virus Recombination. J. Gen. Virol. 2018;99:1345–1356. doi: 10.1099/jgv.0.001142.
- 2.Dezordi F.Z., Resende P.C., Naveca F.G., do Nascimento V.A., de Souza V.C., Dias Paixão A.C., Appolinario L., Lopes R.S., da Fonseca Mendonça A.C., Barreto da Rocha A.S., et al. Unusual SARS-CoV-2 Intrahost Diversity Reveals Lineage Superinfection. Microb. Genom. 2022;8:000751. doi: 10.1099/mgen.0.000751.
- 3.Van Oosterhout C. Mitigating the Threat of Emerging Infectious Diseases; a Coevolutionary Perspective. Virulence. 2021;12:1288–1295. doi: 10.1080/21505594.2021.1920741.
- 4.Nader J.L., Mathers T.C., Ward B.J., Pachebat J.A., Swain M.T., Robinson G., Chalmers R.M., Hunter P.R., van Oosterhout C., Tyler K.M. Evolutionary Genomics of Anthroponosis in Cryptosporidium. Nat. Microbiol. 2019;4:826–836. doi: 10.1038/s41564-019-0377-x.
- 5.Tichkule S., Cacciò S.M., Robinson G., Chalmers R.M., Mueller I., Emery-Corbin S.J., Eibach D., Tyler K.M., van Oosterhout C., Jex A.R. Global Population Genomics of Two Subspecies of Cryptosporidium Hominis during 500 Years of Evolution. Mol. Biol. Evol. 2022;39:msac056. doi: 10.1093/molbev/msac056.
- 6.Fragata I., Blanckaert A., Dias Louro M.A., Liberles D.A., Bank C. Evolution in the Light of Fitness Landscape Theory. Trends Ecol. Evol. 2019;34:69–82. doi: 10.1016/j.tree.2018.10.009.
- 7.CoVariants. [(accessed on 12 August 2022)]. Available online: https://covariants.org/
- 8.Mlcochova P., Kemp S.A., Dhar M.S., Papa G., Meng B., Ferreira I.A.T.M., Datir R., Collier D.A., Albecka A., Singh S., et al. SARS-CoV-2 B.1.617.2 Delta Variant Replication and Immune Evasion. Nature. 2021;599:114–119. doi: 10.1038/s41586-021-03944-y.
- 9.Zhan Y., Yin H., Yin J.-Y. B.1.617.2 (Delta) Variant of SARS-CoV-2: Features, Transmission and Potential Strategies. Int. J. Biol. Sci. 2022;18:1844. doi: 10.7150/ijbs.66881.
- 10.Farheen S., Araf Y., Tang Y.-D., Zheng C. The Deltacron Conundrum: Its Origin and Potential Health Risks. J. Med. Virol. 2022;94:5096–5102. doi: 10.1002/jmv.27990.
- 11.Setiabudi D., Sribudiani Y., Hermawan K., Andriyoko B., Nataprawira H.M. The Omicron Variant of Concern: The Genomics, Diagnostics, and Clinical Characteristics in Children. Front. Pediatr. 2022;10:898463. doi: 10.3389/fped.2022.898463.
- 12.Outbreak.info. [(accessed on 24 August 2022)]. Available online: https://outbreak.info/
- 13.Moisan A., Mastrovito B., De Oliveira F., Martel M., Hedin H., Leoz M., Nesi N., Schaeffer J., Ar Gouilh M., Plantier J.C. Evidence of Transmission and Circulation of Deltacron XD Recombinant SARS-CoV-2 in Northwest France. Clin. Infect. Dis. 2022:ciac360. doi: 10.1093/cid/ciac360.
- 14.Karbalaei M., Keikha M. Deltacron Is a Recombinant Variant of SARS-CoV-2 but Not a Laboratory Mistake. Ann. Med. Surg. 2022;79:104032. doi: 10.1016/j.amsu.2022.104032.
- 15.Maulud S.Q., Hasan D.A., Ali R.K., Rashid R.F., Saied A.A., Dhawan M., Priyanka, Choudhary O.P. Deltacron: Apprehending a New Phase of the COVID-19 Pandemic. Int. J. Surg. 2022;102:106654. doi: 10.1016/j.ijsu.2022.106654.
- 16.Wang L., Zhou H.-Y., Li J.-Y., Cheng Y.-X., Zhang S., Aliyari S., Wu A., Cheng G. Potential Intervariant and Intravariant Recombination of Delta and Omicron Variants. J. Med. Virol. 2022;94:4830–4838. doi: 10.1002/jmv.27939.
- 17.Cov-Lineages. [(accessed on 22 September 2022)]. Available online: https://cov-lineages.org/
- 18.Dezordi F.Z., da Silva Neto A.M., Campos T.D.L., Jeronimo P.M.C., Aksenen C.F., Almeida S.P., Wallau G.L., On Behalf of the Fiocruz Covid-Genomic Surveillance Network. ViralFlow: A Versatile Automated Workflow for SARS-CoV-2 Genome Assembly, Lineage Assignment, Mutations and Intrahost Variant Detection. Viruses. 2022;14:217. doi: 10.3390/v14020217.
- 19.Aksamentov I., Roemer C., Hodcroft E., Neher R. Nextclade: Clade Assignment, Mutation Calling and Quality Control for Viral Genomes. J. Open Source Softw. 2021;6:3773. doi: 10.21105/joss.03773.
- 20.Sc2rf—SARS-CoV-2 Recombinant Finder. [(accessed on 22 September 2022)]. Available online: https://github.com/lenaschimmel/sc2rf.
- 21.Larsson A. AliView: A Fast and Lightweight Alignment Viewer and Editor for Large Datasets. Bioinformatics. 2014;30:3276–3278. doi: 10.1093/bioinformatics/btu531.
- 22.O’Toole Á., Scher E., Underwood A., Jackson B., Hill V., McCrone J.T., Colquhoun R., Ruis C., Abu-Dahab K., Taylor B., et al. Assignment of Epidemiological Lineages in an Emerging Pandemic Using the Pangolin Tool. Virus Evol. 2021;7:veab064. doi: 10.1093/ve/veab064.
- 23.Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. Basic Local Alignment Search Tool. J. Mol. Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
- 24.Camacho C., Coulouris G., Avagyan V., Ma N., Papadopoulos J., Bealer K., Madden T.L. BLAST+: Architecture and Applications. BMC Bioinform. 2009;10:421. doi: 10.1186/1471-2105-10-421.
- 25.Genomic Epidemiology of SARS-CoV-2 with Subsampling Focused Globally over the Past 6 Months. [(accessed on 23 June 2022)]. Available online: https://nextstrain.org/ncov/gisaid/global/
- 26.Huson D.H., Bryant D. Application of Phylogenetic Networks in Evolutionary Studies. Mol. Biol. Evol. 2006;23:254–267. doi: 10.1093/molbev/msj030.
- 27.Paradis E. Pegas: An R Package for Population Genetics with an Integrated-Modular Approach. Bioinformatics. 2010;26:419–420. doi: 10.1093/bioinformatics/btp696.
- 28.Martin D.P., Murrell B., Khoosal A., Muhire B. Detecting and Analyzing Genetic Recombination Using RDP4. Methods Mol. Biol. 2017;1525:433–460. doi: 10.1007/978-1-4939-6622-6_17.
- 29.Ward B.J., van Oosterhout C. HybridCheck: Software for the Rapid Detection, Visualization and Dating of Recombinant Regions in Genome Sequence Data. Mol. Ecol. Resour. 2016;16:534–539. doi: 10.1111/1755-0998.12469.
- 30.Mercatelli D., Triboli L., Fornasari E., Ray F., Giorgi F.M. Coronapp: A Web Application to Annotate and Monitor SARS-CoV-2 Mutations. J. Med. Virol. 2021;93:3238–3245. doi: 10.1002/jmv.26678.
- 31.Zulkower V., Rosser S. DNA Features Viewer: A Sequence Annotation Formatting and Plotting Library for Python. Bioinformatics. 2020;36:4350–4352. doi: 10.1093/bioinformatics/btaa213.
- 32.Minh B.Q., Schmidt H.A., Chernomor O., Schrempf D., Woodhams M.D., von Haeseler A., Lanfear R. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol. Biol. Evol. 2020;37:1530–1534. doi: 10.1093/molbev/msaa015.
- 33.Sagulenko P., Puller V., Neher R.A. TreeTime: Maximum-Likelihood Phylodynamic Analysis. Virus Evol. 2018;4:vex042. doi: 10.1093/ve/vex042.
- 34.Yu G. Using Ggtree to Visualize Data on Tree-Like Structures. Curr. Protoc. Bioinform. 2020;69:e96. doi: 10.1002/cpbi.96.
- 35.Page A.J., Taylor B., Delaney A.J., Soares J., Seemann T., Keane J.A., Harris S.R. SNP-Sites: Rapid Efficient Extraction of SNPs from Multi-FASTA Alignments. Microb. Genom. 2016;2:e000056. doi: 10.1099/mgen.0.000056.
- 36.Wang S., Xu X., Wei C., Li S., Zhao J., Zheng Y., Liu X., Zeng X., Yuan W., Peng S. Molecular Evolutionary Characteristics of SARS-CoV-2 Emerging in the United States. J. Med. Virol. 2022;94:310–317. doi: 10.1002/jmv.27331.
- 37.Rambaut A., Holmes E.C., O’Toole Á., Hill V., McCrone J.T., Ruis C., du Plessis L., Pybus O.G. A Dynamic Nomenclature Proposal for SARS-CoV-2 Lineages to Assist Genomic Epidemiology. Nat. Microbiol. 2020;5:1403–1407. doi: 10.1038/s41564-020-0770-5.
- 38.Da Silva M.S., Gularte J.S., Filippi M., Demoliner M., Girardi V., Mosena A.C.S., de Abreu Góes Pereira V.M., Hansen A.W., Weber M.N., de Almeida P.R., et al. Genomic and Epidemiologic Surveillance of SARS-CoV-2 in Southern Brazil and Identification of a New Omicron-L452R Sublineage. Virus Res. 2022;321:198907. doi: 10.1016/j.virusres.2022.198907.
- 39.Brito A.F., Semenova E., Dudas G., Hassler G.W., Kalinich C.C., Kraemer M.U.G., Ho J., Tegally H., Githinji G., Agoti C.N., et al. Global Disparities in SARS-CoV-2 Genomic Surveillance. Nat. Commun. 2022;13:7003. doi: 10.1038/s41467-022-33713-y.
- 40.Lighten J., Papadopulos A.S.T., Mohammed R.S., Ward B.J., Paterson I.G., Baillie L., Bradbury I.R., Hendry A.P., Bentzen P., van Oosterhout C. Evolutionary Genetics of Immunological Supertypes Reveals Two Faces of the Red Queen. Nat. Commun. 2017;8:1294. doi: 10.1038/s41467-017-01183-2.
- 41.Molina-Mora J.A., Cordero-Laurent E., Calderón-Osorno M., Chacón-Ramírez E., Duarte-Martínez F. Metagenomic Pipeline for Identifying Co-Infections among Distinct SARS-CoV-2 Variants of Concern: Study Cases from Alpha to Omicron. Sci. Rep. 2022;12:9377. doi: 10.1038/s41598-022-13113-4.
- 42.Rockett R.J., Draper J., Gall M., Sim E.M., Arnott A., Agius J.E., Johnson-Mackinnon J., Fong W., Martinez E., Drew A.P., et al. Co-Infection with SARS-CoV-2 Omicron and Delta Variants Revealed by Genomic Surveillance. Nat. Commun. 2022;13:2745. doi: 10.1038/s41467-022-30518-x.
- 43.Focosi D., Maggi F. Recombination in Coronaviruses, with a Focus on SARS-CoV-2. Viruses. 2022;14:1239. doi: 10.3390/v14061239.
- 44.Taghizadeh P., Salehi S., Heshmati A., Houshmand S.M., InanlooRahatloo K., Mahjoubi F., Sanati M.H., Yari H., Alavi A., Jamehdar S.A., et al. Study on SARS-CoV-2 Strains in Iran Reveals Potential Contribution of Co-Infection with and Recombination between Different Strains to the Emergence of New Strains. Virology. 2021;562:63–73. doi: 10.1016/j.virol.2021.06.004.
- 45.Wright S. The roles of mutation, inbreeding, crossbreeding, and selection in evolution; Proceedings of the Sixth International Congress of Genetics; Ithaca, NY, USA. 24–31 August 1932; pp. 356–366.
- 46.Wright S. The Shifting Balance Theory and Macroevolution. Annu. Rev. Genet. 1982;16:1–20. doi: 10.1146/annurev.ge.16.120182.000245.
- 47.Bulankova P., Sekulić M., Jallet D., Nef C., van Oosterhout C., Delmont T.O., Vercauteren I., Osuna-Cruz C.M., Vancaester E., Mock T., et al. Mitotic Recombination between Homologous Chromosomes Drives Genomic Diversity in Diatoms. Curr. Biol. 2021;31:3221–3232.e9. doi: 10.1016/j.cub.2021.05.013.
- 48.Vos M. Why Do Bacteria Engage in Homologous Recombination? Trends Microbiol. 2009;17:226–232. doi: 10.1016/j.tim.2009.03.001.
- 49.Turakhia Y., Thornlow B., Hinrichs A., McBroome J., Ayala N., Ye C., Smith K., De Maio N., Haussler D., Lanfear R., et al. Pandemic-Scale Phylogenomics Reveals the SARS-CoV-2 Recombination Landscape. Nature. 2022;609:994–997. doi: 10.1038/s41586-022-05189-9.
- 50.Lacek K.A., Rambo-Martin B.L., Batra D., Zheng X.-Y., Hassell N., Sakaguchi H., Peacock T., Groves N., Keller M., Wilson M.M., et al. SARS-CoV-2 Delta–Omicron Recombinant Viruses, United States. Emerg. Infect. Dis. 2022;28:1442. doi: 10.3201/eid2807.220526.