Significance
The jararaca lancehead genome provides a comprehensive road map of the genomic context of pitviper toxin genes. Comparisons of these genomic segments across the phylogeny revealed an unexpectedly high number of toxin families that originated via the direct co-option of preexisting nontoxin genes, indicating that the snake toxin arsenal was mostly assembled from local elements of the ancestral genome. These results support a new perspective in venom evolution in which gene duplications in most toxin families occurred after, rather than before, initial toxin recruitment from nontoxin genes, contributing to the evolutionary optimization of snake venoms. They also emphasize the importance of correctly identifying orthologous loci to accurately trace the genomic pathways that lead to the evolutionary origination of new traits.
Keywords: snake venom, toxin evolution, genome, gene recruitment, co-option
Abstract
Venom is a key adaptive innovation in snakes, and how nonvenom genes were co-opted to become part of the toxin arsenal is a significant evolutionary question. While this process has been investigated through the phylogenetic reconstruction of toxin sequences, evidence provided by the genomic context of toxin genes remains less explored. To investigate the process of toxin recruitment, we sequenced the genome of Bothrops jararaca, a clinically relevant pitviper. In addition to producing a road map with canonical structures of genes encoding 12 toxin families, we inferred most of the ancestral genes for their loci. We found evidence that 1) snake venom metalloproteinases (SVMPs) and phospholipases A2 (PLA2) have expanded in genomic proximity to their nonvenomous ancestors; 2) serine proteinases arose by co-opting a local gene that also gave rise to lizard gilatoxins and then expanded; 3) the bradykinin-potentiating peptides originated from a C-type natriuretic peptide gene backbone; and 4) VEGF-F was co-opted from a PGF-like gene and not from VEGF-A. We evaluated two scenarios for the original recruitment of nontoxin genes for snake venom: 1) in locus ancestral gene duplication and 2) in locus ancestral gene direct co-option. The first explains the origins of two important toxins (SVMP and PLA2), while the second explains the emergence of a greater number of venom components. Overall, our results support the idea of a locally assembled venom arsenal in which the most clinically relevant toxin families expanded through posterior gene duplications, regardless of whether they originated by duplication or gene co-option.
The evolutionary history of snakes involved striking trait transformations, such as body elongation, limb loss, the development of chemo- and thermoperception, different sexual reproductive modes, and, in some groups, the acquisition of a complex venom apparatus (1). Whereas most of these extreme adaptations are likely controlled by gene systems shared with other vertebrates, the snake venom system represents a novel key adaptive innovation.
The “advanced snakes” (Caenophidia) developed diverse fang types from the same embryonic origin as their specialized venom glands (VG) (2), which harbor a wide range of bioactive compounds used for predation and defense (3). A large body of knowledge about the evolutionary history of toxin families, selective pressures acting on specific components, and degrees of intra- and interspecific variation in venom was acquired through the sequencing of messenger RNAs (mRNAs) from snake VGs. Thus, many of the hypotheses developed in the last decade about the co-option of proteins involved in physiological functions and the evolutionary origin of venoms, for example, refs. 4 to 7, were largely based on transcript or protein data. However, fundamental questions related to identifying the evolutionary history of this key trait remain. These include the following: 1) “From which preexisting elements did the venom genes arise?” and 2) “How did these ancestral genes transform into toxin genes with unique protein domains?” Recently, large-scale genomic landscapes from venomous snakes became available in the literature (2, 8–14). Gene structures of toxins have been described (2, 15–18), although only a few gene clusters have been studied in a detailed way in viperids (17, 19–21). With these advances, the early origins and the evolutionary routes followed by snake toxins have started to be elucidated and the above questions can now be better addressed with the information provided by the genomic context of toxin genes (22).
Of particular interest, Bothrops jararaca (common name, jararaca lancehead) is a representative species of the most diverse and common genus of viperid snakes in South America and provides one of the best-studied models of viperid venom. The venom of B. jararaca is diverse in terms of different protein families (23) and different proteoforms (24, 25) present. Many of these components have been characterized (24, 26–31), and early studies on B. jararaca toxins helped to establish the basis of the kallikrein-kinin system and led to the development of antihypertensive drugs (32, 33). Although the venom of this species has been broadly investigated through transcriptomic and proteomic techniques (7, 34–37), its genomic background has yet to be determined.
Here, we address this need by sequencing the genome of B. jararaca and then conducting genome prospecting targeting venom-related genes and scaffolds. By retrieving genes for all major toxin classes known in Bothrops, we provide a comprehensive but accessible road map of toxin genes from a Viperidae snake. Moreover, by providing the genomic contexts of these genes relative to homologous loci of other snakes and vertebrates in general, we are able to infer the genes originally located in similar positions in the ancestors and, thus, to deduce the initial steps followed by nonvenom genes in becoming part of the snake venom arsenal.
Results and Discussion
B. jararaca Genome Sequencing and Strategies for Targeting Venom Genes.
Given the high content of repetitive elements predicted in Viperidae genomes (13, 38, 39), we used four different strategies to optimize the chance of obtaining full-length genes and long genomic segments of interest (Fig. 1): 1) a main hybrid assembly of short and long reads of whole-genome shotgun sequencing (HA-WGS), 2) the screening of toxin genes in independent assemblies of subsets of high quality short reads (SA-WGS), 3) the direct screening of toxin genes within unassembled long reads (LR-WGS), and 4) the high throughput bacterial artificial chromosome (BAC) sequencing and screening for toxin genes (BAC-SeqSc). The last approach was uniquely designed for this work (Fig. 1, Right) and is based on the large-scale sequencing of pools of BACs, which are screened for the presence of toxin genes, with the selected BACs then resequenced with high coverage.
Fig. 1.
Schematic diagram of the genomic sequencing strategies used to obtain toxin genes and their flanking regions in B. jararaca.
Through the k-mer analyses of the short reads, we estimated the genome size of B. jararaca to be 2.1 gigabase pairs (Gbp) (SI Appendix, Fig. S1), consistent with the size of 2.2 Gbp predicted previously (40). The assembly of the HA-WGS resulted in an N50 contig size of 163.5 kb, for a total contig length of 1.66 Gbp. We evaluated the completeness of the B. jararaca genome assembly using BUSCO (Benchmarking Universal Single-Copy Orthologs) datasets (41). From 3,354 ortholog groups searched in BUSCO, 3,096 (92.3%) were identified; from those, 2,775 (82.7% of total) were complete, and 321 (9.6%) genes were “fragmented.” The repetitive sequences totaled 285 megabase pairs (Mbp) (17% of the genome). The most common repetitive elements were retroelements (14.6%), among which the long interspersed nuclear element L2/CR1/Rex was the most abundant one (8.8%), as observed in other snakes (38). The scaffolds were deposited in GenBank under Bioproject PRJNA691605 (74). Genome browsing is available at: http://cetics.butantan.gov.br/gb2/gbrowse/bothrops_jararaca/ (75).
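As a quick arithmetic check on the completeness figures above, the following minimal Python sketch recomputes the BUSCO percentages from the reported counts and illustrates how an N50 value is conventionally derived from a list of contig lengths. The function names and the toy contig lengths are illustrative assumptions and are not part of the analysis pipeline used in this work.

```python
def busco_percentages(total, complete, fragmented):
    """Return (identified, complete, fragmented) as percentages of `total`,
    each rounded to one decimal place. Identified = complete + fragmented."""
    identified = complete + fragmented
    return tuple(round(100.0 * n / total, 1)
                 for n in (identified, complete, fragmented))


def n50(lengths):
    """Conventional N50: the length L of the contig at which the running sum
    of contig lengths, taken from longest to shortest, first reaches at least
    half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("empty list of contig lengths")
    half = sum(ordered) / 2.0
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Counts reported in the text: 3,354 groups searched, 2,775 complete, 321 fragmented.
print(busco_percentages(3354, 2775, 321))

# Toy contig lengths (hypothetical values, for illustration only).
print(n50([50, 40, 30, 20, 10]))
```

The first call reproduces the percentages quoted in the text from the raw counts; the second only demonstrates the N50 definition and says nothing about the actual B. jararaca contigs.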
In addition to genomic sequences, we generated transcriptomic data for seven different tissues (VG, gut, kidney, stomach, lung, heart, and brain) of the same B. jararaca specimen used for the genome assembly. VG reads were also de novo assembled (42) and annotated by BLAST searches against UniProt and previously annotated transcripts of B. jararaca (7). We obtained 45 nonredundant full-length transcripts encoding major venom-related proteins (SI Appendix, Table S1). The quantitative toxin profile (SI Appendix, Fig. S2) was similar to those previously reported from the same or closely related species (23, 34, 43). The major toxin classes observed were SVMP (snake venom metalloproteinase) class P-III and class P-II, followed by C-type lectins (CTL), phospholipase A2 (PLA2), bradykinin-potentiating peptide and C-type natriuretic peptide precursor (BPP/CNP), and snake venom serine proteinase (SVSP), accounting for 42.7% of the total VG transcription.
The venom proteome of the same specimen was analyzed by in-solution trypsin digestion and liquid chromatography-tandem mass spectrometry (LC-MS/MS). The protein spectra were searched against the proteins predicted from the transcripts as well as reptile protein sequences available at UniProt (Dataset S1). This analysis confirmed the presence in the venom of almost all toxins predicted from the transcriptome (SI Appendix, Table S1).
The Venom-Coding Genome of B. jararaca.
To obtain an overview of the dataset of genes encoding venom components and their genomic context, we prospected for toxin loci within the whole genomic datasets of B. jararaca using the VG transcripts as probes. In total, 55 full-length venom genes (considering the presence of all exons and introns within the coding sequence [CDS]) from 12 different toxin families were identified (SI Appendix, Tables S1 and S2).
Our data showed that in B. jararaca, most toxin families (PLA2, BPP/CNP, CRISP, HYAL, NGF, VEGF-F, NUCL and PLB) are represented by single genes, although multiple unique genes were recognized for other toxins, such as SVMP (P-III and P-II classes), SVSP, and CTL (SI Appendix, Table S2). In the case of the CTL family, there are likely more genes in the genome than we were able to retrieve, since multiple transcripts were recognized in the VG (SI Appendix, Table S1). On the other hand, no SVMP of the P-I class existed in the specimen sampled in this study, since no such sequence was identified in the genome, the VG transcriptome, or the venom proteome, despite our specific efforts to screen the raw data for this type of metalloproteinase. Since other individuals have been shown to possess P-I class SVMPs (7, 37, 44), this suggests a polymorphism in the species for this toxin.
Gene size varied greatly, from 1.7 kb (PLA2) to 40.6 kb (PLB). A representative structure from each toxin family is shown in SI Appendix, Fig. S3. Within the families containing multiple paralogs, the number and size of the exons were conserved, but the intron sizes varied greatly. Of note is that the first exons of the CRISP gene (exhibiting eight exons) and some SVSP genes (exhibiting five exons) corresponded to noncoding 5′ untranslated region (UTR) sequences. For the NGF gene, only the last exon contained the whole mature protein coding sequence.
By investigating the regions surrounding toxin genes in the B. jararaca genome, we were able to identify flanking nontoxin genes and thus to recognize gene blocks that could have synteny with the genomes of other snakes, lizards, and nonsquamate Chordata. We then interrogated several Chordata genome sequences to find evidence of synteny with these blocks, reannotating the regions when necessary to ensure the correct gene set in the species. The syntenic blocks recognized for each toxin class are summarized in Fig. 2, providing an overview of the whole venom gene landscape. The location of the venom genes between their flanking genes is generally conserved among species, but because of selective pressure and accelerated evolution rates (10, 12), some toxin families vary greatly in terms of copy (paralog) number and protein primary structure.
Fig. 2.
Schematic architecture of venom gene loci of different toxins showing syntenic blocks among different species. Red pentagon, toxin gene; yellow pentagon, nontoxin ortholog of a toxin gene (or paralog if in the same species); and white pentagon, flanking nontoxin gene. In each box, the name of the ortholog representing the putative ancestral gene for the toxin family is noted below the toxin family name, followed in parentheses by the gene ID of a reference gene (which is noted in blue and outlined in blue in the scheme) from an organism that does not contain the toxic character for this family. Gene names are indicated over the array of orthologs or within pentagons. Some paralogous genes are represented by one pentagon internally marked with the number of paralogs occurring in the species. Relevant pseudogenes are indicated with Ψ. Species were classified according to the following color code: green box, venomous snake; blue box, nonvenomous snake; orange box, nonsnake Squamata; and gray box, none of the above. Species codes and GenBank Genome ID or segment accession number are as follows: Bo.jara, Bothrops jararaca (this study); An.caro, Anolis carolinensis, ID: 708; Cr.adam, Crotalus adamanteus, PLA2 scaffold KX211996; Cr.atro, Crotalus atrox, PLA2 scaffold KX211994; Cr.scut, Crotalus scutulatus, ADAM28 scaffold MT032003.1; Cr.viri, Crotalus viridis, ID: 71654; Ga.gall, Gallus gallus, ID: 111; Ge.japo, Gekko japonicus, ID: 40475; La.chal, Latimeria chalumnae, ID: 3262; Mu.musc, Mus musculus, ID: 52; Op.hann, Ophiophagus hannah, ID: 10842; Po.mura, Podarcis muralis, ID: 8765; Po.vitt, Pogona vitticeps, ID: 7589; Pr.mucr, Protobothrops mucrosquamatus, ID: 18192; Ps.text, Pseudonaja textilis, ID: 72610; and Py.bivi, Python bivittatus, ID: 17893. For B. jararaca, the scheme is based on the combination of data gathered from the sequences obtained by the different strategies used in this work. CTLs and LAAO were not included in the figure since the B. jararaca scaffolds did not provide enough information to define the architecture of their loci.
For some toxin families (SVMPs and PLA2), it was possible to recognize clusters of venom genes adjacent to a related nonvenom paralog encoding a member of the same protein family that was not expressed in the VGs (red pentagons next to yellow ones in Fig. 2). In the case of PLA2, we inferred the typical gene clustering by considering that the loci sequences described in other Viperidae species contain more PLA2 genes (2, 10, 11, 13, 17, 20, 45), and we assumed the paralog flanking the venom gene as nonvenom because it was not detected in the VG transcriptome nor in the venom proteome of B. jararaca (SI Appendix, Table S1). However, the presence of only one venom PLA2 gene (and only one transcript) in the B. jararaca genome investigated here indicates secondary losses of other PLA2 genes in this specimen and, thus, intraspecific variability at this locus. It is worth mentioning that other transcripts encoding PLA2, including a K-49 PLA2, have been reported in B. jararaca VGs (7). Nevertheless, secondary losses of PLA2 genes have been well documented in rattlesnakes (17).
In other venom families (SVSP, HYAL, NGF, VEGF-F, PLB, BPP/CNP, and NUCL), the synteny of the loci indicates that these venom genes are located in the same position occupied by their putative orthologs in nonvenomous species (red pentagons aligned with yellow ones in Fig. 2). HYAL, despite occupying the position of a likely ortholog, is flanked by a gene encoding another hyaluronidase that is not expressed in the VGs, while the others (SVSP, NGF, VEGF-F, PLB, BPP/CNP, and NUCL) do not show any related nonvenom paralogs located nearby at the same locus. For CTL and LAAO, it is not possible to determine whether the corresponding gene positions in nonvenomous species are occupied by orthologs due to a lack of complete locus sequence, and for cysteine-rich secretory protein (CRISP), it is not clear if the adjacent related gene has a role in the venom.
With the above general overview of the venom genes from the B. jararaca genome and the additional genomic components compiled from public databases and literature, it was possible to explore in more detail the origins and the evolution of specific toxins present in Viperidae venoms. Below, we describe our inferences for four toxin classes of B. jararaca venom (SVMP, SVSP, BPP/CNP, and VEGF-F), and we then discuss the general significance of these results for a broad understanding of snake venom evolution.
SVMP Genes Show a Discrepancy between Domain and Exon Losses during the P-III to P-II Transition.
The HA-WGS strategy applied to the B. jararaca genome identified most of the SVMP genes, while the BAC-SeqSc approach recovered seven BAC clones (∼150 kb each) providing physical corroboration for some genes (Fig. 3A). In total, 20 different (<93% identity) P-III SVMP and seven different (<85% identity) P-II SVMP genes were identified. The BJARBC_30E11N1 scaffold, generated from a single BAC clone, contained one P-II gene (BJARBC_SVMP2_g07) followed by a downstream P-III gene (BJARBC_SVMP3_g03), providing physical evidence of the adjacent positioning of two SVMP classes at the locus. The SA-WGS contig BJARHA_S804283_A also exhibited P-II genes following a string of five P-III genes that are downstream of the flanking ADAM28 gene, which is considered the ancestral gene of all SVMPs (5, 21). Although we were unable to reconstruct the entire locus, the SA-WGS and BAC-SeqSc data together allowed us to infer that B. jararaca SVMP genes are organized in a large cluster containing multiple paralogs, starting with the ADAM28 gene (Fig. 3A), and this segment is likely flanked by STC1, NEFM, and NEFL genes, as observed in other species. Our results are in accordance with other studies that have described a large tandem array of SVMP genes in snakes (11, 13, 14, 20, 21).
Fig. 3.
The SVMP gene structure and arrangement at the locus. (A) Architecture of the ADAM28 genomic locus in different vertebrates (not to scale). The putative SVMP ancestral gene ADAM28 (yellow arrow) and flanking genes (STC1, NEFM, and NEFL: white arrows) form a syntenic block among vertebrates. Orange and beige arrows represent ADAM family genes in humans (ADAMDEC1 and ADAM7). Red and pink arrows represent genes from the SVMP classes P-III and P-II, respectively. Solid lines are contiguous sequences, and dotted lines indicate uncertain order or no contiguity. Blue bars represent regions covered by BAC. (B) Schematic alignment of SVMP gene structures showing the conservation of exons (squares) between SVMP P-III and P-II. A short and a long deletion at exon 14 of SVMP P-II are marked. These deletions result in the loss of the Cys-rich domain and the shortening of the disintegrin-like sequence through the acquisition of a new stop codon (red star) preceding the original one (black star). (C) Details of the nucleotide alignment with the encoded amino acid residues between the two neighboring SVMP genes belonging to the P-III and P-II classes (BJARBC_SVMP3_g20 and BJARBC_SVMP2_g07, respectively) in the region between exons 14 and 17.
It is currently accepted that P-II class SVMPs arose once from a P-III class ancestor at the base of the Viperidae radiation, which may have occurred via gene duplication followed by domain loss (21, 46, 47). A more complete analysis of the exon/intron arrangement of the P-III and the P-II SVMP genes in B. jararaca highlighted some details about the initial process leading to the generation of P-II SVMPs. In particular, B. jararaca P-III SVMP genes have 17 exons (the CDS starts in exon 1 and ends in exon 17), similar to the ADAM28 gene (up to the Cys-rich domain) but differing from a P-III gene of Echis ocellatus (18) in which exons 4, 5, and 6 have merged into a single exon. B. jararaca P-II SVMP genes have 15 exons (the CDS starts in exon 1 and ends in exon 14) (Fig. 3B). The alignment of a P-III SVMP with a neighboring P-II SVMP gene found in the same BAC showed correspondence of the first 14 exons (Fig. 3B). However, we observed that the segment corresponding to part of the disintegrin domain and the entire Cys-rich domain, which was lost upon the P-III to P-II transition, unexpectedly starts in the middle of exon 14, expands to include the entire exon 15 (according to P-III numbering), and ends at the end of exon 16 (long deletion in Fig. 3 B and C). Therefore, the borders of the domains and the borders of the exons do not exactly match. This is in agreement with the recent observations by Giorgianni and colleagues showing that this deletion is conserved among all P-II SVMP genes in Crotalus atrox (21), further corroborating the hypothesis of a single origin of this class of SVMPs (46). However, a simple deletion of entire exons is not sufficient to generate the actual C-terminal region of a P-II SVMP. Instead, an intraexon event would be necessary to complete the deletion of the entire segment.
In fact, we noted another short deletion of 25 bp within exon 14 (short deletion in Fig. 3 B and C), which caused a frameshift leading to a premature stop codon. Without the acquisition of this stop codon, the simple deletion of the following exons would result in a dysfunctional C-terminal sequence of the protein, likely compromising its structure and function. This short deletion is part of the reason why the disintegrin domains present in P-II precursors are shorter than P-III disintegrin-like domains. The deletion caused the direct removal of eight amino acid residues, and the introduction of a stop codon immediately thereafter prevented the translation of the remaining part of the disintegrin-like domain. The 3′ UTR of a P-II transcript was consequently added, with a small extension of 65 bp (Fig. 3), but the remaining 3′ UTR was mostly unchanged, likely preserving regulatory sites of the mRNAs. Our results reinforce the idea that the evolution of SVMPs in Viperidae is intimately associated with intron/exon indels (47, 48) and provide a further explanation for the acquisition of the premature stop codon in P-II SVMPs.
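The reading-frame consequence of the short deletion can be illustrated with simple arithmetic: a deletion whose length is not a multiple of three removes some whole codons and shifts the frame of everything downstream, which can expose a stop codon earlier than the original one. The Python sketch below is a toy illustration only; the function names and the 18-nucleotide example sequence are hypothetical and are not taken from the B. jararaca SVMP genes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def deletion_effect(deletion_bp):
    """Return (whole_codons_removed, causes_frameshift) for a deletion
    of `deletion_bp` nucleotides inside a coding sequence."""
    return deletion_bp // 3, deletion_bp % 3 != 0


def codons(cds):
    """Split a coding sequence into complete codons, reading in frame 0."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def first_stop_index(cds):
    """Index of the first in-frame stop codon, or None if there is none."""
    for i, codon in enumerate(codons(cds)):
        if codon in STOP_CODONS:
            return i
    return None


def delete(cds, start, length):
    """Remove `length` nucleotides starting at 0-based position `start`."""
    return cds[:start] + cds[start + length:]


# The 25-bp deletion reported in the text: eight whole codons plus one extra base.
print(deletion_effect(25))

# Hypothetical toy sequence: ATG GCA TTG ACC GGA TAA (stop is the sixth codon).
toy = "ATGGCATTGACCGGATAA"
shifted = delete(toy, 5, 1)  # a 1-bp deletion is enough to shift the frame
print(first_stop_index(toy), first_stop_index(shifted))
```

In the toy case the frameshift moves the first in-frame stop from the sixth codon to the third, mirroring in miniature how a frameshifting deletion can create a premature stop codon upstream of the original one.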
SVSP Locus Shows Homology to Gilatoxins.
The B. jararaca genomic scaffolds containing SVSP genes revealed that these genes are clustered and organized in tandem (Fig. 2). We observed that some of the SVSP genes are preceded by genes of the cytochrome c oxidase 6B1 (cox 6B1) subunit or its pseudogenes, and that these cox 6B1 pseudogenes are present in the Protobothrops mucrosquamatus SVSP genes (National Center for Biotechnology Information [NCBI] genomic sequence: NW_015386730). This pattern suggests that SVSP duplications in snakes may have involved a genomic segment comprising these two genes and/or that cox 6B1 may have facilitated the SVSP duplication process.
A syntenic locus of the SVSP genes located between the flanking SCAN-domain containing protein-1 and RBM42 genes could be identified across different squamates (Fig. 2). While the locus exhibits essentially the same SVSP expansion in the Viperidae species P. mucrosquamatus as in B. jararaca, in the Elapidae species Pseudonaja textilis, there is no expansion of SVSPs (a single SVSP gene exists, and it exhibits low expression in the VGs). More interestingly, in Toxicofera lizards with sequenced genomes, this locus contains a serine proteinase gene referred to as gilatoxin-like gene. Gilatoxin is a serine proteinase that has been demonstrated to be a major component of the venom from Heloderma sp. and other venomous Anguimorpha lizards (49, 50). The identification of a conserved genomic position for the venom serine proteinases of snakes and lizards suggests possible orthology between them. This favors a possible single origin of this specific toxin class in Toxicofera (50) or at least indicates a single ancestral gene source, which could have been recruited one or more times during the evolution of snakes and lizards. Interestingly, the VT2R26 vomeronasal receptor gene has expanded in Anolis carolinensis (Fig. 2), suggesting that this locus is prone to gene duplication via unknown mechanisms.
BPPs Were Added to an Ancestral CNP Precursor.
The BPP/CNP precursor is one of the most abundant transcripts in B. jararaca VGs (SI Appendix, Table S1). The BPP/CNP gene, described here, is ∼11 kb in length and has two introns, but only one intron interrupts the CDS region (Fig. 4). The coding region of the precursor is encoded by two exons, while a third region comprises most of the 3′ UTR sequence. The first exon contains the signal peptide and the region with the BPP repeats, whereas the second exon contains the spacer and the CNP.
Fig. 4.
BPP/CNP gene structure. (A) Schematic alignment of BPP/CNP and CNP genes from different organisms emphasizing the correspondence of introns and exons, the conservation of domain structures, and the extension of exon 1 harboring the BPPs in Viperidae. Species are classified according to the following color code: green box, venomous snake; orange box, nonsnake Squamata; and gray box: none of the above. (B) Part of the BPP/CNP gene sequence from B. jararaca and its translation, showing that BPPs are restricted to exon 1.
The similarities between parts of the B. jararaca BPP/CNP precursor (51) and vertebrate CNP indicate that the first may have originated from the latter. However, the absence of BPPs and the presence of a C-terminal extension in the Elapidae natriuretic peptide precursor, resembling vertebrate brain natriuretic peptide, suggested a nonhomologous origin of these precursors in Viperidae and Elapidae (4). We subsequently found greater similarities of a CNP precursor from Dipsadidae with both Viperidae BPP/CNP and Elapidae natriuretic peptide precursors and hypothesized that all of these precursors were in fact orthologous and derived from a CNP gene (52).
The structure of the BPP/CNP gene identified in this study (Fig. 4) shows the same overall structure as the CNP gene of other vertebrates, and both genes occupy the same position in the locus (Fig. 2). Moreover, intron 1 is positioned just between the last BPP repeat and the beginning of the spacer, indicating that the divergent region (containing the BPPs) is restricted to the end of the first exon. BPP acquisition seems to have occurred as an extension of exon 1 and not via the insertion or shuffling of a new exon, since no such sequence has been identified in other genomes to our knowledge. By reviewing data from other available genomes, we could recognize the same extended first exon in the P. mucrosquamatus (Viperidae) BPP gene (Gene ID: 107296050). The corresponding exons in other squamates (A. carolinensis and Gekko japonicus) and in Homo sapiens encode only the signal peptide and a short prodomain. Therefore, BPPs indeed seem to have arisen over the CNP gene, apparently without any gene duplication, since we did not locate any paralog of the “endogenous” CNP gene in any of these genomes. Unfortunately, the lack of annotated genes for venom natriuretic peptide precursors in Elapidae and Dipsadidae prevents a more robust confirmation of the shared origin of venom CNP in these families with the Viperidae BPP/CNP.
VEGF-F Gene Locus Indicates a Non–VEGF-A Origin.
Snake venom vascular endothelial growth factor (svVEGF or VEGF-F) (53) is part of the VEGF superfamily of growth factors, which also includes placental growth factors (PGFs). The VEGF-F gene from Protobothrops (former Trimeresurus) flavoviridis was first amplified and sequenced in 2009, as was the endophysiological VEGF-A gene from the same species (16). Although these two genes are very different in size, they supposedly show some conservation of short segments of intronic sequences; therefore, it was hypothesized that VEGF-F could have originated from a duplication of VEGF-A followed by accelerated evolution (16). However, the genomic context of these genes could not be observed in that study.
Here, we identified a single gene encoding VEGF-F in B. jararaca (BAC clone BJARBC_02H08Ma1). In this scaffold, the VEGF-F gene is flanked by the downstream genes PPP1R13L and ERCC2 (Fig. 5 A, Left). Looking for this set of genes in other species, we observed a similar organization in the P. mucrosquamatus genome, including an upstream RTN2+FOSB gene block. A similar set of genes was found in more distantly related species of snakes and in other vertebrates. In the snakes P. textilis and Python bivittatus, the lizard Pogona vitticeps, and the amphibian Microcaecilia unicolor, the “growth factor” gene positioned at this locus is named VEGF-F–like, probably due to the high similarity of the encoded protein to the snake toxins, whereas in the coelacanth Latimeria chalumnae, it is named “PGF-like” (placental growth factor-like) (Fig. 5A). We note that PGF is also a member of the VEGF family of growth factors.
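The kind of comparison described above, in which a locus is recognized across species by the genes that flank a focal gene, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: each locus is reduced to an ordered list of gene names, gene orientation and intergenic distances are ignored, and the function names and the `window` parameter are hypothetical. The two gene orders below are simplified from the description in the text and do not reproduce the full annotated scaffolds.

```python
def flanking_genes(gene_order, focal, window=2):
    """Genes located within `window` positions on either side of `focal`
    in an ordered list of gene names representing one locus."""
    i = gene_order.index(focal)
    left = gene_order[max(0, i - window):i]
    right = gene_order[i + 1:i + 1 + window]
    return left + right


def shared_flanks(locus_a, focal_a, locus_b, focal_b, window=2):
    """Flanking genes of `focal_a` in locus A that also flank `focal_b`
    in locus B, kept in the order in which they occur in locus A."""
    flanks_b = set(flanking_genes(locus_b, focal_b, window))
    return [g for g in flanking_genes(locus_a, focal_a, window) if g in flanks_b]


# Simplified gene orders based on the text (orientation and spacing ignored).
b_jararaca = ["VEGF-F", "PPP1R13L", "ERCC2"]
p_mucrosquamatus = ["RTN2", "FOSB", "VEGF-F", "PPP1R13L", "ERCC2"]

print(shared_flanks(b_jararaca, "VEGF-F", p_mucrosquamatus, "VEGF-F"))
```

Shared flanking genes of this kind are only a first indication of synteny; as noted in the text, regions may need reannotation before the gene set at a locus can be trusted.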
Fig. 5.
(A) Architecture of the venom VEGF-F gene and nonvenom VEGF-A gene loci in synteny among different organisms. Red pentagon, VEGF-F toxin gene; yellow pentagon, PGF-like or VEGF-F–like genes; green pentagon, VEGF-A gene; and the white pentagon represents adjacent nonrelated genes. Dots at the end of solid lines indicate scaffold ends. Blue bar represents region covered by BAC. Gene representations are not to scale. Species were classified according to the following color code: green box, venomous snake; blue box, nonvenomous snake; orange box, nonsnake Squamata; and gray box, none of the above. (B) Summarized phylogenetic tree of the VEGF family of growth factors focusing on the origin of VEGF-F (snake venom VEGFs) from the PGF-like/VEGF-F–like ortholog. The complete phylogenetic analysis is shown in SI Appendix, Fig. S4. (C) Schematic comparison of VEGF-F and VEGF-A genes in three Squamata, pointing out the levels of conservation throughout these genes. Percentage values on the right represent pairwise identity of CDS regions.
We also identified the B. jararaca VEGF-A gene and its flanking genes, based on which we retrieved VEGF-A loci from multiple species (Fig. 5 A, Right). As observed in Fig. 5A, the VEGF-A gene is placed in a completely different genomic context than VEGF-F. Therefore, there are two different loci, each of which is relatively conserved among the Chordata phylogeny, and they represent distinct genes encoding similar proteins belonging to the VEGF family. We performed a phylogenetic reconstruction of VEGF genes retrieved from both loci, as well as with other classes of VEGF from several vertebrates (Fig. 5B and SI Appendix, Fig. S4). There is a robust grouping of VEGF-F from Viperidae snakes nested within the VEGF-F–like/PGF-like clade of sequences from other Squamates and Chordata.
Given this scenario, we now suggest that the VEGF-F gene (the snake venom VEGF) is, in fact, an ortholog of the gene positioned at the same site in other organisms, sometimes annotated as “PGF-like,” and is not a result of a recent (after snake appearance) duplication and neofunctionalization of a VEGF-A (or other VEGF-like) gene positioned elsewhere. The PGF-like gene has likely been positioned in that locus since the time of the ancestral vertebrates, as shown by the conservation of the gene block in Latimeria, at the base of Sarcopterygii. The early origin of the whole ortholog group composed of VEGF-F/VEGF-F–like and PGF-like cannot be deduced from our phylogenetic analysis, but the fact that it is not nested within the robustly supported VEGF-A clade indicates that their common ancestor preceded the appearance of snakes.
Indeed, more detailed observation of the structures of VEGF-A and VEGF-F (Fig. 5C) revealed that the two paralogs are very different in size and exon number and show very low identity in the sequence composition of their noncoding regions. The coding regions of the two paralogs show some similarity (39 to 42% within the same species, Fig. 5C), while the conservation within each ortholog is higher, even for distantly related Squamata (e.g., 47% for VEGF-F and 87% for VEGF-A between B. jararaca and the Pogona lizard, Fig. 5C). Our hypothesis is that in an ancestral Viperidae, the protein derived from the PGF-like/VEGF-F–like gene was co-opted without gene duplication to be a component of the venom and later underwent a process of functional specialization. The resulting toxin is similar to the well-characterized VEGF-A, as well as to any VEGF family member, thus providing the suggestion for its name when it was discovered (53).
General Inferences About Toxin Gene Recruitment.
The genome of B. jararaca described here allowed us to identify syntenies between the gene arrays flanking the toxin genes and the respective genome segments in other organisms, thus enabling us to examine what kinds of genes are present in similar positions across venomous and nonvenomous species. This allowed us to infer whether a related nontoxin gene was likely present at the locus in an ancestral snake and to check whether this gene is present in extant snakes. The presence in an extant venomous snake of both the toxin gene and its related nontoxin paralog predicted to exist in the ancestral snake suggests an ancestral duplication in the locus, whereas the presence of only the toxin gene at the position where the ancestral snake had a related nontoxin gene indicates the absence of ancestral duplication in the locus. This locus structure-based approach represents an alternative way of tracking toxin gene origins that has been applied in some cases (2, 11, 17–19, 54, 55) and is independent of the traditional method, which reconstructs the phylogeny of toxin sequences to infer the ancestral ortholog.
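The decision rule described in this paragraph can be written down compactly. The sketch below is only an illustration of that logic, not code used in the study. The class, field names, and returned labels are hypothetical, and the example observations simply restate the SVMP and VEGF-F cases discussed in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocusObservation:
    family: str
    ancestral_nontoxin_inferred: bool  # related nontoxin gene inferred at the locus in the ancestral snake
    toxin_in_extant: bool              # toxin gene present at the locus in the extant venomous snake
    nontoxin_paralog_in_extant: bool   # related nontoxin paralog still present at the same locus


def infer_recruitment(obs: LocusObservation) -> str:
    """Apply the locus-structure rule; returns a hypothetical label."""
    if not (obs.ancestral_nontoxin_inferred and obs.toxin_in_extant):
        return "undetermined"
    if obs.nontoxin_paralog_in_extant:
        return "ancestral duplication in locus"
    return "direct co-option in locus"


observations = [
    LocusObservation("SVMP", True, True, True),     # ADAM28 flanks the SVMP cluster
    LocusObservation("VEGF-F", True, True, False),  # occupies the PGF-like position
    LocusObservation("family X (hypothetical)", False, True, False),
]
for obs in observations:
    print(obs.family, "->", infer_recruitment(obs))
```

The rule is deliberately conservative: when no ancestral nontoxin gene can be inferred at the locus, it returns "undetermined" rather than forcing one of the two scenarios.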
With respect to the whole set of 12 toxin families present in the Viperidae B. jararaca (Fig. 2), we identified the two scenarios above for the recruitment of ancestral nontoxin genes. They correspond to the mechanisms pointed out by Vonk et al. (2) based on the genomic organization of three toxin families from the Elapidae Ophiophagus hannah, referred to as “duplication of nontoxin genes” and “gene hijacking/modification.” These mechanisms relate to more general concepts in gene evolution, referred to as, respectively, the “neofunctionalization model” (lato sensu) and “gene co-option without duplication” or “moonlighting” (56–60), which have been considered for explaining the recruitment of snake toxins (e.g., refs. 2, 3, 6, 11, 54, 61, 62). Our analyses indicate which mechanism likely occurred in nine of the 12 toxin families present in B. jararaca and highlight that these mechanisms occurred mostly locally in the genome of the ancestral venomous snake.
Under the duplication of nontoxin genes mechanism, genes have been recruited after the duplication of an ancestral gene existing within the locus, without the direct co-option of the original gene, to become a toxin. This is the case for highly abundant toxins such as SVMP and PLA2, for which a closely related nonvenom paralog is still present flanking the venom genes. In these cases, secondary rounds of copy expansion may have followed the initial duplication, allowing the neofunctionalization of specific toxin paralogs. SVMP is a clear example of this, since its multigene cluster starts just 3′ to the ADAM28 gene (Fig. 3). Likewise, in the case of PLA2, a non-VG–expressed PLA2 IIGc is present within the same locus (Fig. 2), although Jackson and Koludarov (54) considered a potential co-option of PLA2 to venom prior to duplication based on the very low expression of this gene in the VGs of some Crotalinae. Under the gene hijacking/modification mechanism, genes have been recruited by the direct specialization of ancestral genes with nonvenom functions, preexisting within the locus, into toxin genes. This is the case for SVSP, BPP/CNP, and the less abundant ancillary toxins HYAL, NGF, VEGF-F, PLB, and NUCL, whose genes are located at the same position putatively occupied by their ancestral genes and without a nearby closely related nonvenom paralog in the venomous species. VEGF-F is a good example of this process, as we demonstrated that it is placed neither near the VEGF-A gene nor at a random site, but occupies the genomic position of a preexisting member of the VEGF family (Fig. 5).
Our results are in agreement with what has been proposed based on the genomic context for the recruitment of SVMP (2, 11, 20, 21) and PLA2 (17) by ancestral gene duplication and for HYAL and PLB (2) by hijacking/modification, and we provide further support for the gene hijacking/modification mechanism by associating five other families (SVSP, NGF, VEGF-F, BPP/CNP, and NUCL) with this mechanism.
Interestingly, according to our analysis, ancestral gene co-options occurring in the locus seem to explain the emergence of most toxin classes. This is similar to what was observed for the origin of venom genes in parasitoid wasps (55), in which a minor part of the venom genes showed evidence of ancestral gene duplications whereas a greater number likely derived from single-copy ancestral genes. Nevertheless, we should consider that in snakes, most of these directly co-opted genes are minor venom components (except for SVSP and BPP/CNP) and do not represent the most prevalent toxins in Viperidae venoms (58). Ancestral gene duplication, generally considered to be the primary mechanism of toxin recruitment, continues to be supported in our analysis as underlying the recruitment of the highly abundant toxins (PLA2 and SVMP) in these venoms, which are likely the most relevant for venom function.
An intriguing question is how the snake deals with the loss of the functional products of the nonvenom genes directly co-opted to the venom arsenal. One clue is that the majority of toxins arising from ancestral gene co-option belong to families with preexisting paralogs (clustered or spread in the genome) that may assume physiological functions once a member is diverted to venom. However, these preexisting paralogs are unlikely to produce proteins with exactly the same function, so any such loss will likely result in a shift in physiological function. Another possibility is that the recruited gene products had a dual function (both endogenous and toxic), at least at the early stages of their recruitment. The dynamic nature of the toxin recruitment process, balancing gene products toward venom or nonvenom function, has already been shown by the phylogenetic positioning of nonvenom proteins within toxin clades (63) and by the identification of basal levels of toxin expression in nonvenom organs of B. jararaca (7). This suggests that these animals may have tolerated some level of toxicity in their bodies, favoring the possibility of a dual function of co-opted genes, although a significant presence of circulating toxins has not been demonstrated in snakes. The exact mechanism that initiates co-option to venom, especially in the absence of gene duplication, is yet to be elucidated. It certainly has many nuances for each toxin family, but it ends either with an increase in gene expression in the VGs or with a gradual restriction of expression to this tissue (64).
For the cases of SVMP and PLA2, however, early duplication events may have facilitated an escape from adaptive conflict (65), in which an ancestral gene is constrained from specializing in an intrinsic secondary function by the selective pressure on its primary function. A classic example is the vertebrate δ-crystallin eye protein, which is presumed to have been encoded by a single gene for argininosuccinate lyase in the common archosaur ancestor; this gene then underwent an ancestral duplication followed by the loss of enzymatic activity and subsequent specialization to produce δ1, a crystallin structural protein. However, in the same eye system, α- and βγ-crystallins, thought to be derived from preexisting multiparalog genes, are believed to have been recruited via a nonduplicative process from a chaperone gene (59). In fact, well-documented cases of gene neofunctionalization before or after ancient duplication are rare (66–68), and the venom genes addressed here in their orthologous context represent a system in which to explore such events in greater detail.
Independent of the initial recruitment process, the most abundant toxin families in Viperidae (SVMP, SVSP, PLA2, and CTL) are those that underwent the greatest expansion at their loci, indicating that the most relevant pathological effects of venom are more closely associated with the secondary expansion of relevant genes into multiple paralogs than with the type of initial recruitment. Since paralogs within each family are not exact copies but divergent genes known to be under accelerated evolution (12), it seems unlikely that pressure to accumulate high levels of protein in venom was the selective pressure underlying the expansion of these genes. A functional pressure driving the availability of diversified important gene products is more likely, perhaps for the fine-tuning of receptor interactions and prey specificity in different environments.
In conclusion, when the genomic landscape of toxin genes and venom loci in B. jararaca is considered in a comparative context with related organisms, it demonstrates that the Viperidae venom arsenal was assembled from locally existing elements. More broadly, it illustrates how important it is to consider the genomic background from which innovation arises. We predict that additional venomous snake genomes will be critical for evaluating the generality of the mechanisms proposed for the evolution of this iconic example of a molecular adaptation.
Materials and Methods
Specimen Sampling.
An adult female individual of B. jararaca from Embu das Artes, São Paulo State, Brazil was used as a source of DNA, RNA, and venom. We followed protocol 1131/13 approved by the Committee of Ethics on the Use of Animals of the Butantan Institute. The specimen is registered in the Herpetological Collection of the Butantan Institute (IBSP84406).
Genome Sequencing and Toxin Gene Locus Identification.
The B. jararaca genome was assembled de novo using a hybrid approach (HA-WGS), utilizing both Pacific Biosciences long reads at 25× coverage (read N50 9,474 bp) and Illumina 100 bp paired-end data at around 60× coverage, employing MaSuRCA version 3.2.8 (69). The genome size of B. jararaca was determined using Jellyfish 2.2.3 (70) and the GenomeScope tool (71). The detailed sequencing procedures and the complementary assembly strategy based on short reads (SA-WGS) are described in SI Appendix, Supplementary Methodology.
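For orientation, the idea behind k-mer-based genome size estimation can be sketched with simple arithmetic: the total number of k-mer occurrences divided by the depth of the main coverage peak. The code below is a deliberately simplified illustration under stated assumptions and is not what GenomeScope does internally (GenomeScope fits a statistical model to the k-mer profile). The function name, the `min_depth` cutoff for discarding likely error k-mers, and the toy histogram are all hypothetical.

```python
def estimate_genome_size(histogram: dict[int, int], min_depth: int = 5) -> tuple[float, int]:
    """Naive estimate from a k-mer histogram.

    histogram maps k-mer depth -> number of distinct k-mers seen at that depth.
    Returns (estimated genome size in k-mer positions, peak depth).
    """
    # Assumption: k-mers below min_depth are mostly sequencing errors.
    solid = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(solid, key=solid.get)                 # depth with the most distinct k-mers
    total_kmers = sum(depth * n for depth, n in solid.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram (hypothetical values, not B. jararaca data).
toy_histogram = {1: 1000, 2: 200, 19: 50, 20: 100, 21: 50}
size, peak = estimate_genome_size(toy_histogram)
print(size, peak)
```

In a heterozygous diploid genome the tallest peak may correspond to heterozygous k-mers at half the homozygous depth, which would roughly double this naive estimate; model-based tools are designed to handle that case.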
BAC-SeqSc was designed specifically for this work (Fig. 1, Right) and is detailed in SI Appendix, Supplementary Methodology. It was based on sequencing pools of 12 BACs containing long genomic segments (150 to 250 kbp), screening them for the presence of toxin genes, and resequencing the selected BACs at high coverage.
All scaffolds generated via the HA-WGS, SA-WGS, and BAC-SeqSc strategies were screened for segments matching toxin fragments through BLASTn searches, using as queries toxin sequences obtained from the de novo transcriptome as well as other B. jararaca toxin sequences available in GenBank. Long reads were also screened to identify missing genes, providing additional data to manually link scaffolds and resolve gene structures. The scaffolds containing toxin genes were manually annotated with CLC Genomics Workbench versions 9 to 11 or Geneious version 10. UTRs and CDSs were annotated for each gene whenever possible, and automatically predicted exon/intron boundaries were manually checked for consistency with the GT-AG rule.
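A boundary consistency check of this kind can be illustrated with a short script. The sketch below assumes that the check refers to the canonical splice-site dinucleotides, that is, introns beginning with GT and ending with AG on the coding strand. It is not the authors' annotation procedure, which was performed manually in the programs named above. The function name, the coordinate convention (0-based, half-open, forward strand), and the toy gene are hypothetical.

```python
def check_intron_boundaries(genomic_seq: str, exons: list[tuple[int, int]]):
    """Report donor/acceptor dinucleotides for each intron between consecutive exons.

    exons: sorted (start, end) pairs, 0-based half-open, on the forward strand.
    Returns a list of (intron_start, intron_end, donor, acceptor, is_canonical).
    """
    seq = genomic_seq.upper()
    report = []
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        intron = seq[prev_end:next_start]
        donor, acceptor = intron[:2], intron[-2:]
        canonical = len(intron) >= 4 and donor == "GT" and acceptor == "AG"
        report.append((prev_end, next_start, donor, acceptor, canonical))
    return report


# Toy gene (hypothetical): exon, canonical intron, exon, noncanonical intron, exon.
parts = ["ATGGCC", "GTAAGTTTTCAG", "GGATCC", "GCAAGTTTTCAG", "TTTTGA"]
toy_seq = "".join(parts)
toy_exons = [(0, 6), (18, 24), (36, 42)]
for row in check_intron_boundaries(toy_seq, toy_exons):
    print(row)
```

Introns flagged as noncanonical are not necessarily wrong (GC-AG introns, for instance, occur at low frequency), so such a report is best treated as a list of boundaries to inspect rather than as an automatic filter.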
RNA Sequencing (RNA-Seq).
Total RNA from the VG, gut, kidney, stomach, lung, heart, and brain was extracted with TRIzol (Thermo), and polyA+ RNAs were obtained via magnetic bead purification (Dynabeads, Life Technologies). The mRNA concentration was estimated with a Quant-iT RiboGreen Kit (Invitrogen). Sequencing libraries were constructed using the TruSeq RNA Sample Prep Kit version 2 and sequenced on HiSeq 1500 equipment (Illumina). We used TopHat2 (72) and Bowtie 2 (73) to map reads to the genome. Trinity (42) was used for transcriptome assembly, both de novo and guided by the draft genome assembly. Detailed procedures of the RNA-Seq analysis are provided in SI Appendix, Supplementary Methodology.
Venom Proteome Analysis and Toxin Identification.
The venom from the specimen described above was analyzed by protein tandem mass spectrometry using a bottom-up shotgun approach. Fresh venom was obtained through milking before VG extraction. Detailed procedures of the proteomic analysis are provided in SI Appendix, Supplementary Methodology.
Acknowledgments
This work was supported by the São Paulo Research Foundation (Grants 2013/07467-1, 2012/00177-5, and 2016/50127-5 to I.L.M.J.-d.-A.; 2013/07964-0 to D.D.A.; 2015/03509-7 to V.L.V.; and 2018/26520-4 to P.G.N.), the Conselho Nacional de Desenvolvimento Científico e Tecnológico (Grant 304532/20134 to I.L.M.J.-d.-A.), and NSF Grant Division of Environmental Biology 1638872 to H.L.G. We thank Dr. Jesus Ferro, Prof. Dr. Anete Pereira de Souza, and Dr. Danilo Augusto Sforça for helping with BAC library preparation, Dr. João Carlos Setubal and George Willian Condomitti, MSc for assistance with the initial bioinformatic analysis, and the technicians of the Next Generation Sequencing core at the Instituto Butantan.
Footnotes
The authors declare no competing interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2015159118/-/DCSupplemental.
Data Availability
Sequence data have been deposited in GenBank (PRJNA691605) (74), and a genome browsing tool is available at http://cetics.butantan.gov.br/gb2/gbrowse/bothrops_jararaca/ (75).
References
- 1. Greene H. W., Fogden M., Fogden P., Snakes: The Evolution of Mystery in Nature (University of California Press, 1997).
- 2. Vonk F. J., et al., The king cobra genome reveals dynamic gene evolution and adaptation in the snake venom system. Proc. Natl. Acad. Sci. U.S.A. 110, 20651–20656 (2013).
- 3. Casewell N. R., Wüster W., Vonk F. J., Harrison R. A., Fry B. G., Complex cocktails: The evolutionary novelty of venoms. Trends Ecol. Evol. 28, 219–229 (2013).
- 4. Fry B. G., From genome to “venome”: Molecular origin and evolution of the snake venom proteome inferred from phylogenetic analysis of toxin sequences and related body proteins. Genome Res. 15, 403–420 (2005).
- 5. Casewell N. R., On the ancestral recruitment of metalloproteinases into the venom of snakes. Toxicon 60, 449–454 (2012).
- 6. Reyes-Velasco J., et al., Expression of venom gene homologs in diverse python tissues suggests a new model for the evolution of snake venom. Mol. Biol. Evol. 32, 173–183 (2015).
- 7. Junqueira-de-Azevedo I. L. M., et al., Venom-related transcripts from Bothrops jararaca tissues provide novel molecular insights into the production and evolution of snake venom. Mol. Biol. Evol. 32, 754–766 (2015).
- 8. Yin W., et al., Evolutionary trajectories of snake genes and genomes revealed by comparative analyses of five-pacer viper. Nat. Commun. 7, 13107 (2016).
- 9. Castoe T. A., et al., The Burmese python genome reveals the molecular basis for extreme adaptation in snakes. Proc. Natl. Acad. Sci. U.S.A. 110, 20645–20650 (2013). Corrected in: Proc. Natl. Acad. Sci. U.S.A. 111, 3194 (2014).
- 10. Aird S. D., et al., Population genomic analysis of a pitviper reveals microevolutionary forces underlying venom chemistry. Genome Biol. Evol. 9, 2640–2649 (2017).
- 11. Suryamohan K., et al., The Indian cobra reference genome and transcriptome enables comprehensive identification of venom toxins. Nat. Genet. 52, 106–117 (2020).
- 12. Shibata H., et al., The habu genome reveals accelerated evolution of venom protein genes. Sci. Rep. 8, 11300 (2018).
- 13. Schield D. R., et al., The origins and evolution of chromosomes, dosage compensation, and mechanisms underlying venom regulation in snakes. Genome Res. 29, 590–601 (2019).
- 14. Margres M. J., et al., The Tiger Rattlesnake genome reveals a complex genotype underlying a simple venom phenotype. Proc. Natl. Acad. Sci. U.S.A. 118, e2014634118 (2021).
- 15. Itoh N., et al., Organization of the gene for batroxobin, a thrombin-like snake venom enzyme. Homology with the trypsin/kallikrein gene family. J. Biol. Chem. 263, 7628–7631 (1988).
- 16. Yamazaki Y., et al., Snake venom Vascular Endothelial Growth Factors (VEGF-Fs) exclusively vary their structures and functions among species. J. Biol. Chem. 284, 9885–9891 (2009).
- 17. Dowell N. L., et al., The deep origin and recent loss of venom toxin genes in rattlesnakes. Curr. Biol. 26, 2434–2445 (2016).
- 18. Sanz L., Harrison R. A., Calvete J. J., First draft of the genomic organization of a PIII-SVMP gene. Toxicon 60, 455–469 (2012).
- 19. Yamaguchi K., et al., The finding of a group IIE phospholipase A2 gene in a specified segment of Protobothrops flavoviridis genome and its possible evolutionary relationship to group IIA phospholipase A2 genes. Toxins (Basel) 6, 3471–3487 (2014).
- 20. Dowell N. L., et al., Extremely divergent haplotypes in two toxin gene complexes encode alternative venom types within rattlesnake species. Curr. Biol. 28, 1016–1026.e4 (2018).
- 21. Giorgianni M. W., et al., The origin and diversification of a novel protein family in venomous snakes. Proc. Natl. Acad. Sci. U.S.A. 117, 10911–10920 (2020).
- 22. Kerkkamp H. M. I., et al., Snake genome sequencing: Results and future prospects. Toxins (Basel) 8, 360 (2016).
- 23. Valente R. H., et al., Bothrops jararaca accessory venom gland is an ancillary source of toxins to the snake. J. Proteomics 177, 137–147 (2018).
- 24. Serrano S. M. T., Oliveira A. K., Menezes M. C., Zelanis A., The proteinase-rich proteome of Bothrops jararaca venom. Toxin Rev. 33, 169–184 (2014).
- 25. Augusto-de-Oliveira C., et al., Dynamic rearrangement in snake venom gland proteome: Insights into Bothrops jararaca intraspecific venom variation. J. Proteome Res. 15, 3752–3762 (2016).
- 26. Fujimura Y., et al., Isolation and chemical characterization of two structurally and functionally distinct forms of botrocetin, the platelet coagglutinin isolated from the venom of Bothrops jararaca. Biochemistry 30, 1957–1964 (1991).
- 27. Zingali R. B., Jandrot-Perrus M., Guillin M. C., Bon C., Bothrojaracin, a new thrombin inhibitor isolated from Bothrops jararaca venom: Characterization and mechanism of thrombin inhibition. Biochemistry 32, 10794–10802 (1993).
- 28. Coelho A. L. J., et al., Effects of jarastatin, a novel snake venom disintegrin, on neutrophil migration and actin cytoskeleton dynamics. Exp. Cell Res. 251, 379–387 (1999).
- 29. Serrano S. M. T., et al., A novel phospholipase A2, BJ-PLA2, from the venom of the snake Bothrops jararaca: Purification, primary structure analysis, and its characterization as a platelet-aggregation-inhibiting factor. Arch. Biochem. Biophys. 367, 26–32 (1999).
- 30. Ianzer D., et al., Identification of five new bradykinin potentiating peptides (BPPs) from Bothrops jararaca crude venom by using electrospray ionization tandem mass spectrometry after a two-step liquid chromatography. Peptides 25, 1085–1092 (2004).
- 31. Paine M. J. I., Desmond H. P., Theakston R. D. G., Crampton J. M., Purification, cloning, and molecular characterization of a high molecular weight hemorrhagic metalloprotease, jararhagin, from Bothrops jararaca venom. Insights into the disintegrin gene family. J. Biol. Chem. 267, 22869–22876 (1992).
- 32. Raudonat H. W., Rocha e Silva M., Separation of the Bradykinin Releasing Enzyme from the Clotting Factor in Venom from Bothrops Jararaca (Naunyn-Schmiedeberg’s Arch. für Exp. Pathol. und Pharmakologie, 1962).
- 33. Ferreira S. H., Bartelt D. C., Greene L. J., Isolation of bradykinin-potentiating peptides from Bothrops jararaca venom. Biochemistry 9, 2583–2593 (1970).
- 34. Cidade D. A. P., et al., Bothrops jararaca venom gland transcriptome: Analysis of the gene expression pattern. Toxicon 48, 437–461 (2006).
- 35. Zelanis A., et al., Analysis of the ontogenetic variation in the venom proteome/peptidome of Bothrops jararaca reveals different strategies to deal with prey. J. Proteome Res. 9, 2278–2291 (2010).
- 36. Dias G. S., et al., Individual variability in the venom proteome of juvenile Bothrops jararaca specimens. J. Proteome Res. 12, 4585–4598 (2013).
- 37. Gonçalves-Machado L., et al., Combined venomics, venom gland transcriptomics, bioactivities, and antivenomics of two Bothrops jararaca populations from geographic isolated regions within the Brazilian Atlantic rainforest. J. Proteomics 135, 73–89 (2016).
- 38. Castoe T. A., et al., Discovery of highly divergent repeat landscapes in snake genomes using high-throughput sequencing. Genome Biol. Evol. 3, 641–653 (2011).
- 39. Pasquesi G. I. M., et al., Squamate reptiles challenge paradigms of genomic repeat element evolution set by birds and mammals. Nat. Commun. 9, 2774 (2018).
- 40. Atkin N. B., Mattinson G., Beçak W., Ohno S., The comparative DNA content of 19 species of placental mammals, reptiles and birds. Chromosoma 17, 1–10 (1965).
- 41. Simão F. A., Waterhouse R. M., Ioannidis P., Kriventseva E. V., Zdobnov E. M., BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
- 42. Grabherr M. G., et al., Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
- 43. Amazonas D. R., et al., Molecular mechanisms underlying intraspecific variation in snake venom. J. Proteomics 181, 60–72 (2018).
- 44. Zelanis A., et al., A transcriptomic view of the proteome variability of newborn and adult Bothrops jararaca snake venoms. PLoS Negl. Trop. Dis. 6, e1554 (2012).
- 45. Koludarov I., et al., Reconstructing the evolutionary history of a functionally diverse gene family reveals complexity at the genetic origins of novelty. bioRxiv [Preprint] (2020). 10.1101/583344. Accessed 10 February 2021.
- 46. Casewell N. R., Wagstaff S. C., Harrison R. A., Renjifo C., Wüster W., Domain loss facilitates accelerated evolution and neofunctionalization of duplicate snake venom metalloproteinase toxin genes. Mol. Biol. Evol. 28, 2637–2649 (2011).
- 47. Sanz L., Calvete J. J., Insights into the evolution of a snake venom multi-gene family from the genomic organization of Echis ocellatus SVMP genes. Toxins (Basel) 8, E216 (2016).
- 48. Kini R. M., Accelerated evolution of toxin genes: Exonization and intronization in snake venom disintegrin/metalloprotease genes. Toxicon 148, 16–25 (2018).
- 49. Utaisincharoen P., Mackessy S. P., Miller R. A., Tu A. T., Complete primary structure and biochemical properties of gilatoxin, a serine protease with kallikrein-like and angiotensin-degrading activities. J. Biol. Chem. 268, 21975–21983 (1993).
- 50. Fry B. G., et al., Early evolution of the venom system in lizards and snakes. Nature 439, 584–588 (2006).
- 51. Murayama N., et al., Cloning and sequence analysis of a Bothrops jararaca cDNA encoding a precursor of seven bradykinin-potentiating peptides and a C-type natriuretic peptide. Proc. Natl. Acad. Sci. U.S.A. 94, 1189–1193 (1997).
- 52. Ching A. T. C., et al., Some aspects of the venom proteome of the Colubridae snake Philodryas olfersii revealed from a Duvernoy’s (venom) gland transcriptome. FEBS Lett. 580, 4417–4422 (2006).
- 53. Junqueira-de-Azevedo I. L., Farsky S. H., Oliveira M. L., Ho P. L., Molecular cloning and expression of a functional snake venom vascular endothelium growth factor (VEGF) from the Bothrops insularis Pit Viper. J. Biol. Chem. 276, 39836–39842 (2001).
- 54. Jackson T. N. W., Koludarov I., How the toxin got its toxicity. Front. Pharmacol. 11, 574925 (2020).
- 55. Martinson E. O., Mrinalini Y. D., Kelkar Y. D., Chang C. H., Werren J. H., The evolution of venom by co-option of single-copy genes. Curr. Biol. 27, 2007–2013.e8 (2017).
- 56. Ohno S., Evolution by Gene Duplication (Springer Berlin Heidelberg, 1970).
- 57. Lynch M., Conery J. S., The evolutionary fate and consequences of duplicate genes. Science 290, 1151–1155 (2000).
- 58. Tasoulis T., Isbister G. K., A review and database of snake venom proteomes. Toxins (Basel) 9, E290 (2017).
- 59. True J. R., Carroll S. B., Gene co-option in physiological and morphological evolution. Annu. Rev. Cell Dev. Biol. 18, 53–80 (2002).
- 60. Copley S. D., An evolutionary perspective on protein moonlighting. Biochem. Soc. Trans. 42, 1684–1691 (2014).
- 61. Hargreaves A. D., Swain M. T., Logan D. W., Mulley J. F., Testing the Toxicofera: Comparative transcriptomics casts doubt on the single, early evolution of the reptile venom system. Toxicon 92, 140–156 (2014).
- 62. Bayona-Serrano J. D., et al., Replacement and parallel simplification of nonhomologous proteinases maintain venom phenotypes in rear-fanged snakes. Mol. Biol. Evol. 37, 3563–3575 (2020).
- 63. Casewell N. R., Huttley G. A., Wüster W., Dynamic evolution of venom proteins in squamate reptiles. Nat. Commun. 3, 1066 (2012).
- 64. Hargreaves A. D., Swain M. T., Hegarty M. J., Logan D. W., Mulley J. F., Restriction and recruitment-gene duplication and the origin and evolution of snake venom toxins. Genome Biol. Evol. 6, 2088–2095 (2014).
- 65. Hughes A. L., The evolution of functionally novel proteins after gene duplication. Proc. Biol. Sci. 256, 119–124 (1994).
- 66. Hittinger C. T., Carroll S. B., Gene duplication and the adaptive evolution of a classic genetic switch. Nature 449, 677–681 (2007).
- 67. Des Marais D. L., Rausher M. D., Escape from adaptive conflict after duplication in an anthocyanin pathway gene. Nature 454, 762–765 (2008).
- 68. Deng C., Cheng C.-H. C., Ye H., He X., Chen L., Evolution of an antifreeze protein by neofunctionalization under escape from adaptive conflict. Proc. Natl. Acad. Sci. U.S.A. 107, 21593–21598 (2010).
- 69. Zimin A. V., et al., The MaSuRCA genome assembler. Bioinformatics 29, 2669–2677 (2013).
- 70. Marcais G., Kingsford C., Jellyfish: A Fast K-Mer Counter (Tutorialis e Manuais, 2012).
- 71. Vurture G. W., et al., GenomeScope: Fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204 (2017).
- 72. Kim D., et al., TopHat2: Accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013).
- 73. Langmead B., Salzberg S. L., Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
- 74. Junqueira-de-Azevedo I. L. M., Genome sequencing and assembly, raw sequence reads, transcriptome or gene expression data deposited in GenBank database under Bioproject accession number PRJNA691605. https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA691605. Accessed 12 January 2021.
- 75. Junqueira-de-Azevedo I. L. M., Genome Browse for Bothrops jararaca genome. http://cetics.butantan.gov.br/gb2/gbrowse/bothrops_jararaca/. Accessed 12 January 2021.