SUMMARY
The genetic origin of novel traits is a central but challenging puzzle in evolutionary biology. Among snakes, phospholipase A2 (PLA2)-related toxins have evolved in different lineages to function as potent neurotoxins, myotoxins, or hemotoxins. Here, we traced the genomic origin and evolution of PLA2 toxins by examining PLA2 gene number, organization, and expression in both neurotoxic and non-neurotoxic rattlesnakes. We found that even though most North American rattlesnakes do not produce neurotoxins, the genes of a specialized heterodimeric neurotoxin predate the origin of rattlesnakes and were present in their last common ancestor (~22 mya). The neurotoxin genes were then deleted independently in the lineages leading to the Western Diamondback (Crotalus atrox) and Eastern Diamondback (C. adamanteus) rattlesnakes (~6 mya), while a PLA2 myotoxin gene retained in C. atrox was deleted from the neurotoxic Mojave rattlesnake (C. scutulatus; ~4 mya). The rapid evolution of PLA2 gene number appears to be due to transposon invasion that provided a template for non-allelic homologous recombination.
INTRODUCTION
A central goal of evolutionary biology is to elucidate the origins of novelties—new traits that allow the performance of new functions and open up new ways of living [1]. Such traits may involve the emergence of new anatomical structures, new biochemical activities, or both. For example, among fish, the evolution of weight-bearing limbs and a neck enabled one group to leave the water and colonize the land [2], while the evolution of antifreeze proteins enabled another group, the notothenioids, to dominate the subfreezing waters of the Antarctic [3].
In order to understand the evolutionary paths to novelties, it is necessary to trace the origins of the genomic information that encodes them. The central mechanistic issue is the degree to which novelties are the result of changes in gene number, sequence, or regulation. For morphological innovations, the weight of evidence suggests that new structures and pattern elements arise through changes in the regulation of highly conserved developmental regulatory proteins [4, 5]. But with respect to biochemical novelties, the requirements for and contributions of gene duplication, coding sequence evolution, or regulatory sequence evolution are not as clear [6–9].
Animal venoms offer some outstanding examples of biological novelty. Venoms evolved independently in numerous taxa (e.g., cone snails, spiders, bees, snakes, and the platypus), are delivered by an array of structures, and are composed of varying mixtures of prey-disabling polypeptides. Snake venoms in particular are well characterized because of their medical importance and relative abundance. Within snakes, different types of venoms have evolved. The elapids, which include cobras, kraits, and coral snakes, are primarily neurotoxic, whereas the viperids, which include adders, pit vipers, and pitless vipers, largely produce hemorrhagic and necrotizing venoms [10].
The evolutionary origins of snake venom proteins and the genetic bases of venom diversity are as yet not well understood [11, 12]. One general issue is to what degree differences among species are due to differences in gene number or gene regulation [13]. It has been proposed, for example, that major differences in venom composition between related viperid species are due to transcriptional and posttranscriptional regulatory mechanisms [14], whereas other studies have asserted that gene duplication and divergence account for species differences [15, 16]. However, nearly all studies of snake venom diversity have been conducted without the benefit of genome sequences that are necessary to identify orthologous genes and to untangle the contributions of gene content and regulation. Only a single venomous snake genome, the king cobra (Ophiophagus hannah), has been analyzed to date [17].
Here, we have examined the genomic basis of toxin origins and venom diversity among one group of New World pit vipers, the rattlesnakes. While many North American species such as the large-bodied Western Diamondback (Crotalus atrox) and Eastern Diamondback (C. adamanteus) produce hemotoxic and myotoxic venoms that contain phospholipase A2 (PLA2) toxins [18, 19], the Mojave rattlesnake (C. scutulatus) as well as most Central and South American species produce a novel, potent heterodimeric presynaptic neurotoxin composed of one acidic and one basic PLA2 polypeptide chain [20, 21]. We show that the rattlesnake neurotoxin subunit genes arose from a group 2 PLA2 gene complex that expanded dramatically in the lineage leading to the crotalids and were present in the last common ancestor of rattlesnakes. Remarkably, we find that, despite their relatively recent divergence (4–7 million years ago [mya]) [22], each lineage has deleted three to four entire genes but retains and expresses a different subset of PLA2 genes. We propose that the rapid evolution of PLA2 gene number and venom diversity has been catalyzed by non-allelic homologous recombination (NAHR) seeded by conserved blocks of transposon repeats dispersed throughout the complex.
RESULTS
Closely Related Rattlesnakes Possess and Express Different Subsets of PLA2 Genes
Previous analyses of C. scutulatus, C. atrox, and C. adamanteus venoms have shown that members of the secreted PLA2 group 2 gene family are abundant components in each venom. However, the PLA2 proteins in these closely related species have distinct characteristics: C. scutulatus produces a neurotoxic heterodimer derived from a basic PLA2 and an acidic non-toxic PLA2, which is processed into three peptides [23, 24]; C. atrox produces three PLA2 molecules including a basic PLA2, a full-length acidic PLA2, and a myotoxic PLA2 with a Lys amino acid substitution at the catalytic Asp49 site [18]; and C. adamanteus produces one full-length acidic PLA2 [19].
The first question we sought to address through comparative genomics was whether these species possess different sets of genes or share the same set of PLA2 genes and regulate them differently to generate venom diversity. We employed a low-quality draft assembly of the C. atrox genome to design exon probes for PLA2 and flanking genes, used these probes to screen three whole-genome bacterial artificial chromosome (BAC) libraries (C. scutulatus, C. atrox, and C. adamanteus), and then isolated, sequenced, assembled, and annotated the positive BAC clones.
We located all described venom PLA2 genes in a cluster between two non-venom-expressed PLA2 genes (homologs of mammalian Pla2g2e and Pla2g2f) that are flanked, in turn, by two unrelated genes Otud3 and Mul1, respectively (Figure 1). The presence of a cluster of group 2 PLA2 genes flanked by Otud3 and Mul1 is a conserved feature of tetrapods. However, the complement of full-length venom PLA2 genes was different in each rattlesnake genome comprising five genes in C. scutulatus, four in C. atrox, and three in C. adamanteus (Figure 1).
Figure 1. The Different Structures and Expression of the PLA2 Gene Complexes of C. scutulatus, C. atrox, and C. adamanteus.

(A–C) The relative positions and orientations (denoted with arrowheads) of the C. scutulatus (A), C. atrox (B), and C. adamanteus (C) full-length PLA2 genes between the conserved Otud3 and Mul1 genes are indicated. The identities of each major expressed venom gland PLA2 gene are indicated above the schematic. Below each schematic are the strand-specific coverage plots for aligned venom gland RNA-seq reads that overlap exons. Exons for each gene (color matched) are shown between the coverage plots. The predicted translation products of the expressed gene models (mapped reads) are identical (100% amino acid identity) to proteins previously detected in venom. Pla2-gB2 encodes Mojave toxin subunit B (Mtx B); Pla2-gA2 encodes Mojave toxin subunit A (Mtx A); for other gene nomenclature, see text. Note that the complement of genes differs between species and that each species expresses different major transcripts in the venom gland. See also Figures S1 and S4 and Table S1.
We identified each gene using phylogenetic analysis of the predicted proteins (Figure S1A; Table S1). The internal cluster of PLA2 genes constitutes a monophyletic clade of group 2 PLA2-related genes that is unique to snakes and distinct from Pla2g2e and Pla2g2f. We will designate this group as Pla2g2g and use the shorter form Pla2-g in lieu of Pla2g2g. We identify each gene with the “Pla2-g” prefix followed by “A,” “B,” or “C” for acidic, basic, or non-venom-expressed PLA2 types and a number (1, 2) for the paralog group within each type.
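As an illustration only, the naming convention just described can be written as a small Python helper. This is a sketch under stated assumptions: the function name parse_pla2_g and the lookup table are hypothetical, and only the type letters and paralog numbers defined in the paragraph above are accepted.

```python
import re

# Type letters exactly as defined in the text above. Names outside this
# scheme are rejected rather than guessed at.
TYPE_BY_LETTER = {"A": "acidic", "B": "basic", "C": "non-venom-expressed"}


def parse_pla2_g(name):
    """Split a gene name such as 'Pla2-gA1' into (type, paralog number).

    Hypothetical helper for illustration; not part of the study's pipeline.
    """
    match = re.fullmatch(r"Pla2-g([ABC])([12])", name)
    if match is None:
        raise ValueError(f"not a Pla2-g type/paralog name: {name!r}")
    letter, number = match.groups()
    return TYPE_BY_LETTER[letter], int(number)


print(parse_pla2_g("Pla2-gA2"))
print(parse_pla2_g("Pla2-gC1"))
```

The helper raises an error for any name that does not follow the letter-plus-number pattern, which keeps non-conforming names from being silently misclassified.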
Only two full-length genes are shared across all three species, Pla2-gA1 and Pla2-gC1 (Figures 1A–1C). The remaining genes are found in one or two species. Importantly, the genes encoding the acidic and basic subunits of the C. scutulatus Mojave neurotoxin, known as MtxA (Pla2-gA2) and MtxB (Pla2-gB2), are found only in C. scutulatus (Figure 1A); there is no trace of an ortholog in either C. atrox or C. adamanteus (Figures 1B and 1C).
Similarly, only C. atrox possesses the non-enzymatic Pla2-gK gene, a basic PLA2 gene in which the aspartic acid that is typical of group 2 phospholipases at residue 49 is replaced by lysine (Figure 1B). This substitution is found in a subfamily of viperid phospholipases and reduces calcium binding and catalytic activity on phospholipid substrates but may increase myotoxicity [25].
C. adamanteus has the smallest PLA2 cluster in which only the Pla2-gA1 and Pla2-gC1 genes are intact (Figure 1C). It does possess a Pla2-gB1 gene, but the locus contains a frameshift mutation due to a single nucleotide deletion in the second exon and is presumed to be a pseudogene. Interestingly, we identified one full-length non-venom PLA2 gene (Pla2-d) that, while present in most tetrapod genomes, is not present in either C. scutulatus or C. atrox.
Analysis of PLA2 gene expression in venom glands revealed that different PLA2 genes produce the major transcripts in each species. Transcription at the neurotoxic C. scutulatus PLA2 complex predominantly consists of Pla2-gB2 (Mtx-B) and Pla2-gA2 (Mtx-A), but not Pla2-gA1 (Figure 1A), whereas in C. atrox Pla2-gA1 and Pla2-gK are the major transcripts (Figure 1B), and in C. adamanteus only Pla2-gA1 is strongly expressed (Figure 1C). The distinct transcriptional signatures suggest that species-specific gene expression has evolved in conjunction with gene number to generate venom diversity.
All of the Pla2-g genes are composed of four coding exons of conserved lengths with highly similar intron sequences. Furthermore, the Pla2-g genes are tandemly arrayed with the basic members (B1, B2, C1, C2, K) on one strand and the acidic genes (A1 and A2) in the opposite orientation. The differences in gene content between species could be due to lineage-specific additions to the PLA2 complex and/or losses from a larger ancestral complex. To determine which might be the case, we sought to reconstruct the ancestral state of the PLA2 complex.
The Deep Origin and Recent Independent Losses of Neurotoxin and Myotoxin Genes
The three species studied here belong to a small clade that includes other neurotoxic and non-neurotoxic members (Figure 2; for details on the ascertainment of neurotoxicity, see Table S2). In principle, the presence of neurotoxins (or other PLA2 proteins) in certain species and absence from other species could be due to independent gains or to a single origin followed by loss(es). Based upon the topology of the species tree and the assumption of parsimony, the distribution of neurotoxins could be explained by either (1) one origin of neurotoxins followed by three independent losses (Figure S2A) or (2) two independent gains followed by two losses (Figure S2B). In order to identify which may be the case, we focused on the evolutionary relationships between neurotoxins.
Figure 2. The Most Recent Common Ancestor of Rattlesnakes Was Neurotoxic.

Trimmed species phylogeny with venom type represented as black (neurotoxic) or white (non-neurotoxic) boxes. Within this clade, the most parsimonious interpretation of the neurotoxic venom distribution is that the most recent common ancestor (MRCA) possessed neurotoxic venom and three lineages (x) have independently lost neurotoxicity. See also Figures S2–S4 and Tables S1 and S2.
The heterodimeric neurotoxin proteins from C. scutulatus and C. durissus terrificus are the best studied. Both phylogenetic (Figure S3; green clade “A2”) and structural evidence indicate that the proteins are orthologous. Specifically, the respective acidic and basic chains share sets of residues that have been implicated in heterodimer formation and function, as well as numerous additional residues that are not shared with other paralogs (Figures S5 and S6). We also note that the neurotoxic gA2 acidic polypeptides contain internal residues that are removed proteolytically (Figure S4); this processing is important for the folding of the A chain and heterodimer formation [23]. Such extensive sequence identity and shared structural features are unlikely to have evolved convergently. The most parsimonious inference explaining the existence of these highly similar heterodimeric neurotoxins is that the most recent common ancestor of C. scutulatus and C. durissus terrificus possessed these proteins.
This inference has important implications when considered in the context of the phylogenetic relationships between these two species and the other species studied here. Based upon the topology of the species tree [22] and the orthology of the C. scutulatus and C. durissus neurotoxins, the most parsimonious inference explaining the distribution of neurotoxic venom is that the most recent common ancestor of this clade possessed the neurotoxin (denoted most recent common ancestor [MRCA] in Figure 2). Thus, the absence of neurotoxins in C. atrox and C. adamanteus would be due to independent lineage-specific gene losses (denoted by Xs in Figure 2).
The ancestry of the neurotoxin appears, in fact, to be much deeper than this small clade. Heterodimeric neurotoxins and neurotoxic activity have been reported in species outside of this clade including C. horridus (Timber rattlesnake) and Sistrurus catenatus (Western massasauga rattlesnake), a member of a sister genus to Crotalus [26, 27]. In principle, these neurotoxins could represent independent evolutionary gains (Figures S2C and S2D). However, the subunits of these neurotoxins cluster phylogenetically with those from C. scutulatus and C. durissus terrificus (Figure S3; green clade “A2”), indicating that they are orthologs. The distribution of these orthologous neurotoxins is best explained by a deep origin, before the split between Crotalus and Sistrurus, followed by several independent losses (Figure S2C). Hence, we infer that the most recent common ancestor of Crotalus and Sistrurus, and thus of all rattlesnakes, was neurotoxic.
The presence of the PLA2-gK protein in C. atrox is also due to retention of a gene that evolved deep within Crotalinae (pit vipers) and that has been lost numerous times. Phylogenetic analysis indicates that the PLA2-gK protein of C. atrox is orthologous to PLA2-gK proteins found in other crotalid genera including Trimeresurus and Bothrops (Figure S3; red clade “K”). Likewise, the presence of the Pla2-d gene in C. adamanteus and tetrapods indicates a deep origin (Figure S1B; “PLA2GD” clade), and its absence from C. atrox and C. scutulatus is due to recent independent losses.
The deep origin of orthologs missing from certain species indicates that the ancestral rattlesnake PLA2 complex was larger than what is observed in living species. The ancestral PLA2 complex likely contained at least six venom PLA2 genes (Pla2-g members: A1, A2, B1, B2, C1, K) and has subsequently been reduced by whole gene deletions. This evolutionary scenario led us to investigate both how the larger complex evolved and why gene deletion has been so frequent.
The Neurotoxin Genes Arose via Duplication within an Expanding PLA2 Complex
Gene duplication has been proposed to explain the diversity of PLA2 proteins in venoms [28]; however, in the absence of comparative genomic information, it is difficult to determine the order of duplication events. In order to reconstruct the relative birth order of rattlesnake PLA2 genes, we looked for evidence of genomic duplications by self-aligning each locus and filtering for large (≥1 kilobase pairs), high identity (~90%) sequence pairs located between the flanking non-venom genes Otud3 and Mul1. We then used their relative evolutionary distance and protein phylogeny to infer the relative order of gene evolution. While pairwise evolutionary distance measures could be confounded by positive selection, we expect that only a small minority of sites would be subject to selection, with a resulting modest effect on the heterogeneity of evolutionary rates among duplicated regions. We identified several duplicated segments that shed light on the origin of neurotoxin subunits, the acidic and basic PLA2s, and the Pla2-g complex itself, respectively.
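The filtering and ordering step described above can be sketched in Python. This is a minimal illustration, not the pipeline used in the study: the Hit record, the function name, and the example coordinates are all hypothetical, the "~90%" identity figure is treated as a lower threshold, and 1 minus identity stands in as a simple proxy for the evolutionary distance measure, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One pair of aligned segments from a self-alignment (hypothetical record)."""
    name: str
    a_start: int
    a_end: int
    b_start: int
    b_end: int
    identity: float  # fraction of identical aligned positions, 0..1


def candidate_duplications(hits, region_start, region_end,
                           min_len=1000, min_identity=0.90):
    """Keep large, high-identity pairs inside the region, least diverged first."""
    kept = []
    for h in hits:
        if (h.a_start, h.a_end) == (h.b_start, h.b_end):
            continue  # a segment trivially matching itself is not a duplication
        length = min(h.a_end - h.a_start, h.b_end - h.b_start)
        inside = (region_start <= min(h.a_start, h.b_start)
                  and max(h.a_end, h.b_end) <= region_end)
        if length >= min_len and h.identity >= min_identity and inside:
            kept.append(h)
    # Assumption: lower divergence (1 - identity) is read as a more recent event.
    return sorted(kept, key=lambda h: 1.0 - h.identity)


# Invented coordinates and identities, for illustration only.
hits = [
    Hit("pair1", 100, 2100, 5000, 7000, 0.97),
    Hit("short", 100, 600, 3000, 3500, 0.99),
    Hit("pair2", 1000, 3000, 8000, 10000, 0.91),
    Hit("self", 100, 2100, 100, 2100, 1.0),
    Hit("outside", 100, 2100, 20000, 22000, 0.95),
]
for h in candidate_duplications(hits, region_start=0, region_end=15000):
    print(h.name, round(1.0 - h.identity, 2))
```

Here region_start and region_end play the role of the boundaries set by the flanking Otud3 and Mul1 genes. A real analysis would replace the identity proxy with a substitution-model distance, since raw identity underestimates divergence for older duplications.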
The duplication with the least pairwise sequence divergence (and presumably the most recent origin) was found in C. scutulatus and spans four Pla2 genes, comprising two gene pairs in a head-to-head orientation (gC1∷gA1 and gC2∷gA2) (Figure 3A, SD14; Figure S5). In addition to the intact gC∷gA PLA2 gene pairs, we also identified single orphaned exons from Pla2-e and Pla2-d between both of the gC∷gA gene pairs (Figure S5). These single exon gene fragments are probably remnants of old genomic rearrangements. The conservation and synteny of these gene fragments indicate that the two-gene gC∷gA unit of SD14 duplicated in one event in the rattlesnake lineage. With respect to the origin of neurotoxins, this duplication event ultimately gave rise to the distinct gA1 and gA2 (MtxA) paralogs. It also gave rise to gC2, which, combined with the evidence above, allows us to infer that the MRCA of all rattlesnakes possessed this seventh Pla2-g gene.
Figure 3. Venom Pla2 Genes Arose via Sequential Duplications.

(A–C) Blocks of high sequence identity that overlap the venom genes in C. scutulatus (A), C. atrox (B), and C. adamanteus (C) are shown as gray boxes linked by arcs above the PLA2 group 2 complex (drawn to scale). The duplication identifiers are to the left of each sequence pair (SD14, etc.). The duplications are ordered from top to bottom by the level of sequence divergence. Dashed arcs link duplicated regions on opposite strands. Solid arcs link duplicated regions on the same strand. (D) Model of the expansion of the PLA2 group 2 complex that gave rise to the inferred, ancestral rattlesnake complex with at least seven Pla2-g genes, based on the sequence divergence of specific duplicated regions and protein phylogeny. Genes are colored according to the naming convention of modern genes. Boxes on left are color matched to the duplicated genes. (i) The ancestral chromosome is inferred to have four distinct Pla2 group 2 family members, including a single Pla2-g gene, which is most similar to the gC gene in modern rattlesnakes. (ii) The earliest duplication gave rise to gC and the basic Pla2s (SD9, SD11, SD13). (iii) Tandem duplication of basic-Pla2s (SD7). (iv) Inversion of a basic Pla2 yields the head-to-head basic:acidic (gC∷gA) Pla2 unit (SD10, 27, 40, 42). (v) A large duplication of the gC∷gA unit expanded the Pla2 complex to seven genes (SD14).
See Figure S5 for evolutionary distances between duplicated sequences.
A different duplicated segment present in all three species provides evidence that the first acidic-Pla2 gene was derived from the duplication and inversion of a basic Pla2 gene. This duplication spans the genomic regions containing the Pla2-gB paralogs and the Pla2-gA1 gene (Figures 3A–3C; SD40, SD27, SD10). The greater evolutionary distance between these segments indicates that these events occurred prior to the duplication that gave rise to the second acidic Pla2 gene (SD14, see above). This inference regarding birth order is consistent with the protein phylogeny that also indicates that the acidic Pla2 genes derive from a basic Pla2 ancestor (Figure S3).
Among the three species analyzed here, only C. atrox possesses multiple venom-expressed basic PLA2 genes. The two C. atrox basic PLA2s (gB1 and gK) are included within another duplication (Figure 3B; SD7). Based on sequence divergence, it appears that the duplication of basic PLA2 genes preceded the emergence of acidic PLA2 genes (Figure S5).
The first step in the expansion of the Pla2-g complex appears to have been the origin of a second basic Pla2. This inference is supported by a duplication shared among all three species that includes the non-venom-expressed Pla2-gC1 homolog and a basic PLA2 (gB2, gK, gB1 in C. scutulatus, C. atrox, C. adamanteus, respectively) (Figures 3A–3C; SD13, SD9, SD11). These three duplications display the highest relative sequence divergence between pairs, indicating that they are the oldest and thus the first duplications in the complex.
To infer the relative order of events that generated the seven-gene ancestral PLA2 complex, we have combined the observed genetic distances of these duplicated regions with protein phylogenetic analysis. The most consistent scenario is that basic PLA2 genes emerged following a series of duplications of basic PLA2s with Pla2-gC retaining the most ancestral characteristics (Figure 3D, ii and iii). This process was followed by an inversion and the emergence of an acidic PLA2 (Figure 3D, iv). Most recently, the gC∷gA gene unit was duplicated, establishing the two gA paralogs found in rattlesnakes (Figure 3D, v). We next sought to understand mechanisms that could be responsible for generating the tandem array of Pla2-g genes and their frequent subsequent deletion.
Independent Gene Gains and Losses Are Associated with Transposon-Rich Repeats
The most prevalent mechanism for generating a tandem array of duplicated sequences is NAHR [29]. NAHR requires sequence homology for pairing during recombination and results in both a duplicated region on one chromosome and deletion on the sister chromosome. We searched for sequence features across the PLA2 region that might explain how NAHR-mediated structural rearrangements could have generated the different PLA2 complexes.
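The reciprocal outcome of NAHR, one duplication product and one deletion product, can be illustrated with a toy Python model. This is a sketch under stated assumptions: the function name is hypothetical, chromosomes are reduced to ordered lists of labels, "TE" stands for any copy of a shared repeat, and the example layout is a simplified stand-in rather than a map of a real locus.

```python
def nahr_products(chrom_a, chrom_b, i, j):
    """Recombine chrom_a at repeat index i with chrom_b at repeat index j (i < j).

    Returns (deletion_product, duplication_product). Toy model only.
    """
    if not i < j:
        raise ValueError("expected i < j so that the pairing is misaligned")
    if chrom_a[i] != chrom_b[j]:
        raise ValueError("crossover points must be copies of the same repeat")
    # Left part of A joined to the right part of B skips everything between
    # the two repeats: a deletion.
    deletion_product = chrom_a[: i + 1] + chrom_b[j + 1:]
    # The reciprocal product carries that interval twice: a tandem duplication.
    duplication_product = chrom_b[: j + 1] + chrom_a[i + 1:]
    return deletion_product, duplication_product


# Hypothetical, simplified layout; "TE" marks a copy of a conserved repeat block.
locus = ["Pla2-e", "TE", "gB1", "TE", "gK", "Pla2-f"]
deleted, duplicated = nahr_products(locus, locus, 1, 3)
print(deleted)
print(duplicated)
```

In this toy layout, misaligned pairing of the two repeat copies removes the gene between them from one product and doubles it in the other, which is the pattern of paired gain and loss that the text attributes to NAHR.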
We identified conserved blocks of sequence in intergenic regions between the Pla2-g genes in all rattlesnake species that could be substrates for NAHR (Figure 4A). The larger sequence blocks contain discrete clusters of identifiable transposable elements (TEs) (Figure 4A). In particular, one conserved cluster is composed of two class I retrotransposons (ERV1-10 and LINE CR1) and a class II DNA transposon (hAT) (Figure 4A). In all three species, this sequence block, spanning ~3.3 kb, is located between Pla2g2e and a basic PLA2 gene (Figure 4A). A similar TE cluster is also present in C. atrox between Pla2-gB1 and gK, and in C. scutulatus between the gC2∷gA2 and gC1∷gA1 gene pairs (Figure 4A). We also identified a similar but smaller (640–900 bp) conserved non-genic sequence block adjacent to Pla2-f; however, there are no identifiable TEs within this sequence (Figure 4A). C. scutulatus and C. atrox both lack Pla2-d, and it is possible that a cis-NAHR event between clusters of TEs in this region deleted the full-length gene. The conserved, high sequence identity of TE clusters located between the tandemly arrayed venom Pla2 genes suggests that these sequences have been a permissive substrate for gene duplication and deletion via NAHR.
Figure 4. Conserved, Dispersed Repetitive Sequence Blocks May Facilitate Gene Duplication and Deletion.

(A) All three rattlesnake species PLA2 complexes have conserved sequence blocks (boxes) that can include conserved clusters of transposable element (TE) sequences (white boxes, expanded below locus) and that share individual TE components, orientation and location. Gray boxes share high identity with gray regions of large conserved sequence blocks outside of the TE clusters. These elements may serve as permissive substrates for NAHR-mediated gene duplications and deletions. Pla2-g genes are depicted by arrows with color scheme, and abbreviations are the same as in Figure 1; the position of lineage-specific deletions is denoted by parentheses. TE abbreviations are as follows: ERV, endogenous retrovirus 1–10 Ami; hAT, DNA transposon; CR1, LINE CR1-1 element. (B) A rearranged C. atrox PLA2 complex with a duplicated gC1∷gA1 gene pair was discovered in one of four specimens analyzed. Orange bar indicates the novel sequence relative to the standard C. atrox chromosome. Four exons (white arrow) of an Otud3 pseudogene (ψ Otud3) are between gA1′ and gB1, while an intact Otud3 gene is still present downstream of Pla2-e.
Indeed, we found evidence that the number of genes within the Pla2 complex varies within species. In one of four individual C. atrox specimens we examined, we found a rearranged Pla2 region with two additional venom genes (Pla2-gC′ and Pla2-gA1′) and a similar second large sequence block adjacent to Pla2-e (Figure 4B). This novel haplotype raises the possibility that the highly similar TE clusters may be generating variation in gene numbers via NAHR in extant rattlesnake populations.
DISCUSSION
We have shown that the most recent common ancestor of rattlesnakes possessed a complex of at least seven Pla2-g genes and that different subsets of those genes have been deleted or retained in the species we examined. This scenario raises the questions of when the complex expanded in the course of snake evolution, and why toxin genes were subsequently lost.
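The retained and deleted subsets can be tabulated with simple set arithmetic, sketched here in Python. The gene sets are transcribed from the Results above; the variable and function names are hypothetical, and the ancestral set is the inferred seven-gene Pla2-g complex, so the output restates that inference rather than testing it. The non-venom gene Pla2-d is left out because it is not a Pla2-g member.

```python
# Inferred ancestral Pla2-g complement (seven genes, as stated in the text).
ANCESTRAL = {"gA1", "gA2", "gB1", "gB2", "gC1", "gC2", "gK"}

# Intact Pla2-g genes reported for each species in the Results.
INTACT = {
    "C. scutulatus": {"gA1", "gA2", "gB2", "gC1", "gC2"},
    "C. atrox": {"gA1", "gB1", "gC1", "gK"},
    "C. adamanteus": {"gA1", "gC1"},
}

# Genes still present at the locus but disrupted (frameshift in exon 2).
PSEUDOGENES = {"C. adamanteus": {"gB1"}}


def lineage_changes(species):
    """Return (deleted genes, pseudogenized genes) relative to the ancestor."""
    intact = INTACT[species]
    pseudo = PSEUDOGENES.get(species, set())
    deleted = ANCESTRAL - intact - pseudo
    return sorted(deleted), sorted(pseudo)


for species in INTACT:
    deleted, pseudo = lineage_changes(species)
    print(species, "deleted:", deleted, "pseudogenized:", pseudo)
```

Separating whole-gene deletions from pseudogenes keeps the C. adamanteus gB1 locus, which is present but frameshifted, from being counted as a deletion.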
The Origins of the Rattlesnake PLA2 Complex and Neurotoxin
To trace the genesis of the rattlesnake PLA2 complex, we examined the structure of the gene complex outside of rattlesnakes. We annotated the Otud3 to Mul1 region from the genome of the king cobra (O. hannah), an elapid that shared a common ancestor with rattlesnakes about 60 mya [30]. The cobra has Pla2-d, Pla2-e, and Pla2-f genes that are syntenic with and orthologous to the rattlesnake genes, but we identified only a single syntenic Pla2-g gene in the cobra genome (Figure 5, top; Figure S1B). Importantly, phylogenetic analysis indicates that this cobra Pla2-g gene is most closely related to the rattlesnake Pla2-gC1 gene (Figures S1 and S2), which we have inferred to be the oldest gene in the rattlesnake complex. Since we found no evidence of duplication of Pla2-g genes, nor any conserved blocks of TE sequences or fragments of Pla2-e genes in the cobra genome, it appears plausible that the single Pla2-g gene in the king cobra represents a primitive state of the complex, and the expansion of the Pla2-g family occurred after the divergence of the viperid and elapid lineages (Figure 5, “PLA2-g duplication”).
Figure 5. The Deep Origin and Recent Independent Losses of Rattlesnake Venom Toxin Genes.

Model depicting the evolution of the PLA2 group 2G gene complex. Significant evolutionary events are mapped onto a simplified cladogram of venomous snakes, and the PLA2 complexes of various species are shown. The complex is inferred to have expanded from a single Pla2-g gene as is present today in the elapid O. hannah (top). The number of Pla2-g genes expanded in viperids, and some gained expression in venom. Further duplication and the differentiation of distinct Pla2-g types, including the components of the neurotoxic heterodimer (gA2 and gB2) and the Lys49 myotoxin (gK), occurred in the pit viper (Crotalinae) lineage (middle). The MRCA of rattlesnakes possessed at least seven Pla2-g genes, a complement that has been reduced by unique lineage-specific losses (gray circles) in extant rattlesnakes (bottom, deleted genes denoted by faded colors in brackets). See also Figure S3 and Table S1.
The viperids have diverged into four subfamilies; rattlesnakes belong to the Crotalinae (pit vipers), a clade that includes both Old World and New World genera. It is estimated that an Asian ancestor gave rise to the New World clade ~22 mya [30] and that rattlesnakes evolved ~12–14 mya [22, 30]. In order to better delimit the evolution of the complex and the heterodimeric neurotoxin within pit vipers, we compared the inferred rattlesnake complex with PLA2 genes that have been characterized from Asian pit vipers. A sequence of part of the PLA2 complex of Protobothrops flavoviridis suggests the Pla2-g family expansion occurred early in the pit viper lineage [31]. Multiple Pla2-g genes including the gA, gB, and gK members are present in the P. flavoviridis PLA2 region, as are sequence blocks homologous to rattlesnake TEs (Figure 5, “PLA2-g expansion”).
The genes encoding the acidic and basic subunits of the heterodimeric neurotoxin are not present in P. flavoviridis, which taken alone could indicate that those genes were not present in the common ancestor of Protobothrops and Crotalus. However, it was recently reported that one species of Asian pit viper, Gloydius intermedius, produces a heterodimeric neurotoxin, the subunits of which are orthologous to those of rattlesnakes [32] (Figure 5, center). This would indicate that the neurotoxin arose after the divergence between pit vipers and so-called “true vipers” but before the split between Old World and New World pit vipers (Figure 5, “neurotoxic heterodimer”).
It has long been appreciated that the diversity of snake Pla2 toxins evolved by gene duplication and that some duplicates acquired novel activities through neofunctionalization [10, 13, 25, 28]. It has been difficult, however, to trace the specific origins of individual Pla2 venom proteins as we have done here. We suggest that this difficulty stems from a general reliance on venom protein sequences in the absence of genomic data. Because venom protein data would not include non-venom-expressed genes such as C1 (a key “parental” gene in the Pla2-g group), nor contain non-coding sequence data that can help to identify duplicated segments, nor information about gene synteny, the ability to trace history is limited. That limitation may be further compounded by the sorts of gene losses we have documented here which would be difficult to ascertain without genomic data in a phylogenetic context. Similarly, reliance on transcriptomic data without the benefit of genome sequence can lead to misinterpretations of gene family composition and expansion [19].
Diversity via Gene Loss
The most surprising finding from this study is that species-specific differences in rattlesnake venom PLA2 toxins are not due to new gene duplications, as might have been expected [33, 34], but are largely due to gene loss. The rattlesnake MRCA had a full complement of gA, gB, and gK genes and possessed a diverse set of potential toxins. The pharmacologic effects of the Pla2-gA2/gB2 neurotoxin are well characterized, but other rattlesnake venom PLA2 proteins (including PLA2-gA1, PLA2-gB1, and PLA2-gK) also possess important myotoxic and/or hemotoxic activities [35]. Each species examined here retained the Pla2-gA1 gene, but it is only expressed in C. atrox and C. adamanteus venom (Figure 1). The C. scutulatus lineage also retained the PLA2-gA2/gB2 neurotoxic heterodimer but lost the gB1 and gK myotoxin paralogs, whereas the C. atrox lineage retained gB1 and gK but lost the gA2/gB2 neurotoxin subunit genes. The C. adamanteus lineage lost the gB1 function (via pseudogenization) as well as the gK and gA2/gB2 genes (Figure 5, “lineage-specific gene loss”). All of these losses are irreversible changes, which prompts the following question: given the potency of venom toxins on prey, why would such biochemical weapons be abandoned? Gene loss or inactivation in other organisms (of globin genes in Antarctic fish [36], opsin genes in blind cavefish [37], and olfactory genes in trichromatic primates [38]) is in each case correlated with shifts in ecological niches.
There is evidence that ecological shifts in snake diets shape venom composition [39]. In addition, there is prey-specific variation in rattlesnake venom toxicity [40] and prey that evolve resistance to particular venoms [41]. These kinds of observations prompted Lynch [28] to speculate that “dietary shifts after speciation runs the Pla2 gene repertoire through a ‘selective sieve’” such that “gene[s] which are no longer effective in subduing prey species are lost.” We surmise that the loss of neurotoxin or myotoxin genes reflects just such dietary shifts in rattlesnake prey choice that, coupled with increased production of other kinds of toxins, made these PLA2 toxins dispensable.
EXPERIMENTAL PROCEDURES
Specimen Collection and Biological Sample Preparation
Adult female C. atrox and C. scutulatus animals housed in the serpentarium at the Texas A&M – Kingsville National Natural Toxins Research Center (NNTRC) were euthanized (IACUC approval # 2010-09-01A). Venom glands, blood, liver, kidney, heart, lung, and muscle tissue samples were dissected and snap frozen for preservation until nucleic acid extractions. The NNTRC also provided a blood sample from an adult C. adamanteus animal.
Genome Assembly
A C. atrox draft genome was assembled with the ALLPATHS-LG [42] program from overlapping paired-end (PE) libraries (~180 base pair [bp] insert length) and jumping libraries (three and eight kilo-base pair [kb] insert lengths). Any adaptor sequence and low quality bases at read ends were trimmed from the overlapping PE reads using Trimmomatic [43]. The jumping libraries (Nextera Mate Pair reads) were filtered with NextClip [44] for reads containing an external junction adaptor. The ALLPATHS-LG “HAPLOIDIFY” option was set to “TRUE” and the “CLOSE_UNIPATH_GAPS” option set to “FALSE” to improve assembly contiguity.
We then made a BLAST [45] database with the draft assembly and queried the genome with venom PLA2 proteins present in databases. This identified a contig with multiple PLA2 hits that we roughly annotated (see below). This contig allowed us to design exon primers for venom Pla2 genes and non-venom-flanking genes. These primers were then used to make probes for screening the BAC libraries (Pla2-for: CCGAGAAAATCTGGACACGTA, Pla2-rev: AAGCCAATTGAGTGCAAAGC; Mul1-for: GAACACACATGGCCACACTC, Mul1-rev: ATCCACCAGAGGACGAACAC).
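As a quick sanity check on the probe primers listed above, the following minimal sketch reports length, GC fraction, and reverse complement for each oligonucleotide. The primer sequences are copied from the text; the helper names (`reverse_complement`, `gc_fraction`, `summarize`) are illustrative and are not part of the original workflow.

```python
# Primer sequences copied from the text; helper names are illustrative only.
PRIMERS = {
    "Pla2-for": "CCGAGAAAATCTGGACACGTA",
    "Pla2-rev": "AAGCCAATTGAGTGCAAAGC",
    "Mul1-for": "GAACACACATGGCCACACTC",
    "Mul1-rev": "ATCCACCAGAGGACGAACAC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def summarize(primers: dict) -> dict:
    """Collect length, GC fraction, and reverse complement per primer."""
    return {
        name: {
            "length": len(seq),
            "gc": gc_fraction(seq),
            "revcomp": reverse_complement(seq),
        }
        for name, seq in primers.items()
    }


if __name__ == "__main__":
    for name, info in summarize(PRIMERS).items():
        print(f"{name}\t{info['length']} nt\tGC={info['gc']:.2f}\t{info['revcomp']}")
```

Matching a forward primer and the reverse complement of its partner against a contig is one simple way to confirm that both fall within the same annotated exon.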
BAC Library Construction and Screening
Frozen liver tissue (C. atrox) or whole blood (C. scutulatus and C. adamanteus) was sent to Amplicon Express for high molecular weight genomic DNA extraction and library construction. The resulting genomic libraries consisted of ~73,000 arrayed clones (5–7X genome coverage), with each clone containing an insert of 80–150 kb. The arrayed library clones were spotted on Hybond-N+ filter membranes for screening.
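The relationship between clone number, insert length, and genome coverage can be sketched as follows. The clone count and insert range come from the text; the genome size of 1.5 Gb is an assumption chosen only for illustration (the text does not state one), and the probability estimate uses the standard Clarke-Carbon formula rather than any calculation reported by the authors.

```python
N_CLONES = 73_000                 # from the text
INSERT_RANGE_BP = (80_000, 150_000)  # from the text
ASSUMED_GENOME_SIZE_BP = 1.5e9    # ASSUMPTION: not stated in the text


def library_coverage(n_clones: int, mean_insert_bp: float, genome_size_bp: float) -> float:
    """Fold coverage = total cloned bases / genome size."""
    return n_clones * mean_insert_bp / genome_size_bp


def probability_locus_present(n_clones: int, mean_insert_bp: float, genome_size_bp: float) -> float:
    """Clarke-Carbon estimate: P = 1 - (1 - insert/genome)^N."""
    fraction = mean_insert_bp / genome_size_bp
    return 1.0 - (1.0 - fraction) ** n_clones


if __name__ == "__main__":
    mean_insert = sum(INSERT_RANGE_BP) / 2  # midpoint of the stated range
    cov = library_coverage(N_CLONES, mean_insert, ASSUMED_GENOME_SIZE_BP)
    prob = probability_locus_present(N_CLONES, mean_insert, ASSUMED_GENOME_SIZE_BP)
    print(f"coverage ~{cov:.1f}X, P(locus in library) ~{prob:.3f}")
```

Under these assumed values the midpoint insert length gives a coverage within the 5–7X range quoted above; a different genome size would shift the estimate proportionally.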
Radiolabeled (32P) probes designed to exonic sequences of venom and venom-flanking genes were hybridized to the filters [46]. Positive clones were picked from the library and streaked on plates, and single colonies were grown overnight at 37°C in 500 mL LB containing chloramphenicol. The BAC plasmid was induced to multi-copy with auto-induction solution (sterile solution of glucose [16.7%] and arabinose [3.33%]) for 6–8 hr before processing using the standard QIAGEN midi-prep protocol.
BAC Clone Library Preparation, Sequencing, and Assembly
The University of Michigan DNA sequencing core prepared the Pacific Biosciences sequencing libraries using 10 μg of BAC DNA according to the standard protocol with a size selection of large (>5,000 bp) DNA fragments. The library for single BAC clones was sequenced in a single SMRT cell. The raw reads were aligned to the E. coli genome using BLASR [47] to identify potential contaminating sequence. Reads aligning to the E. coli genome were excluded from the assembly workflow.
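The contaminant-exclusion step can be illustrated with a minimal sketch that drops reads whose identifiers appear in a set of E. coli-aligned reads. It assumes the aligned read names have already been extracted from the aligner output into a Python set; the function names and the toy FASTA records are hypothetical and do not reproduce the BLASR workflow itself.

```python
from typing import Iterable, Iterator, Set, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def drop_contaminants(
    records: Iterable[Tuple[str, str]], contaminant_ids: Set[str]
) -> Iterator[Tuple[str, str]]:
    """Keep only reads that did not align to the contaminant genome."""
    for name, seq in records:
        if name not in contaminant_ids:
            yield name, seq


if __name__ == "__main__":
    toy_fasta = [">read1 zmw=1", "ACGT", "AC", ">read2", "GGGG", ">read3", "TT"]
    ecoli_hits = {"read2"}  # hypothetical IDs of reads aligning to E. coli
    for name, seq in drop_contaminants(parse_fasta(toy_fasta), ecoli_hits):
        print(f">{name}\n{seq}")
```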
In order to assemble these data, we initiated an Amazon Web Services (AWS) extra-large compute instance and mounted the Pacific Biosciences Amazon Machine Image (AMI: smrtanalysis-2.3.0; ami-20fb4848). The raw reads that did not align to the E. coli genome were uploaded to the AWS instance and assembled using the accuracy-optimized HGAP2 (Hierarchical Genome Assembly Protocol) algorithm [48].
Assembly Evaluation
First, potential contaminating bacterial contigs were identified using BLAST [45] and the NCBI bacterial genome database and removed from further analysis. Next, the BAC vector (pCCBAC1) sequence and the BAC clone end-sequences were aligned to the remaining contig using LAST [49]. The vector sequence was removed from the contig, and the appropriate (non-overlapping) ends were stitched together to yield a single contig representing the BAC clone insert. The corrected reads were aligned to this vector-excised, merged sequence with LAST [49]. The read alignments and coverage were inspected for approximate evenness across the contig.
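The vector-removal and stitching step can be sketched as below, under two stated assumptions: the contig represents a circular clone whose redundant overlapping ends have already been trimmed, and the vector occupies a single internal interval that does not wrap around the contig origin. The function name and the toy sequences are hypothetical; in practice the vector coordinates would come from the alignment.

```python
def excise_vector(contig: str, vec_start: int, vec_end: int) -> str:
    """Remove contig[vec_start:vec_end] (0-based, half-open) from a circular
    contig and join the flanks so the insert reads continuously.

    The sequence downstream of the vector is placed first, followed by the
    sequence upstream of it, which is the order of the insert on the circle.
    """
    if not 0 <= vec_start < vec_end <= len(contig):
        raise ValueError("vector interval must lie within the contig")
    return contig[vec_end:] + contig[:vec_start]


if __name__ == "__main__":
    vector = "GGGG"                     # toy stand-in for the vector sequence
    contig = "AAAC" + vector + "TTTA"   # toy circular contig
    start = contig.find(vector)         # stand-in for alignment coordinates
    insert = excise_vector(contig, start, start + len(vector))
    print(insert)
```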
We used an independent molecular biology approach involving restriction digest mapping to confirm the validity of the assembled contig and post-assembly processing steps. First, an in silico restriction enzyme (HpaI, ScaI, and BstZ17I with XhoI) digest of the assembled contig predicted a collection of DNA fragments spanning a broad size range. Next, the BAC clone was digested with the restriction enzymes, and the digest products were visualized on an ethidium bromide stained agarose gel. Then the predicted and observed DNA fragment sizes were compared. All of the clone assemblies presented here showed congruence between the in silico and physical restriction enzyme digest products.
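An in silico digest of the kind described above can be sketched in a few lines. The recognition sequences and cut offsets in `SITES` are the standard ones for these enzymes and are not taken from the text; whether the sequence should be treated as circular (intact BAC) or linear (excised insert) is left as a parameter because the text does not specify it. All four sites are palindromic, so searching one strand is sufficient.

```python
# Recognition site and cut offset (bases from site start, top strand).
# Standard enzyme values, supplied here as an assumption rather than from the text.
SITES = {
    "HpaI": ("GTTAAC", 3),
    "ScaI": ("AGTACT", 3),
    "BstZ17I": ("GTATAC", 3),
    "XhoI": ("CTCGAG", 1),
}


def cut_positions(seq: str, enzymes, circular: bool = False):
    """Return sorted cut coordinates for the given enzymes."""
    seq = seq.upper()
    n = len(seq)
    cuts = set()
    for enzyme in enzymes:
        site, offset = SITES[enzyme]
        # On a circle, extend the search so sites spanning the origin are found.
        search = seq + seq[: len(site) - 1] if circular else seq
        start = search.find(site)
        while start != -1:
            cuts.add((start + offset) % n)
            start = search.find(site, start + 1)
    return sorted(cuts)


def fragment_sizes(seq: str, enzymes, circular: bool = False):
    """Return predicted fragment lengths, largest first."""
    n = len(seq)
    cuts = cut_positions(seq, enzymes, circular)
    if not cuts:
        return [n]
    if circular:
        sizes = [b - a for a, b in zip(cuts, cuts[1:])]
        sizes.append(n - cuts[-1] + cuts[0])  # fragment spanning the origin
    else:
        bounds = [0] + cuts + [n]
        sizes = [b - a for a, b in zip(bounds, bounds[1:])]
    return sorted(sizes, reverse=True)


if __name__ == "__main__":
    toy = "AAAA" + "GTTAAC" + "CCCCC" + "CTCGAG" + "TTT"
    print(fragment_sizes(toy, ["HpaI", "XhoI"]))
    print(fragment_sizes(toy, ["HpaI", "XhoI"], circular=True))
```

The predicted sizes can then be compared by eye, or within a sizing tolerance, against the bands observed on the gel.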
Annotation of Venom Loci
For ab initio gene prediction, FGENESH was run on the raw sequence using chicken parameters [50]. The FGENESH output protein sequences were searched with BLAST against the NCBI database of human and rattlesnake proteins to identify candidate genes. This approach accurately identified exon coordinates and full-length proteins for most genes. Genes were identified as venom genes if the BLAST hit was to a known venom sequence and if phylogenetic analysis (see below) confirmed the BLAST result. However, computational annotation of venom genes near Pla2-e gene fragments (pseudogenes) merged venom and pseudogene models, resulting in missing, frameshifted, and small (2 nt) exons. For these challenging gene models, accurate coordinate determination relied on manual annotation using two pieces of evidence: BLAST hit coordinates obtained with known venom proteins as query sequences, and venom gland transcript alignments.
RNA Isolation, Sequencing, Assembly
Venom gland RNA was isolated using the standard Trizol method. 1 μg of total RNA was provided to the University of Wisconsin - Madison DNA sequencing core for strand-aware Illumina TruSeq RNA-sequencing library preparation. The sequencing libraries were created using size selected RNA (300–800 bp). The venom gland libraries were sequenced in a single HiSeq2500 lane for 2 × 150 cycles producing reads 150 nucleotides in length.
The RNA-seq reads were pre-processed with Trimmomatic [43] by trimming 7 nt from the 5′ end and removing any Illumina adaptor sequence (command options: ILLUMINACLIP:/path/to/adapterSeqs/TruSeq3-PE-2.fa:2:30:12 HEADCROP:7 MINLEN:50). The trimmed reads were assembled with Trinity [51].
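The effect of the 7-nt head crop and the 50-nt minimum length on individual reads can be illustrated with the sketch below. It is a plain-Python stand-in for those two steps only, applied in that order; it does not model adaptor clipping or paired-end bookkeeping, and the function names are hypothetical.

```python
from typing import List, Tuple

Read = Tuple[str, str]  # (bases, per-base quality string)


def head_crop(read: Read, n: int) -> Read:
    """Remove the first n bases and their quality values."""
    seq, qual = read
    return seq[n:], qual[n:]


def preprocess(reads: List[Read], crop: int = 7, min_len: int = 50) -> List[Read]:
    """Crop the 5' end, then discard reads shorter than min_len."""
    kept = []
    for read in reads:
        seq, qual = head_crop(read, crop)
        if len(seq) >= min_len:
            kept.append((seq, qual))
    return kept


if __name__ == "__main__":
    reads = [("A" * 150, "I" * 150), ("C" * 56, "I" * 56), ("G" * 57, "I" * 57)]
    for seq, _ in preprocess(reads):
        print(len(seq))
```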
Processed RNA reads were aligned with Bowtie2 [52] (command option: --b2-very-sensitive) to the respective contigs to get a qualitative view of expressed Pla2 genes and visualized using Gviz [53]. We considered a venom gene to be expressed in venom if reads aligned to an annotated venom gene for which proteins have previously been detected in venom.
Identification of Duplicated Sequences
To identify structural variants (insertions, deletions, inversions, duplications), the contigs were aligned to self with LASTZ [54] (command options: --self --gfextend --chain --gapped --format=mapping). The alignment results were filtered for regions ≥1,000 nt and >85% nucleotide identity. The duplicated regions were plotted to scale with the gene models using genoPlotR [55].
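The length and identity thresholds described above amount to a simple filter over self-alignment blocks, sketched below. The `SelfHit` record and its field names are hypothetical and do not reproduce the aligner's output columns; only the two thresholds (at least 1,000 nt, more than 85% identity) come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SelfHit:
    """One self-alignment block (hypothetical fields, 0-based half-open)."""
    start1: int
    end1: int
    start2: int
    end2: int
    identity_pct: float

    @property
    def length(self) -> int:
        return self.end1 - self.start1


def filter_duplicated(
    hits: Iterable[SelfHit], min_len: int = 1000, min_identity: float = 85.0
) -> List[SelfHit]:
    """Keep blocks with length >= min_len and identity strictly > min_identity."""
    return [h for h in hits if h.length >= min_len and h.identity_pct > min_identity]


if __name__ == "__main__":
    hits = [
        SelfHit(0, 2500, 40000, 42500, 92.3),   # kept
        SelfHit(100, 900, 5000, 5800, 99.0),    # too short
        SelfHit(3000, 6000, 9000, 12000, 85.0), # identity not above threshold
    ]
    for hit in filter_duplicated(hits):
        print(hit)
```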
Evolutionary Distance between Duplicated Regions
Pairwise alignments of the duplicated regions were performed with MUSCLE [56]. Alignment gaps were removed and the evolutionary distance calculated under multiple models in the ape R package [57]. The relative distance measures are robust to model choice with and without gamma correction.
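To make the distance calculation concrete, the sketch below removes gapped columns from a pairwise alignment and computes the uncorrected p-distance together with two standard corrected distances, Jukes-Cantor (JC69) and Kimura two-parameter (K80). This is a Python illustration of the textbook formulas, not the R routine the authors used; the choice of these two models, the function names, and the toy sequences are assumptions, and gamma correction is not shown.

```python
import math
from typing import List, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ungapped_columns(a: str, b: str) -> List[Tuple[str, str]]:
    """Keep alignment columns where both sequences have A, C, G, or T."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return [(x, y) for x, y in zip(a.upper(), b.upper()) if x in "ACGT" and y in "ACGT"]


def ts_tv_proportions(columns: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (P, Q): proportions of transitions and transversions."""
    n = len(columns)
    diffs = [(x, y) for x, y in columns if x != y]
    ts = sum(1 for x, y in diffs if {x, y} <= PURINES or {x, y} <= PYRIMIDINES)
    tv = len(diffs) - ts
    return ts / n, tv / n


def jc69(p: float) -> float:
    """Jukes-Cantor distance: d = -(3/4) ln(1 - 4p/3)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def k80(P: float, Q: float) -> float:
    """Kimura 2-parameter distance: d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q)."""
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)


if __name__ == "__main__":
    seq_a = "ACGTACGTAC-G"  # toy aligned sequences
    seq_b = "ACGTACGTGCAT"
    cols = ungapped_columns(seq_a, seq_b)
    P, Q = ts_tv_proportions(cols)
    print(f"sites={len(cols)} p={P + Q:.4f} JC69={jc69(P + Q):.4f} K80={k80(P, Q):.4f}")
```

Comparing the rank order of distances across models is one way to check the statement that relative distances are insensitive to model choice.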
TE Annotation
TEs were identified using the RepeatMasker web server (Smit, A.F.A., Hubley, R., and Green, P. RepeatMasker Open-4.0. 2013–2015. http://www.repeatmasker.org) (version: open-4.0.5, RMLib: 20140131, Dfam: 1.3). We used the cross_match search engine run in slow mode (higher sensitivity).
Phylogenetic Analysis
Sequences used in our protein phylogenies were obtained from hypothetical translations of genomic regions or from the UniProt and NCBI databases. See Table S1 for a complete list of sequences and GenBank accession numbers. In addition to our BAC data, genomic regions from python (P. bivittatus), cobra (O. hannah), and the habu snake (P. flavoviridis) were analyzed to generate Pla2 gene models [17, 31, 58]. To construct protein phylogenies, we first performed multiple protein sequence alignment with the program MUSCLE [56]. Phylogenetic trees were then calculated using maximum likelihood (PhyML) [59] in the program SeaView [60] (v.4.4.2). Bootstrap analysis to assess node support was based on 1,000 replicate trees. Trees were then formatted with FigTree (v.1.4.0; http://tree.bio.ed.ac.uk/software/figtree/). The protein sequence alignments (and sequences) have been deposited as part of a Figshare project and are available at http://dx.doi.org/10.6084/m9.figshare.3467768.
Highlights
- Contrary to assumptions, the most recent common rattlesnake ancestor was neurotoxic
- Gene number in the phospholipase A2 complex is evolutionarily dynamic
- Neurotoxin and myotoxin genes were lost independently in different lineages
- Venom diversity is due to both gene duplication and gene loss
Acknowledgments
We thank Mark Hockmuller for expert care and handling of the venomous snakes; Marie Adams at the University of Wisconsin - Madison DNA sequencing core for technical advice; the University of Michigan DNA sequencing facility for technical assistance; and past and present S.B.C. lab members for stimulating discussions and comments. This work was supported by the Howard Hughes Medical Institute (S.B.C.) and by the Office of the Director, NIH award number P40OD010960 (E.E.S.).
Footnotes
ACCESSION NUMBERS
The accession numbers for the C. scutulatus clone 102I5 assembly and raw reads reported in this paper are GenBank: KX211993 and SRA: SRR3478362, respectively. The accession numbers for the C. atrox clone 152I6 assembly and raw reads reported in this paper are GenBank: KX211994 and SRA: SRR3478363, respectively. The accession numbers for the C. atrox clone 91J7 assembly and raw reads reported in this paper are GenBank: KX211995 and SRA: SRR3478364, respectively. The accession numbers for the C. adamanteus clone 29M24 assembly and raw reads reported in this paper are GenBank: KX211996 and SRA: SRR3478365, respectively. The accession number for the C. scutulatus venom gland RNA-seq experiments reported in this paper is SRA: SRR3478366. The accession number for the C. atrox venom gland RNA-seq experiments reported in this paper is SRA: SRR3478367.
AUTHOR CONTRIBUTIONS
N.L.D., M.W.G., and S.B.C. conceived and designed experiments. N.L.D., M.W.G., V.A.K., J.E.S., and E.E.S. performed research and contributed animal specimens. N.L.D., M.W.G., and S.B.C. analyzed data and wrote the paper. N.L.D. and M.W.G. contributed equally to this work.
SUPPLEMENTAL INFORMATION
Supplemental Information includes five figures and two tables and can be found with this article online at http://dx.doi.org/10.1016/j.cub.2016.07.038.
References
- 1. Mayr E. Animal Species and Evolution. Cambridge: Belknap Press of Harvard University Press; 1963.
- 2. Daeschler EB, Shubin NH, Jenkins FA Jr. A Devonian tetrapod-like fish and the evolution of the tetrapod body plan. Nature. 2006;440:757–763. doi: 10.1038/nature04639.
- 3. Chen L, DeVries AL, Cheng CH. Evolution of antifreeze glycoprotein gene from a trypsinogen gene in Antarctic notothenioid fish. Proc Natl Acad Sci USA. 1997;94:3811–3816. doi: 10.1073/pnas.94.8.3811.
- 4. Carroll SB. Evo-devo and an expanding evolutionary synthesis: a genetic theory of morphological evolution. Cell. 2008;134:25–36. doi: 10.1016/j.cell.2008.06.030.
- 5. Wagner GP, Lynch VJ. Evolutionary novelties. Curr Biol. 2010;20:R48–R52. doi: 10.1016/j.cub.2009.11.010.
- 6. Bergthorsson U, Andersson DI, Roth JR. Ohno’s dilemma: evolution of new genes under continuous selection. Proc Natl Acad Sci USA. 2007;104:17004–17009. doi: 10.1073/pnas.0707158104.
- 7. Conant GC, Wolfe KH. Turning a hobby into a job: how duplicated genes find new functions. Nat Rev Genet. 2008;9:938–950. doi: 10.1038/nrg2482.
- 8. Soskine M, Tawfik DS. Mutational effects and the evolution of new protein functions. Nat Rev Genet. 2010;11:572–582. doi: 10.1038/nrg2808.
- 9. Harms MJ, Thornton JW. Evolutionary biochemistry: revealing the historical and physical causes of protein properties. Nat Rev Genet. 2013;14:559–571. doi: 10.1038/nrg3540.
- 10. Fry BG, Scheib H, van der Weerd L, Young B, McNaughtan J, Ramjan SFR, Vidal N, Poelmann RE, Norman JA. Evolution of an arsenal: structural and functional diversification of the venom system in the advanced snakes (Caenophidia). Mol Cell Proteomics. 2008;7:215–246. doi: 10.1074/mcp.M700094-MCP200.
- 11. Fry BG, Casewell NR, Wüster W, Vidal N, Young B, Jackson TNW. The structural and functional diversification of the Toxicofera reptile venom system. Toxicon. 2012;60:434–448. doi: 10.1016/j.toxicon.2012.02.013.
- 12. Hargreaves AD, Swain MT, Logan DW, Mulley JF. Testing the Toxicofera: comparative transcriptomics casts doubt on the single, early evolution of the reptile venom system. Toxicon. 2014;92:140–156. doi: 10.1016/j.toxicon.2014.10.004.
- 13. Hargreaves AD, Swain MT, Hegarty MJ, Logan DW, Mulley JF. Restriction and recruitment-gene duplication and the origin and evolution of snake venom toxins. Genome Biol Evol. 2014;6:2088–2095. doi: 10.1093/gbe/evu166.
- 14. Casewell NR, Wagstaff SC, Wüster W, Cook DAN, Bolton FMS, King SI, Pla D, Sanz L, Calvete JJ, Harrison RA. Medically important differences in snake venom composition are dictated by distinct postgenomic mechanisms. Proc Natl Acad Sci USA. 2014;111:9205–9210. doi: 10.1073/pnas.1405484111.
- 15. Rokyta DR, Wray KP, Lemmon AR, Lemmon EM, Caudle SB. A high-throughput venom-gland transcriptome for the Eastern Diamondback Rattlesnake (Crotalus adamanteus) and evidence for pervasive positive selection across toxin classes. Toxicon. 2011;57:657–671. doi: 10.1016/j.toxicon.2011.01.008.
- 16. Malhotra A, Creer S, Pook CE, Thorpe RS. Inclusion of nuclear intron sequence data helps to identify the Asian sister group of New World pitvipers. Mol Phylogenet Evol. 2010;54:172–178. doi: 10.1016/j.ympev.2009.09.007.
- 17. Vonk FJ, Casewell NR, Henkel CV, Heimberg AM, Jansen HJ, McCleary RJR, Kerkkamp HME, Vos RA, Guerreiro I, Calvete JJ, et al. The king cobra genome reveals dynamic gene evolution and adaptation in the snake venom system. Proc Natl Acad Sci USA. 2013;110:20651–20656. doi: 10.1073/pnas.1314702110.
- 18. Calvete JJ, Fasoli E, Sanz L, Boschetti E, Righetti PG. Exploring the venom proteome of the western diamondback rattlesnake, Crotalus atrox, via snake venomics and combinatorial peptide ligand library approaches. J Proteome Res. 2009;8:3055–3067. doi: 10.1021/pr900249q.
- 19. Margres MJ, McGivern JJ, Wray KP, Seavy M, Calvin K, Rokyta DR. Linking the transcriptome and proteome to characterize the venom of the eastern diamondback rattlesnake (Crotalus adamanteus). J Proteomics. 2014;96:145–158. doi: 10.1016/j.jprot.2013.11.001.
- 20. Sampaio SC, Hyslop S, Fontes MRM, Prado-Franceschi J, Zambelli VO, Magro AJ, Brigatte P, Gutierrez VP, Cury Y. Crotoxin: novel activities for a classic β-neurotoxin. Toxicon. 2010;55:1045–1060. doi: 10.1016/j.toxicon.2010.01.011.
- 21. John TR, Smith LA, Kaiser II. Genomic sequences encoding the acidic and basic subunits of Mojave toxin: unusually high sequence identity of non-coding regions. Gene. 1994;139:229–234. doi: 10.1016/0378-1119(94)90761-7.
- 22. Hendry CR, Guiher TJ, Pyron RA. Ecological divergence and sexual selection drive sexual size dimorphism in New World pitvipers (Serpentes: Viperidae). J Evol Biol. 2014;27:760–771. doi: 10.1111/jeb.12349.
- 23. Faure G, Xu H, Saul FA. Crystal structure of crotoxin reveals key residues involved in the stability and toxicity of this potent heterodimeric β-neurotoxin. J Mol Biol. 2011;412:176–191. doi: 10.1016/j.jmb.2011.07.027.
- 24. Massey DJ, Calvete JJ, Sánchez EE, Sanz L, Richards K, Curtis R, Boesen K. Venom variability and envenoming severity outcomes of the Crotalus scutulatus scutulatus (Mojave rattlesnake) from Southern Arizona. J Proteomics. 2012;75:2576–2587. doi: 10.1016/j.jprot.2012.02.035.
- 25. Lomonte B, Rangel J. Snake venom Lys49 myotoxins: from phospholipases A(2) to non-enzymatic membrane disruptors. Toxicon. 2012;60:520–530. doi: 10.1016/j.toxicon.2012.02.007.
- 26. Glenn JL, Straight RC, Wolt TB. Regional variation in the presence of canebrake toxin in Crotalus horridus venom. Comp Biochem Physiol Pharmacol Toxicol Endocrinol. 1994;107:337–346. doi: 10.1016/1367-8280(94)90059-0.
- 27. Sanz L, Gibbs HL, Mackessy SP, Calvete JJ. Venom proteomes of closely related Sistrurus rattlesnakes with divergent diets. J Proteome Res. 2006;5:2098–2112. doi: 10.1021/pr0602500.
- 28. Lynch VJ. Inventing an arsenal: adaptive evolution and neofunctionalization of snake venom phospholipase A2 genes. BMC Evol Biol. 2007;7:2. doi: 10.1186/1471-2148-7-2.
- 29. Gu W, Zhang F, Lupski JR. Mechanisms for human genomic rearrangements. PathoGenetics. 2008;1:4. doi: 10.1186/1755-8417-1-4.
- 30. Wüster W, Peppin L, Pook CE, Walker DE. A nesting of vipers: phylogeny and historical biogeography of the Viperidae (Squamata: Serpentes). Mol Phylogenet Evol. 2008;49:445–459. doi: 10.1016/j.ympev.2008.08.019.
- 31. Ikeda N, Chijiwa T, Matsubara K, Oda-Ueda N, Hattori S, Matsuda Y, Ohno M. Unique structural characteristics and evolution of a cluster of venom phospholipase A2 isozyme genes of Protobothrops flavoviridis snake. Gene. 2010;461:15–25. doi: 10.1016/j.gene.2010.04.001.
- 32. Yang ZM, Guo Q, Ma ZR, Chen Y, Wang ZZ, Wang XM, Wang YM, Tsai IH. Structures and functions of crotoxin-like heterodimers and acidic phospholipases A2 from Gloydius intermedius venom: insights into the origin of neurotoxic-type rattlesnakes. J Proteomics. 2015;112:210–223. doi: 10.1016/j.jprot.2014.09.009.
- 33. Malhotra A, Creer S, Harris JB, Thorpe RS. The importance of being genomic: Non-coding and coding sequences suggest different models of toxin multi-gene family evolution. Toxicon. 2015;107(Pt B):344–358. doi: 10.1016/j.toxicon.2015.08.009.
- 34. Rokyta DR, Wray KP, Margres MJ. The genesis of an exceptionally lethal venom in the timber rattlesnake (Crotalus horridus) revealed through comparative venom-gland transcriptomics. BMC Genomics. 2013;14:394. doi: 10.1186/1471-2164-14-394.
- 35. Kini RM. Excitement ahead: structure, function and mechanism of snake venom phospholipase A2 enzymes. Toxicon. 2003;42:827–840. doi: 10.1016/j.toxicon.2003.11.002.
- 36. Cocca E, Ratnayake-Lecamwasam M, Parker SK, Camardella L, Ciaramella M, di Prisco G, Detrich HW 3rd. Genomic remnants of alpha-globin genes in the hemoglobinless antarctic icefishes. Proc Natl Acad Sci USA. 1995;92:1817–1821. doi: 10.1073/pnas.92.6.1817.
- 37. Yang J, Chen X, Bai J, Fang D, Qiu Y, Jiang W, Yuan H, Bian C, Lu J, He S, et al. The Sinocyclocheilus cavefish genome provides insights into cave adaptation. BMC Biol. 2016;14:1. doi: 10.1186/s12915-015-0223-4.
- 38. Gilad Y, Wiebe V, Przeworski M, Lancet D, Pääbo S. Loss of olfactory receptor genes coincides with the acquisition of full trichromatic vision in primates. PLoS Biol. 2004;2:e5. doi: 10.1371/journal.pbio.0020005.
- 39. Daltry JC, Wüster W, Thorpe RS. Diet and snake venom evolution. Nature. 1996;379:537–540. doi: 10.1038/379537a0.
- 40. Gibbs HL, Mackessy SP. Functional basis of a molecular adaptation: prey-specific toxic effects of venom from Sistrurus rattlesnakes. Toxicon. 2009;53:672–679. doi: 10.1016/j.toxicon.2009.01.034.
- 41. Biardi JE, Coss RG. Rock squirrel (Spermophilus variegatus) blood sera affects proteolytic and hemolytic activities of rattlesnake venoms. Toxicon. 2011;57:323–331. doi: 10.1016/j.toxicon.2010.12.011.
- 42. Gnerre S, Maccallum I, Przybylski D, Ribeiro FJ, Burton JN, Walker BJ, Sharpe T, Hall G, Shea TP, Sykes S, et al. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci USA. 2011;108:1513–1518. doi: 10.1073/pnas.1017351108.
- 43. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
- 44. Leggett RM, Clavijo BJ, Clissold L, Clark MD, Caccamo M. NextClip: an analysis and read preparation tool for Nextera Long Mate Pair libraries. Bioinformatics. 2014;30:566–568. doi: 10.1093/bioinformatics/btt702.
- 45. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421. doi: 10.1186/1471-2105-10-421.
- 46. Sambrook J, Fritsch EF, Maniatis T. Molecular Cloning: A Laboratory Manual. Second edition. Cold Spring Harbor Laboratory Press; 1989.
- 47. Chaisson MJ, Tesler G. Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC Bioinformatics. 2012;13:238. doi: 10.1186/1471-2105-13-238.
- 48. Chin CS, Alexander DH, Marks P, Klammer AA, Drake J, Heiner C, Clum A, Copeland A, Huddleston J, Eichler EE, et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods. 2013;10:563–569. doi: 10.1038/nmeth.2474.
- 49. Kiełbasa SM, Wan R, Sato K, Horton P, Frith MC. Adaptive seeds tame genomic sequence comparison. Genome Res. 2011;21:487–493. doi: 10.1101/gr.113985.110.
- 50. Solovyev V, Kosarev P, Seledsov I, Vorobyev D. Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biol. 2006;7(Suppl 1):1–12. doi: 10.1186/gb-2006-7-s1-s10.
- 51. Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, Couger MB, Eccles D, Li B, Lieber M, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc. 2013;8:1494–1512. doi: 10.1038/nprot.2013.084.
- 52. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923.
- 53. Hahne F, Durinck S, Ivanek R, Mueller A, Lianoglou S, Tan G, Parsons L, Pai S. Gviz: Plotting data and annotation information along genomic coordinates (R package version 0.99.8).
- 54. Harris RS. Improved pairwise alignment of genomic DNA. PhD thesis. The Pennsylvania State University; 2007.
- 55. Guy L, Kultima JR, Andersson SGE. genoPlotR: comparative gene and genome visualization in R. Bioinformatics. 2010;26:2334–2335. doi: 10.1093/bioinformatics/btq413.
- 56. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–1797. doi: 10.1093/nar/gkh340.
- 57. Paradis E, Claude J, Strimmer K. APE: analyses of phylogenetics and evolution in R language. Bioinformatics. 2004;20:289–290. doi: 10.1093/bioinformatics/btg412.
- 58. Castoe TA, de Koning APJ, Hall KT, Card DC, Schield DR, Fujita MK, Ruggiero RP, Degner JF, Daza JM, Gu W, et al. The Burmese python genome reveals the molecular basis for extreme adaptation in snakes. Proc Natl Acad Sci USA. 2013;110:20645–20650. doi: 10.1073/pnas.1314475110.
- 59. Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 2010;59:307–321. doi: 10.1093/sysbio/syq010.
- 60. Gouy M, Guindon S, Gascuel O. SeaView version 4: a multi-platform graphical user interface for sequence alignment and phylogenetic tree building. Mol Biol Evol. 2010;27:221–224. doi: 10.1093/molbev/msp259.
