Proceedings of the National Academy of Sciences of the United States of America. 2020 May 12;117(21):11399–11408. doi: 10.1073/pnas.1914536117

Structural venomics reveals evolution of a complex venom by duplication and diversification of an ancient peptide-encoding gene

Sandy S Pineda a,b,1,2,3,4, Yanni K-Y Chin a,c,1, Eivind A B Undheim c,d,e, Sebastian Senff a, Mehdi Mobli c, Claire Dauly f, Pierre Escoubas g, Graham M Nicholson h, Quentin Kaas a, Shaodong Guo a, Volker Herzig a,5, John S Mattick b,6, Glenn F King a,2
PMCID: PMC7260951  PMID: 32398368

Significance

The venom of the Australian funnel-web spider is one of the most complex chemical arsenals in the natural world, comprising thousands of peptide toxins. These toxins have a diverse range of pharmacological activities and vary in size from short (3 to 4 kDa) to long (8 to 9 kDa). It is unclear how spiders evolved such complex venoms and whether there is an evolutionary relationship between short and long toxins. Here, we introduce a “structural venomics” approach to show that the venom of Australian funnel-web spiders evolved primarily by duplication and elaboration of a single ancestral knottin gene; short toxins are simple knottins whereas most long toxins are either highly elaborated single-domain knottins or double-knot toxins created by intragene duplications.

Keywords: spider venom, venom evolution, structural venomics, transcriptomics, proteomics

Abstract

Spiders are one of the most successful venomous animals, with more than 48,000 described species. Most spider venoms are dominated by cysteine-rich peptides with a diverse range of pharmacological activities. Some spider venoms contain thousands of unique peptides, but little is known about the mechanisms used to generate such complex chemical arsenals. We used an integrated transcriptomic, proteomic, and structural biology approach to demonstrate that the lethal Australian funnel-web spider produces 33 superfamilies of venom peptides and proteins. Twenty-six of the 33 superfamilies are disulfide-rich peptides, and we show that 15 of these are knottins that contribute >90% of the venom proteome. NMR analyses revealed that most of these disulfide-rich peptides are structurally related and range in complexity from simple to highly elaborated knottin domains, as well as double-knot toxins, that likely evolved from a single ancestral toxin gene.


Spiders evolved from an arachnid ancestor in the Late Ordovician around 450 million years ago (1), and they have since become one of the most successful animal lineages on the planet, with >48,000 extant species (2). A key contributor to their evolutionary success is the use of venom to capture prey and defend against predators. The constant selection pressure on venoms over hundreds of millions of years has enabled them to evolve into complex mixtures of bioactive compounds with a diverse range of pharmacological activities.

Spider venoms are a heterogeneous mixture of salts, low molecular weight organic compounds (<1 kDa), linear and disulfide-rich peptides (DRPs) (typically, 3 to 9 kDa with three to six disulfide bonds), and proteins (10 to 120 kDa) (3–5). However, peptides are the major components of most spider venoms, with some containing >1,000 peptides (6). The majority of these peptides are “short” DRPs (2.5 to 5 kDa), but there is also a significant proportion of “long” DRPs (6.5 to 8.5 kDa) (5). As the primary function of spider venom is to rapidly immobilize prey, it is perhaps not surprising that most spider-venom DRPs that have been functionally characterized target neuronal ion channels and receptors (5, 7).

Although spider-venom DRPs have been shown to adopt a variety of three-dimensional (3D) structures, including the Kunitz (8), prokineticin/colipase (9), disulfide-directed β-hairpin (DDH) (10), and inhibitor cystine knot (ICK) fold (11), the majority of spider-venom DRP structures solved to date conform to the ICK motif. The ICK motif is defined as an antiparallel β-sheet stabilized by a cystine knot (11). In spider toxins, the β-sheet typically comprises only two β-strands although a third N-terminal strand is sometimes present (12). The cystine knot comprises a “ring” formed by two disulfide bonds and the intervening sections of the peptide backbone, with a third disulfide piercing the ring to create a pseudoknot (11). This knot provides ICK peptides (also known as knottins) (13) with exceptional resistance to chemicals, heat, and proteases (14, 15), which has made them of interest as drug and insecticide leads (5, 14, 16). Some spider toxins show minor (17) or more significant (18) elaborations of the basic ICK fold involving an additional stabilizing disulfide bond. More recently, “double-knot” spider toxins have been reported in which two structurally independent ICK domains are joined by a short linker (19, 20).

Like other folds with stabilizing disulfide bridges, knottins display a remarkable diversity of biological functions, including modulation of many different types of ligand- and voltage-gated ion channels (5). Despite strong conservation of the knottin scaffold across a taxonomically diverse range of spiders, several factors have hampered analysis of their evolutionary history (21). First, it is only recently that a large number of knottin precursor sequences have become available from venom-gland transcriptomes. Second, the disulfide framework in small DRPs generally constrains the peptide fold to such an extent that most noncysteine residues can be mutated without damaging the peptide’s structural integrity, a luxury not afforded to most globular proteins (21). Thus, evolution of DRPs is typically characterized by the accumulation of many mutations, leaving very few conserved residues available for deep evolutionary analyses (21). Third, very few structures have been solved for DRPs larger than 5 kDa, making it unclear whether just a few or many of the larger DRPs are highly derived or duplicated knottins.

The only attempt so far to study the phylogeny of spider-venom knottins concluded that both orthologous diversification and lineage-specific paralogous diversification were important in generating the diverse arsenal of spider-venom knottins (22). These authors also reported that knottins that act as gating modifiers of voltage-gated ion channels likely evolved on multiple independent occasions. However, whether this resulted from convergent functional radiation of toxins or independent evolution of gating modifiers from nonvenom knottins remains unclear. The lack of nonvenom gland homologs and the use of only a single nonspider knottin outgroup precluded firm conclusions regarding the toxin, or nontoxin, origins of these convergently evolved spider-venom knottins. Thus, the deep evolutionary history of spider-venom knottins remains enigmatic (22).

Here, we report the combined use of proteomics, transcriptomics, and structural biology to determine the complete repertoire and evolutionary history of DRPs expressed in the venom of Hadronyche infensa, a member of the family of lethal Australian funnel-web spiders. This structural venomics approach has provided the most comprehensive overview to date of the toxin arsenal of a spider, and it revealed that the incredibly diverse repertoire of spider-venom DRPs is largely derived from a single ancestral knottin gene.

Results and Discussion

Mass Spectrometry (MS) Reveals a Complex Spider-Venom Peptidome.

MS analyses revealed that the venom peptidome of H. infensa is extremely complex. We used two complementary MS platforms to examine the distribution of peptide masses in pooled venom from female H. infensa (23). Analysis of secreted venom using matrix-assisted laser desorption/ionization tandem time-of-flight (MALDI-TOF/TOF) and Orbitrap MS resulted in 2,053 and 1,649 distinguishable peptide masses, respectively (Fig. 1 A–D); only 651 masses were shared between the two MS datasets, leading to a total of 3,051 unique venom peptides (Fig. 1E). This is considerably more than the ∼1,000 peptides reported to be present in venom of the Australian funnel-web spider Hadronyche versuta (6), although this difference is likely a reflection of recent improvements in MS sensitivity rather than an indication of major differences in venom complexity between these closely related species. We conclude that Australian funnel-web spiders have one of the most complex venom peptidomes reported for a terrestrial animal, with similar diversity to those of marine cone snails (24).
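The total of 3,051 unique masses follows from inclusion-exclusion applied to the two mass lists. A minimal Python sketch using the counts reported above (the function name is illustrative and not part of the original analysis):

```python
def union_count(n_a: int, n_b: int, n_shared: int) -> int:
    """Number of distinct items across two sets, by inclusion-exclusion."""
    if n_shared > min(n_a, n_b):
        raise ValueError("shared count cannot exceed either set size")
    return n_a + n_b - n_shared

# Counts from the text: MALDI-TOF/TOF masses, Orbitrap masses, shared masses.
total_unique = union_count(2053, 1649, 651)
print(total_unique)  # 3051
```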

Fig. 1.


Mass profile of H. infensa venom. (A) RP-HPLC chromatogram showing fractionation of crude H. infensa venom. Absorbance at 215 nm is shown in dark gray (left ordinate-axis) while mass count for each HPLC fraction is shown in light gray (right ordinate-axis). (B) Distribution of MALDI-TOF/TOF masses as a function of RP-HPLC retention time. (C and D) Histograms showing distribution and abundance of peptides in H. infensa venom as detected using (C) MALDI-TOF/TOF and (D) Orbitrap MS. Masses are grouped in 500-Da bins. Gray bars indicate cumulative total number of toxins (right ordinate-axis). (E) Euler plot showing overlap between mass counts generated via MALDI-TOF/TOF and Orbitrap MS. (F) A 3D landscape of H. infensa venom showing the correlation between RP-HPLC retention time and peptide mass and abundance generated by MALDI-TOF/TOF analysis. (Inset) A photo of a female H. infensa. Image credit: Bastian Rast (photographer).

The distribution of peptide masses in H. infensa venom is bimodal. Most peptides fall in the mass range 2.5 to 5.5 kDa, but there is a significant cohort of larger peptides with mass 6.5 to 8.5 kDa (Fig. 1 C and D). This bimodal distribution matches that previously described for venom from related funnel-web spiders (6, 25), various tarantulas (26), and the spitting spider Scytodes thoracica (27), and it is also reflected in the mass profile generated for all spider toxins reported to date (5). As reported previously for Australian funnel-web spiders (6, 25), the 3D venom landscape (Fig. 1F) revealed no correlation between peptide mass and peptide hydrophobicity, as judged by reversed-phase (RP) high-pressure liquid chromatography (HPLC) retention time.
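The histograms in Fig. 1 C and D group masses in 500-Da bins. The sketch below shows one way such binning could be done in Python; the example masses are hypothetical values chosen only to illustrate a two-mode profile, and the function name is an assumption.

```python
from collections import Counter

BIN_WIDTH_DA = 500  # bin width stated in the Fig. 1 legend

def bin_masses(masses_da, width=BIN_WIDTH_DA):
    """Count masses per bin, keyed by the lower edge of each bin (Da)."""
    counts = Counter(int(m // width) * width for m in masses_da)
    return dict(sorted(counts.items()))

# Hypothetical masses (Da); not taken from the venom dataset.
example = [2980.4, 3410.9, 4055.2, 4101.7, 4890.0, 6720.3, 7480.6, 7995.1]
hist = bin_masses(example)
for lower_edge, n in hist.items():
    print(f"{lower_edge}-{lower_edge + BIN_WIDTH_DA} Da: {n}")
```

With real data, a gap between the 2.5 to 5.5 kDa bins and the 6.5 to 8.5 kDa bins would appear as empty or sparsely populated bins between the two modes.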

Transcriptomics Uncovers the Biochemical Diversity of H. infensa Venom.

Consistent with MS analysis of secreted venom, sequencing of a venom-gland transcriptome from H. infensa revealed a biochemically diverse venom, with at least 33 toxin superfamilies (Fig. 2). In light of their likely toxic function, each superfamily of toxins was named, as suggested previously (28), after gods/deities of death, destruction, or the underworld. Expressed sequence tags were sequenced using the 454 platform and assembled using MIRA v3.2 (29). This produced a total of 26,980 contigs and 7,194 singlets, with an average contig length of 496 base pairs (bp) (maximum length 3,159 bp, N50 674 bp). After assembly, all contigs and singlets were submitted to Blast2GO (30) to acquire BLAST and functional annotations. Sequences were then grouped into three categories: 1) proteins and other enzymes (61%); 2) toxins and toxin-like peptides (14%); and 3) sequences with no hits in the ArachnoServer (31) or National Center for Biotechnology Information (NCBI) databases (25%) (SI Appendix, Fig. S1). Recovered BLAST hits included other spider toxins, as well as peptides and proteins found in other venomous arthropods with recently annotated genomes, including the tick Ixodes scapularis, the honey bee Apis mellifera, the jumping ant Harpegnathos saltator, and the parasitoid wasp Nasonia vitripennis.
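For readers unfamiliar with the N50 statistic quoted above, the following Python sketch computes it under the common definition: the length of the shortest contig in the set of longest contigs that together cover at least half of the assembled bases. The input lengths are hypothetical, and assemblers may differ in edge-case conventions.

```python
def n50(lengths):
    """Contig length at which the cumulative sum of lengths, taken from
    longest to shortest, first reaches half of the total assembly size."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths supplied")

# Hypothetical contig lengths (bp), for illustration only.
print(n50([100, 200, 300, 400, 500]))  # 400
```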

Fig. 2.


Overview of the venom proteome of H. infensa. The venom proteome of H. infensa consists of 33 toxin superfamilies (SF1 to SF33). For each SF, the cysteine framework is shown in blue, and the 3D fold is classified as ICK, putative ICK, double ICK, non-ICK, or unknown. Light blue boxes enclose previously solved structures of superfamily members from H. infensa. Black boxes enclose structures of orthologous superfamily members from related mygalomorph spiders or, in the case of enzymes and CRiSP proteins, orthologs from venomous hymenopterans (bees and wasps). Red boxes enclose structures solved in the current study (SF6, SF22, SF23, and SF26). Stars enclosed by dashed black boxes signify toxin superfamilies for which no structural information is currently available. For each of the structures, β-strands are shown in blue, helices are green, and core disulfide bonds are shown as solid red tubes. For DRPs containing an ICK motif, additional disulfide bonds that do not form part of the core ICK motif are shown as orange tubes. Toxin superfamilies are named after gods or deities of death, destruction, and the underworld. For each structure shown, PDB accession numbers are given in the lower right corner.

To maximize discovery of DRPs, we used an in-house algorithm to identify potential open reading frames (ORFs). Both annotated and previously unidentified peptides were then combined into a single file that was used to search for processing signals in all precursors. After determination of cleavage regions (SI Appendix, Fig. S2), toxins were classified into superfamilies (SFs) based on their signal-peptide sequence (SI Appendix, Fig. S3), total number of cysteine residues, and their known or predicted cysteine framework (Fig. 2 and SI Appendix, Fig. S3). A brief description of each superfamily is provided in SI Appendix, Figs. S4–S36.
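One of the classification criteria, the cysteine framework, can be derived directly from a mature peptide sequence. The Python sketch below is an illustration under stated assumptions, not the in-house algorithm used in the study: it collapses every run of noncysteine residues between the first and last cysteine into a single dash. The example sequence is hypothetical.

```python
import re

def cysteine_framework(mature_peptide: str) -> str:
    """Return the cysteine pattern of a peptide, e.g. 'C-C-CC-C-C'.

    Residues before the first and after the last cysteine are ignored;
    each run of noncysteine residues in between becomes one '-'.
    """
    seq = mature_peptide.upper()
    first, last = seq.find("C"), seq.rfind("C")
    if first == -1:
        return ""  # no cysteines, so no framework
    return re.sub(r"[^C]+", "-", seq[first:last + 1])

# Hypothetical six-cysteine sequence, for illustration only.
peptide = "GSCAAKCGGCCTRCKWNLCS"
print(peptide.count("C"), cysteine_framework(peptide))  # 6 C-C-CC-C-C
```

Grouping precursors by signal-peptide similarity, cysteine count, and this framework string would approximate the three criteria named above, although the thresholds used for signal-peptide similarity are not specified here.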

Transcript abundance for each superfamily was estimated using RSEM v1.2.31 (32), and these values are reported as transcripts per million (TPM) in SI Appendix, Table S1 and Fig. 3. Abundance varied greatly from very low for SF11, SF21, and SF25 (TPM 2 to 12) to very high for SF2, SF3, SF4, SF13, and SF23 (TPM 9,000 to 13,000). Due to the low throughput of 454 sequencing compared to newer sequencing technologies, we investigated the extent of sequence coverage by random sampling of the total read data to produce 11 fractional datasets and then determined the total number of superfamilies recovered for each data subset (SI Appendix, Fig. S37). Even at the lowest level of sampling (5%), 16 of 33 superfamilies were recovered, and detection saturation (30 of 33 superfamilies) was reached when 80% of reads were used for reassembly. Not surprisingly, the more reads added to the assembly, the more isoforms we recovered, and, similarly, superfamilies with low transcript abundance needed higher sampling to be detected. We conclude that the 454 sequencing platform provided good coverage of the H. infensa venome.
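As a reminder of what the TPM unit expresses, the sketch below computes it from read counts and transcript lengths using the standard definition. It is a simplification: RSEM works with expected counts and effective lengths estimated by its own model, so this is not a reimplementation of RSEM. All input values are hypothetical.

```python
def tpm(counts, lengths):
    """Transcripts per million: length-normalized rates scaled to sum to 1e6."""
    if len(counts) != len(lengths):
        raise ValueError("counts and lengths must have the same size")
    rates = [c / l for c, l in zip(counts, lengths)]
    total = sum(rates)
    if total == 0:
        raise ValueError("no reads assigned to any transcript")
    return [r / total * 1e6 for r in rates]

# Hypothetical read counts and transcript lengths (bp) for three transcripts.
values = tpm([120, 30, 850], [400, 300, 500])
print([round(v) for v in values])
```

Because TPM values always sum to one million within a sample, the superfamily values reported above can be read directly as shares of the transcript pool.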

Fig. 3.


Abundance of transcripts encoding DRPs and proteins obtained from sequencing of an H. infensa venom-gland transcriptome. Blue bars represent protein-encoding transcripts while red and gray bars denote transcripts encoding DRPs that do or do not have an ICK scaffold, respectively. ICK transcripts dominate the venom-gland transcriptome. Superfamily numbers correspond to those shown in Fig. 2.

Using this approach, we determined that the venom gland of H. infensa expresses at least 26 families of DRPs, three families of enzymes (hyaluronidase, lipase, and phospholipase A2 [PLA2]), one family of cysteine-rich secretory proteins (CRiSPs), and three families of uncharacterized secreted proteins (Figs. 2 and 3). Hyaluronidase and PLA2 have been convergently recruited into many animal venoms (33). Venom hyaluronidases act as hemostatic factors (33) or facilitate spread of neurotoxins by enhancing tissue permeability (34). Sequence diversity across taxa is minimal, and no new activities have been reported for venom hyaluronidases (33). In contrast, venom PLA2s are diverse, and they have acquired a range of new functions, some of which are independent of their catalytic activity (35). There are only a few reports of lipases in animal venoms, but the H. infensa venom lipase has significant homology to those found in wasp venoms (36, 37). It is unclear what function lipases might play in spider venom. Finally, consistent with an actively regenerating venom gland, the transcriptome is enriched with transcripts encoding proteins involved in translation, processing, folding, and posttranslational modification of venom toxins (SI Appendix, Fig. S38).

Proteomic Analyses Confirm the Biochemical Complexity of H. infensa Venom.

To further probe the proteomic complexity of H. infensa venom, we searched our annotated transcriptome with tandem MS spectra generated from fractionated, reduced, alkylated, and trypsin-digested venom. Using a stringent confidence threshold (<1% false discovery rate [FDR] calculated from a decoy-based FDR analysis in ProteinPilot v.4.5.1), we detected 1,108 of the 1,224 venom proteins and peptides predicted from the transcriptomic analyses. Using the same criteria but a different search program (Mascot v2.5.1), we identified 877 of the same 1,224 venom proteins and peptides, 849 of which were identified by both programs (SI Appendix, File 1). These proteomically identified venom components spanned 22 (ProteinPilot) or 20 (Mascot) of the predicted 33 superfamilies, including all of the most diverse superfamilies, and three of the seven predicted protein superfamilies. There was a strong correlation between low expression and lack of proteomic detection; the predicted peptide superfamilies with cumulative expression values <100 TPM (Fig. 3) accounted for all peptide superfamilies not detected in the venom. Taken together, our transcriptomic and proteomic analyses indicate that H. infensa has a highly complex peptide-dominated venom.
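The idea behind a decoy-based FDR threshold can be sketched as follows. This is the generic target-decoy estimate (decoy hits divided by target hits above a score cutoff) and is only an assumption about the general principle; the exact procedures implemented in ProteinPilot and Mascot may differ. Scores and names are hypothetical.

```python
def decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR at a score threshold as (decoy hits) / (target hits)."""
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    if n_target == 0:
        return 0.0  # nothing accepted, so no false discoveries to estimate
    return n_decoy / n_target

# Hypothetical match scores against the real (target) and decoy databases.
targets = [10, 9, 8, 7, 3]
decoys = [2, 1, 8]
print(decoy_fdr(targets, decoys, threshold=7))  # 0.25
```

In practice the threshold is raised until the estimate falls below the chosen limit, here 1%.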

The Venom of H. infensa Contains both Classical and Highly Elaborated ICK Toxins.

Twenty-six of the 33 toxin superfamilies identified in the venom-gland transcriptome of H. infensa are DRPs (Fig. 2 and SI Appendix, Fig. S3). Thus, DRPs are the dominant polypeptides in the venom, even when transcript abundance is considered (Fig. 3). Eight of the 26 DRP superfamilies could be confidently classified as knottins based on prior determination of the 3D structure of one of the superfamily members (SF3) or strong homology with an orthologous knottin (i.e., from another spider venom) whose structure has been elucidated (SF8 to -10, SF13, SF16, SF17, and SF20). A further three superfamilies were confidently predicted to be knottins because their cysteine framework and intercysteine spacing conform to the ICK motif (SF4, SF19, and SF24), albeit with an additional disulfide bond in the case of SF4. In addition, SF1 is a family of double-knot toxins, the structure of one of which was recently solved (Fig. 4A) (20). Thus, 12 of the 26 DRP superfamilies can be assigned as knottins. Only six DRP superfamilies could be confidently assigned as not being knottins: SF2 corresponds to the MIT/Bv8/prokineticin superfamily (38), SF5 is a family of β-hairpin toxins with high homology to the antimicrobial peptide gomesin isolated from hemocytes of the tarantula Acanthoscurria gomesiana (39), SF11 is a family of cystatin-like toxins, SF12 is a derived MIT/Bv8/prokineticin family with four rather than the canonical five disulfide bonds, and SF7 and SF21 have fewer than the minimum of six cysteine residues required to form an ICK motif.
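The bookkeeping in this paragraph can be checked with a few set operations. The Python sketch below uses only the superfamily numbers listed in the text; the variable names are illustrative.

```python
# DRP superfamilies are SF1 to SF26.
all_drp = set(range(1, 27))

knottin_structure_or_homology = {3, 8, 9, 10, 13, 16, 17, 20}
knottin_by_framework = {4, 19, 24}
double_knot = {1}
non_knottin = {2, 5, 7, 11, 12, 21}

knottins = knottin_structure_or_homology | knottin_by_framework | double_knot
unassigned = all_drp - knottins - non_knottin

print(len(knottins), len(non_knottin), sorted(unassigned))
# 12 6 [6, 14, 15, 18, 22, 23, 25, 26]
```

The eight superfamilies left unassigned are those whose structures and activities were unknown at this stage of the analysis.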

Fig. 4.


Structures of selected DRPs found in the venom of H. infensa, highlighting key structural innovations in “short” and “long” peptide toxins. (A) Schematic representation of the SF1 double-knot toxin, which is comprised of two independently folded ICK domains joined by an inflexible linker. (B) SF6 family peptides adopt a typical ICK fold with several unique elaborations, including an extended C-terminal tail that is stapled to the rest of the structure via an additional disulfide bond (“tail-lock”), an enlarged intercystine loop 4 that contains an α-helical insertion, and a highly extended N-terminal region that includes a six-residue α-helix and a short two-stranded β-sheet. (C) SF26 DRPs also adopt a highly elaborated knottin fold with a C-terminal tail-lock, a five-residue 3₁₀ helix in the core ICK region between CysII and CysIII, and a long N-terminal extension that includes a short α-helix. (D) SF23 DRPs only differ from classical ICK toxins by having an extra disulfide bond (“β-sheet staple”) that stabilizes the β-hairpin loop. (E) Structural alignment of the ICK core regions of DRPs from SF1, SF6, SF17, SF22, and SF23, highlighting the strong conservation of the ICK motif (DDH in the case of SF22) regardless of the extent of structural elaborations outside this core region. (F) SF22 DRP represents a previously unidentified toxin fold comprised of two independent structural domains connected by a disulfide bond. The C-terminal domain forms a DDH core while the N-terminal domain adopts a DABS fold. In all panels, disulfide bonds comprising the ICK motif are shown as red tubes while additional noncore disulfide bonds are shown as orange tubes. β-sheets and α-helices are highlighted in blue and green, respectively.

Eight of the 26 DRP superfamilies with three or more disulfide bonds have sequences with unknown structure and biological activity (SF6, SF14, SF15, SF18, SF22, SF23, SF25, and SF26). In order to better understand the structural diversity and evolutionary origin of spider-venom DRPs, we attempted to determine the structure of representative members of SF6, SF14, SF26, and homologous sequences to those of SF22 and SF23. These superfamilies were prioritized over SF15, SF18, and SF25, which contain only low-abundance transcripts (Fig. 3). Peptides were expressed in the periplasm of Escherichia coli using a previously optimized system (40), then purified using nickel-affinity chromatography followed by RP-HPLC (SI Appendix, Fig. S5). All of the chosen peptides were successfully produced with the exception of SF14, which failed to express in E. coli. The SF6, SF22, SF23, and SF26 peptides were uniformly labeled with ¹⁵N and ¹³C, and their structures were determined using an NMR strategy we described previously that takes advantage of nonuniform sampling to expedite data acquisition and improve structure quality (41). Each of the determined structures is of high precision and has excellent stereochemical quality, as evidenced by the structural statistics presented in SI Appendix, Table S2.

The SF6 family member adopts a highly derived knottin fold with several unique elaborations (Fig. 4B): 1) The β-hairpin loop encompasses an eight-residue α-helix; 2) an extended C-terminal tail is locked in place by a disulfide bridge to the adjacent C-terminal β-strand of the ICK motif; and 3) a highly extended N-terminal region (i.e., preceding the first cysteine residue) includes a short six-residue α-helix and a β-strand that forms a small β-sheet via hydrogen bonds with residues in the CysIV–CysV intercystine loop. The SF26 family member also adopts a highly modified knottin fold (Fig. 4C), with the following elaborations of the classical ICK motif: 1) Similar to SF6 toxins, an extended C-terminal tail is stapled in place by a disulfide bridge between the C-terminal Cys residue and a Cys residue at the base of the adjacent C-terminal β-strand of the ICK motif; 2) the long N-terminal region includes a short six-residue 3₁₀ helix followed by an extended segment that stretches across toward the much longer β-sheet of the cystine knot; and 3) a five-residue 3₁₀ helix is inserted in the CysII–CysIII loop in the ICK motif. The SF23 family member adopts a prototypical knottin fold with an additional stabilizing disulfide bond at the base of the β-hairpin loop (Fig. 4D), which is a relatively common elaboration in spider toxins (42). Superimposition of the 3D structures of SF6, SF23, SF26, and either of the two ICK domains of SF1 reveals that, regardless of the size and structural elaborations acquired in these peptides (Fig. 4E and Movie S1), the central ICK motif remains essentially the same, despite enormous variations in amino acid sequence.

Elucidation of the 3D structure of toxin U33-TRTX-Cg1c from the Chinese tarantula Chilobrachys guangxiensis, which is homologous to SF22, revealed that these toxins do not adopt an ICK fold. The overall topology of SF22 (Fig. 4F) is similar to the prokineticin/colipase fold adopted by other five-disulfide DRPs, such as the SF2 and SF12 toxins, but they are not homologous. The prokineticin/colipase fold found in vertebrate proteins can be viewed as a head-to-tail duplication of two DDH motifs with various elaborations on the loops (10). SF22 also contains two core motifs. The C-terminal motif is structurally similar to the C-terminal subdomain of prokineticin and colipase, and it adopts a DDH motif with a long 17-residue loop between CysVI and CysVIII (cf. four to six residues in the equivalent loop of most ICK motifs). The extended loop bridges over to the N-terminal motif, stabilized by an intersubdomain disulfide bond (CysIII–CysVII) and a hydrogen bond between Cys15 (CysIV) and Tyr49. However, in contrast with prokineticin and colipase, the N-terminal region does not adopt a DDH fold. Rather, the N-terminal core is composed of two stacked antiparallel β-hairpins in which the parallel strands in the adjacent hairpins are covalently linked by two disulfide bonds (Fig. 4F). This stacked-hairpin fold resembles the “mini-granulin” fold seen in diverse proteins, such as granulin, serine proteases, growth factors, and cone snail venom peptides (43, 44). Given the uncertain evolutionary history and diverse taxa that express this protein fold, we refer to it here as a “disulfide-stabilized antiparallel β-hairpin stack” (DABS). Covalent linkage of a DDH and DABS domain has not been reported in venom peptides, and thus SF22 represents a new class of toxins.

ICK Toxins Dominate the Venom Arsenal of H. infensa.

Our combined transcriptomic, proteomic, and structural biology data revealed that ICK toxins are the major contributor to the diversity of the H. infensa venom peptidome, with 15 of the 26 identified DRP superfamilies comprised of simple or highly derived knottins. Moreover, analysis of transcript abundance revealed that 11 of the 14 most highly expressed DRP superfamilies are ICK toxins (Fig. 3). Overall, based on transcript abundance, ICK toxins represent ∼91% of the total venom peptidome. If one includes the seven superfamilies of putative protein toxins (SF27–SF33), which are all expressed at very low levels, the venom proteome is composed of 90.5% ICK toxins, 9.1% non-ICK DRPs, and 0.4% protein toxins. We conclude that ICK toxins dominate the venom arsenal of H. infensa both in terms of molecular diversity and abundance.
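The two abundance figures quoted above are mutually consistent, as the short calculation below shows. It assumes that the “venom peptidome” corresponds to ICK toxins plus non-ICK DRPs, i.e., the proteome with the protein toxins excluded; that reading is an interpretation rather than something stated explicitly.

```python
# Proteome composition (%) as reported in the text.
composition = {"ICK toxins": 90.5, "non-ICK DRPs": 9.1, "protein toxins": 0.4}

total = sum(composition.values())

# Assumption: peptidome = proteome minus the protein toxins.
peptide_total = composition["ICK toxins"] + composition["non-ICK DRPs"]
ick_share_of_peptidome = 100 * composition["ICK toxins"] / peptide_total

print(round(total, 1), round(ick_share_of_peptidome, 1))  # 100.0 90.9
```

Excluding the 0.4% of protein toxins raises the ICK share only marginally, from 90.5% to about 90.9%, which rounds to the ∼91% stated for the peptidome.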

To expand on the activities reported for ICK toxins from funnel-web spider venom (SI Appendix, Table S4), we examined whether SF4, SF6, SF23, and SF26 are insecticidal. All four toxins induced paralysis when injected into sheep blowflies (SI Appendix, Table S3). SF4 and SF23 were only active at high dose, and flies recovered within 24 h. SF6 caused irreversible paralysis, leading to death at moderate doses (9.7 nmol/g), while SF26 paralyzed all flies within 30 min, even at low dose (0.3 nmol/g), but the effects were reversible. The activity of SF26 is comparable to other potent insecticidal spider toxins tested in this assay (45).

Phylogenetic Analyses Suggest Structural Radiation of a Single Lineage of Spider-Venom ICK Toxins.

In order to examine whether ICK toxins in funnel-web spider venom have a common origin, we examined their phylogenetic relationship with homologous sequences from nonvenom gland spider tissues (hypodermal tissue, book lung, and leg), as well as tissue from the whip scorpion Mastigoproctus giganteus, a nonvenomous arachnid. All known spider-venom sequences deposited in UniProtKB were also used (Fig. 5 and SI Appendix). Briefly, sequence files were downloaded, trimmed, and assembled, and then contigs/singlets were analyzed to identify ICK variants. These ICK variants included non-venom-gland transcripts from H. infensa (leg contig DN34637_c0_g1_i4) and Haplopelma hainanum (book-lung contigs DN16443_c0_g1_i1, DN40049_c0_g1_i5, and i6, and DN40049_c0_g2_i1) that encoded peptide sequences that are identical or near identical to confirmed venom components. In the case of H. infensa, the leg transcript was identical to that of ω-HXTX-Hi1h (venom gland contig RL5_rep_c1832), which was confidently identified in the venom (SI Appendix). Similarly, three transcripts from the book lung transcriptome of H. hainanum are identical with toxins previously characterized from the venom of this species. Detection of these sequences in nonvenom gland transcriptomes is likely due to “leaky” expression of bona fide toxin genes, which has been reported in other venomous lineages (46).

Fig. 5.


Evolution of spider-venom ICK toxins. Maximum likelihood tree showing the phylogenetic relationship between H. infensa DRPs and DRPs isolated from the venom gland or other tissues of other spider species. The tree was rooted using the whip scorpion M. giganteus as the outgroup. Bootstrap values are shown at each node, except for nodes where support was 100. The tree shows that, although many of the phylogenetic relationships between superfamilies remain unresolved, all venom-derived DRPs form a well-supported monophyletic clade. Superfamily sequences belonging to H. infensa are highlighted in blue text, and representative structures for each superfamily are shown. DRP sequences from muscle and other tissues are highlighted in red; all other sequences (denoted by the light blue broken circle) represent venom DRPs. Venom peptides isolated from other species have their corresponding accession numbers/common toxin names listed in the labels, with the exception of the H. hainanum superfamily XVI clade. A summary of all accessions used can be found in SI Appendix.

Strikingly, our analyses of nonvenom gland tissues only returned hits to peptides with simple “classical” ICK folds. Thus, the highly elaborated ICK structures observed for several venom-peptide superfamilies in H. infensa appear to represent extreme cases of structural diversification following functional recruitment as toxins. Phylogenetic reconstruction of these and all other spider ICK peptides identified using our approach grouped all confirmed venom knottins into a well-supported monophyletic clade (Fig. 5). Our analysis also clustered several putative venom toxins with sequences from the leg muscle transcriptome of Liphistius malayanus, which could indicate either a physiological role for these putative toxins or multiple recruitment of ICK peptides into spider venom. However, regardless of the number of ICK recruitments into venom, our results suggest that the vast majority of the structural diversity of ICK toxins, which make up the bulk of the molecular diversity in the venom of H. infensa, arose from a single lineage of weaponized “classical” ICK fold.

Elaborations on the Basic Knottin Disulfide Architecture Facilitate ICK Diversification.

Although we were unable to resolve the higher order phylogenetic relationships between ICK toxin superfamilies, our analyses highlight the importance of loss and gain of disulfide bonds in their evolutionary radiation (Fig. 5). Functional radiation of ICK and other DRP folds is thought to occur primarily by diversification of the intercystine loop regions, which does not affect the structural integrity of the peptides. However, our data show that elaborations on the disulfide architecture itself can play a prominent role in the diversification of DRPs. While the core ICK motif contains only three disulfide bonds (11), our data suggest that the ancestral ICK spider toxin either contained a fourth disulfide bond stabilizing the β-sheet in loop 4 or that this additional disulfide was gained on at least two separate occasions. A loss of intramolecular disulfide bonds may seem counterintuitive during the evolution of venom DRPs for which stability is paramount. However, intriguingly, the stabilizing loop-4 disulfide bridge was present in all homologs identified in the outgroup datasets. Moreover, disulfide loss appears to have been crucial during evolution of DDH toxins (10) from ICK precursors in scorpions (21), suggesting that this scenario may not be as unlikely as previously postulated (10, 47).

Regardless of whether the ancestral ICK toxin contained the loop-4 disulfide, our data indicate that gain of disulfide bonds underlies the myriad of structural variations in ICK spider toxins. This not only highlights the importance of disulfide bonds in stabilizing the structure of spider toxins but further illustrates the extraordinary versatility of the ICK fold. The permissiveness of the ICK fold to both sequence variation (21) and disulfide elaborations likely explains why it appears to be the most widespread cysteine-rich fold known, being found in diverse taxa, including arachnids, fungi, insects, molluscs, plants, sea anemones, sponges, and even viruses (21, 48). ICK peptides also appear to constitute the most abundant fold in the “dark proteome” (49), suggesting that we still have much to learn about their structural and functional plasticity.

In summary, our structural venomics approach revealed the pathway for evolution of complex venoms in Australian funnel-web spiders. Multiple duplications of an ancestral ICK gene were followed by periods of diversification, generation of structural variants with additional disulfide bonds, and selection of variants via adaptive evolution to generate small disulfide-rich toxins. Limited intragenic duplication of an ICK gene produced double-knot toxins, but most venom peptides with mass >6 kDa arose from diverse elaborations of a single ICK scaffold, often accompanied by inclusion of additional disulfide bonds (Fig. 6). Thus, the extraordinary pharmacological diversity of spider venom, which is one of the most complex chemical arsenals in the natural world, is conferred by a panel of structurally related DRPs that all evolved from an ancestral ICK toxin.

Fig. 6.

Overview of structural innovations in spider-venom ICK peptides. Schematic overview of the mechanisms by which an ancestral ICK or DDH toxin was duplicated, conjugated, and elaborated upon to form the diversity of ICK scaffolds found in extant mygalomorph spider venoms. Gray numbered circles represent cysteine residues, with core disulfide bonds that form the cystine knot or DDH motif indicated by solid red lines. Additional noncore disulfides are highlighted in orange, with dashed lines indicating interdomain disulfide bonds. Blue arrows and green rectangles denote β-strands and α-helices, respectively. N and C termini are labeled, and PDB accession codes for representative structures are given below each schematic peptide fold.

Materials and Methods

Spider and Venom Collection.

H. infensa (mudjar nhiling guran) were collected on private land from Orchid Beach, K'gari (Fraser Island), QLD, Australia. Spiders were housed individually at 23 to 25 °C in plastic containers in dark cabinets. Adult female spiders were aggravated using forceps until venom was expressed on the fang tips, and then the venom was collected continuously by aspiration until no more venom could be obtained. We assume this corresponds to exhaustion of the venom-gland contents, and hence the venom collected should closely represent the complete venom proteome. Venom was stored at –20 °C. If impurities, such as soil or sand, were detected, the venom was diluted 10-fold with ultrapure water, and contaminants were removed using a low-protein–binding 0.22-μm Ultrafree-MC centrifugal filter (Millipore).

Venom Profiling.

The venom profile of H. infensa was obtained using a combination of RP-HPLC and MS techniques (6). Offline liquid chromatography (LC)-MALDI-TOF/TOF was used to maximize the amount of mass information extracted. Liquid chromatography-electrospray ionization (LC-ESI) OrbiTrap MS data were acquired as described (50). A conservative analysis was used to manually curate both MS datasets. Analysis criteria included: 1) only spectra with well-defined peaks were considered as “real”; 2) a signal-to-noise ratio of 15 was set before masses were recorded; 3) masses were excluded if they met the following criteria relative to another recorded mass: +16 Da and +32 Da (oxidized), –18 Da (dehydrated), –17 Da (deamidated), +22 Da (sodium adduct); 4) masses <1,000 Da, which were presumed to correspond to small organic compounds such as polyamines or matrix clusters, rather than peptides, were excluded from the analysis. Duplicated masses were eliminated from adjacent fractions, and a final list of sorted masses was generated. Potential dimers and doubly charged species were also eliminated (±3 to 5 Da). Then 3D contour plots, or “venom landscapes,” were constructed from LC-MALDI-TOF/TOF MS data as described (6).
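The mass-list curation rules above (criteria 3 and 4, plus removal of duplicated masses) can be illustrated with a minimal Python sketch. The curation in this study was manual, so this is not the procedure actually used. The function name `curate_masses` and the 0.5-Da matching tolerance `tol` are assumptions introduced for illustration.

```python
from typing import Iterable, List

# Mass shifts (Da) relative to another recorded mass that mark a likely
# artefact, taken from criterion 3 in the text.
ARTEFACT_SHIFTS = (16.0, 32.0, -18.0, -17.0, 22.0)


def curate_masses(masses: Iterable[float],
                  min_mass: float = 1000.0,
                  tol: float = 0.5) -> List[float]:
    """Return a sorted, curated mass list.

    min_mass follows criterion 4; tol (Da) is a hypothetical matching
    tolerance that the text does not specify.
    """
    # Criterion 4: drop masses below 1,000 Da.
    candidates = sorted(m for m in masses if m >= min_mass)

    # Collapse duplicates (e.g., the same mass seen in adjacent fractions).
    unique: List[float] = []
    for m in candidates:
        if not unique or m - unique[-1] > tol:
            unique.append(m)

    # Criterion 3: drop any mass equal to another recorded mass plus a shift.
    curated = []
    for i, m in enumerate(unique):
        is_artefact = any(
            abs(m - (parent + shift)) <= tol
            for j, parent in enumerate(unique) if j != i
            for shift in ARTEFACT_SHIFTS
        )
        if not is_artefact:
            curated.append(m)
    return curated
```

The sketch compares each mass against every other recorded mass, not only against masses that survive curation, so a chain such as M, M + 16, and M + 32 collapses to M.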

For proteomic analysis, 5 mg of milked venom pooled from three specimens was fractionated on a Shimadzu Prominence HPLC (Kyoto, Japan) using a Vydac C18 column (300 Å, 5 μm, 4.6 mm × 250 mm). Venom was eluted using a gradient of 5 to 80% solvent B (90% acetonitrile [ACN], 0.043% trifluoroacetic acid [TFA]) in solvent A (0.05% TFA) over 2 h. Twelve fractions were manually collected (one every 10 min), and absorbance was monitored at 214 nm and 280 nm. Fractions were lyophilized and reconstituted in ultrapure water, and 1/10 were removed for proteomic analyses. Fractionated proteins and peptides were reduced with dithiothreitol (5 mM in 50 mM ammonium bicarbonate, 15% ACN, pH 8), alkylated with iodoacetamide (10 mM in 50 mM ammonium bicarbonate, 15% ACN, pH 8), and digested overnight with trypsin (30 ng/µL in 15 µL of 50 mM ammonium bicarbonate, 15% ACN, pH 8). Upon completion of digestion, formic acid (FA) was added to a final concentration of 5%, and the digested samples were desalted using a C18 ZipTip (Thermo Fisher, Waltham, MA).

Desalted, digested samples were dried in a vacuum centrifuge and dissolved in 0.5% FA, and 2 µg were analyzed by LC–tandem MS (LC-MS/MS) on an AB Sciex 5600 TripleTOF (AB Sciex, Framingham, MA) equipped with a Turbo-V source heated to 550 °C and coupled to a Shimadzu Nexera UHPLC. Samples were fractionated on an Agilent Zorbax stable-bond C18 column (2.1 × 100 mm, 1.8 μm, 300 Å pore size) using a gradient of 1 to 40% solvent B (90% ACN, 0.1% FA) in 0.1% FA over 45 min, using a flow rate of 180 µL/min. MS1 survey scans were acquired at 300 to 1,800 m/z over 250 ms, and the 20 most intense ions with a charge of +2 to +5 and an intensity of at least 120 counts per second were selected for MS2. The unit mass precursor ion inclusion window was ±0.7 Da, and isotopes within ±2 Da were excluded from MS2. MS2 scans were acquired at 80 to 1,400 m/z over 100 ms and optimized for high resolution. For protein identification, MS/MS spectra were searched against the translated annotated H. infensa venom-gland transcriptome using ProteinPilot v4.5.1 (AB SCIEX, Framingham, MA). Searches were run as thorough identification searches, specifying tryptic digestion and cysteine alkylation by iodoacetamide. Decoy-based FDRs were estimated by ProteinPilot; for protein identification, we used a protein confidence cutoff corresponding to a local FDR of <1%. Spectra were also manually examined to further eliminate false positives. MS/MS data were also searched against the venom-gland transcriptome using Mascot v2.5.1 (Matrix Science, Boston, MA) with a decoy-based FDR cutoff of <1%. For this search, we set the instrument to ESI-quadrupole time-of-flight (QUAD-TOF), specified trypsin as the cleavage enzyme, allowed up to one missed cleavage, and used a peptide tolerance of ±10 parts per million and MS/MS tolerance of ±0.1 Da, with cysteine carbamidomethylation as a fixed modification. C-terminal amidation and N-terminal pyroglutamate formation were allowed as variable modifications. These searches returned 3,012 and 2,944 hits for ProteinPilot and Mascot, respectively.

Complementary DNA (cDNA) Library Construction and Sequencing.

Three spiders were milked via aggravation to deplete their glands of venom. Three days later, they were anesthetized, and their venom glands were dissected and placed in TRIzol reagent (Life Technologies). Total RNA from pooled venom glands was extracted following the standard TRIzol protocol. Messenger RNA (mRNA) enrichment from total RNA was performed using an Oligotex direct mRNA mini kit (Qiagen). RNA quality and concentration were measured using a Bioanalyzer 2100 pico chip (Agilent Technologies).

A cDNA library was constructed from 100 μg of mRNA using the standard Roche cDNA rapid library preparation and emPCR method. Sequencing was carried out at the Australian Genome Research Facility using a Roche GS-FLX sequencer. The raw standard flowgram file (.SFF) was processed using CANGS software, and low-quality sequences were discarded (Phred score cutoff of 25) (51). De novo assembly was performed using MIRA software v3.2 (29) using the following parameters: -GE:not=4 --project=Hinfensa --job=denovo,est,accurate,454 454_SETTINGS -CL:qc=no -AS:mrpc=1 -AL:mrs=99,egp=1. Consensus sequences from contigs and singlets were submitted to Blast2GO to acquire functional annotations (30). In parallel to the functional annotation, an in-house algorithm (Toxin|seek) was written to allow prediction of potential toxin ORFs that may represent novel DRPs. Once annotated and predicted, lists were merged, redundancies were removed, and signal sequences were determined using SignalP v3.1 (52). Putative propeptide cleavage sites were predicted using a sequence logo analysis (53) of all known spider precursors (SI Appendix, Fig. S2). After identifying all processing signals, toxins were classified into superfamilies based on their signal sequence and cysteine framework. Signal sequences were considered different if the level of amino acid sequence identity was <65%; the pairwise comparison of signal sequences in SI Appendix, Fig. S3 shows that, for most signal sequence comparisons, the level of identity is <30%.
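The superfamily classification rule described above (shared cysteine framework plus signal-sequence identity of at least 65%) can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline used in this study: the function names are hypothetical, signal sequences are assumed to be pre-aligned to equal length, and the greedy comparison against the first member of each group is a simplification.

```python
import re
from typing import List, Tuple


def cysteine_framework(mature: str) -> str:
    """Collapse a peptide sequence to its cysteine pattern, e.g. 'C-C-CC-C-C'."""
    return "-".join(re.findall(r"C+", mature.upper()))


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Gap characters ('-') never count as matches.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and pre-aligned to equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def classify(toxins: List[Tuple[str, str, str]],
             min_identity: float = 65.0) -> List[List[str]]:
    """Group (name, aligned_signal, mature_sequence) records into superfamilies.

    A toxin joins the first group whose representative (first member) has the
    same cysteine framework and a signal identity >= min_identity; otherwise
    it founds a new group.
    """
    families = []
    for name, signal, mature in toxins:
        framework = cysteine_framework(mature)
        for fam in families:
            if (fam["framework"] == framework
                    and percent_identity(signal, fam["signal"]) >= min_identity):
                fam["members"].append(name)
                break
        else:
            families.append({"signal": signal, "framework": framework,
                             "members": [name]})
    return [fam["members"] for fam in families]
```

With real data, the pairwise identities would come from a proper sequence alignment rather than position-by-position comparison.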

To examine the extent of sequence coverage provided by the 454 platform, we generated 11 subsets of data that randomly sampled the entire read set in 10% increments. Each randomly sampled dataset was assembled as previously described using MIRA v3.2. Each dataset was then submitted to Tox|Blast, a module of Tox|Note (31), or Transdecoder v5.5.0. Hits were isolated from the Tox|Blast output. We used cd-hit2d v4.6 (54) to cluster and compare results from the Tox|Blast/Transdecoder outputs (settings: -c 0.6 -n 4 -d 100) and then recorded the number of superfamilies recovered for each data subset (SI Appendix, Fig. S37).
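The read-subsampling step can be sketched in Python as shown below. This is a minimal illustration, not the script used in this study. The function name `subsample_reads`, the fixed random seed, and the choice of percentages passed in are assumptions, and each subset would still need to be assembled and searched as described above.

```python
import random
from typing import Dict, Iterable, List, Sequence


def subsample_reads(read_ids: Sequence[str],
                    percentages: Iterable[int],
                    seed: int = 0) -> Dict[int, List[str]]:
    """Draw one random subset of reads (without replacement) per percentage."""
    rng = random.Random(seed)  # fixed seed so the subsets are reproducible
    reads = list(read_ids)
    subsets = {}
    for pct in percentages:
        if not 0 <= pct <= 100:
            raise ValueError("percentage must be between 0 and 100")
        k = round(len(reads) * pct / 100)
        subsets[pct] = rng.sample(reads, k)
    return subsets
```

Recording the number of superfamilies recovered from each subset then gives a saturation curve: if the count plateaus before the full read set is used, deeper sequencing is unlikely to reveal many additional superfamilies.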

Nomenclature.

Toxins were named using the rational nomenclature described previously (55). Spider taxonomy was taken from the World Spider Catalog v19.5 (https://wsc.nmbe.ch).

Peptide Expression and Purification.

Genes encoding toxins of interest were synthesized using codons optimized for E. coli expression and subcloned into a pLICC_D168 vector by GeneArt (Regensburg, Germany). Plasmids were transformed into E. coli BL21 (λDE3), and toxin expression and purification were carried out using published methods (40) (see SI Appendix, Fig. S5 for an example). Peptides were separated from salts, His6-MBP and His6-TEV protease using a Shimadzu Prominence HPLC system (Kyoto, Japan). Separation was performed on a Jupiter C4 reverse-phase HPLC column (250 × 10 mm, 300 Å, 10 μm; Phenomenex), using a flow rate of 5 mL/min and an elution gradient of 5 to 60% solvent B over 30 min. Absorbance was monitored at 214 nm and 280 nm, and fractions were manually collected. Purified peptides were lyophilized and reconstituted in water or buffer, and approximate peptide concentration was determined from absorbance at 280 nm measured using a NanoDrop spectrophotometer (Thermo Scientific). ESI-MS spectra were acquired using an API 2000 LC-MS/MS triple quadrupole mass spectrometer. Mass spectra were obtained in positive ion mode over an m/z range of 400 to 1,900.

Structure Determination.

The structures of toxins from SF6, SF22, SF23, and SF26 were determined using heteronuclear NMR. Each sample contained 300 µL of 13C/15N-labeled peptide (100 to 300 µM) in 20 mM 2-(N-morpholino)ethanesulfonic acid (MES) buffer, 0.02% NaN3, 5% D2O, pH 6. NMR data were acquired at 25 °C on an Avance II+ 900 MHz spectrometer (Bruker BioSpin) equipped with a cryogenically cooled triple resonance probe. Resonance assignments were obtained from two-dimensional (2D) 1H-15N heteronuclear single-quantum coherence (HSQC), 2D 1H-13C HSQC, 3D HNCACB, 3D CBCA(CO)NH, 3D HNCO, 3D HBHA(CO)NH, and four-dimensional (4D) HCC(CO)NH–total correlation spectroscopy (TOCSY) (41) spectra. The 3D and 4D spectra were acquired using nonuniform sampling and were reconstructed using the maximum entropy algorithm in the Rowland NMR Toolkit. 13C-aliphatic, 13C-aromatic and 15N-edited nuclear Overhauser effect spectroscopy (NOESY)-HSQC spectra were acquired using uniform sampling for extraction of interproton distance restraints.

Resonance assignments and integration of NOESY peaks were achieved using SPARKY 3 (56), followed by automatic peak list assignment, extraction of distance restraints, and structure calculation using the torsion angle dynamics package CYANA 3.0 (57). Dihedral-angle restraints derived from TALOS chemical shift analysis (58) were also integrated in the calculation, with the restraint range set to twice the estimated SD. CYANA was used to calculate 200 structures from random starting conformations, and then the best 20 structures were selected based on their stereochemical quality as judged using MolProbity (59). Structural statistics and Protein Data Bank (PDB) accession codes are summarized in SI Appendix, Table S2.

Homology Modeling.

Superfamilies identified as having a structural counterpart in the PDB (SI Appendix, Table S5) were modeled using automodel parameters in Modeller v9.12 (60). A total of 100 to 500 models were generated for each structure. The lowest energy models were selected and visualized using PyMOL v2.0 (Schrödinger, LLC).

Phylogenetic Analyses.

Bayesian inference (BI) of phylogeny and maximum likelihood (ML) reconstruction were used to examine the evolution of ICK toxins. To search for nonvenom outgroups, we examined datasets from one species of whip scorpion (M. giganteus [Uropygi]; SRR1145698), a nonvenomous sister group to spiders (61), and nonvenom gland transcriptomes from six species of taxonomically diverse spiders: leg muscle from H. infensa (PRJEB35676), Macrothele calpeiana (SRR6994009), L. malayanus (SRR1145736), and Neoscona arabesca (SRR1145741); hypodermal tissue from Cupiennius salei (SRR880446); and book-lung tissue from H. hainanum (SRR2155568). Fastq files were downloaded, trimmed using Trimmomatic v0.39 (62) with window quality 30, window size 4, and minimum read length 60 bp, and then assembled using default settings in Trinity v2.3 (63). The resulting assemblies were combined, and predicted coding DNA sequences (CDSs) were extracted using a combination of tools (31, 64). Accession numbers for all sequences are given in SI Appendix, File 2. The resulting sequences were filtered based on the presence of a signal peptide predicted using SignalP v4.1 (52) and aligned with MAFFT v7.304b using a local Smith–Waterman alignment method with affine gap cost (L-INS-i option) (65). The alignment was then adjusted manually to correct misaligned structurally equivalent cysteine residues before realigning the intercysteine loops using the MAFFT regional realignment script. The resulting alignment was used to estimate phylogenetic relationships via ML reconstruction using IQ-TREE v.6.12 (66). The TESTNEW command was used to allow IQ-TREE to determine the best fitting model according to Bayesian information criterion analysis, and node support was tested by ultrafast bootstrap approximation (UFBoot) (67) using 10,000 iterations.
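The read-trimming settings above (window size 4, window quality 30, minimum read length 60 bp) can be illustrated with a simplified Python sketch of sliding-window quality trimming. This is not Trimmomatic's implementation, which handles further details. The function names are hypothetical, and the sketch assumes that quality scores have already been decoded to integer Phred values.

```python
from typing import List, Optional


def keep_length(quals: List[int], window: int = 4, min_avg: float = 30.0) -> int:
    """Number of leading bases kept by a simple sliding-window scan.

    The read is cut at the start of the first window whose mean quality
    falls below min_avg. Reads shorter than one window are discarded
    (length 0) in this sketch.
    """
    n = len(quals)
    if n < window:
        return 0
    for start in range(n - window + 1):
        if sum(quals[start:start + window]) / window < min_avg:
            return start
    return n


def trim_read(seq: str, quals: List[int], window: int = 4,
              min_avg: float = 30.0, min_len: int = 60) -> Optional[str]:
    """Return the trimmed read, or None if it is shorter than min_len."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must have equal length")
    kept = keep_length(quals, window, min_avg)
    return seq[:kept] if kept >= min_len else None
```

For example, a read of 70 high-quality bases followed by 10 low-quality bases is cut a few bases before the quality drop and retained, whereas a read whose quality drops after 50 bases falls below the 60-bp minimum and is discarded.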

We also estimated phylogenetic relationships via BI using MrBayes v3.2.2 (68), with standard settings, except that we specified the same amino acid substitution model identified by IQ-TREE (69, 70). The Markov chain Monte Carlo algorithm was run for 10,000,000 generations with a sampling frequency of 500. Two or four runs were performed simultaneously with chain temperatures of 0.05, 0.1, 0.2, or 0.5. However, none of the calculations reached convergence as judged by comparison of mean SD of split frequencies (SD > 0.5). A comparison of phylogenies obtained by ML and BI analysis of a reduced dataset that did converge (SI Appendix, Fig. S40) showed good agreement between the two approaches and supported the ML phylogeny based on the full sequence alignment (SI File 3). For the BI tree summary, the log-likelihood score of each saved tree was plotted against the number of generations to establish the point at which the log-likelihood scores reached an asymptote. Posterior probabilities for clades were established by constructing a majority-rule consensus tree for all trees generated after burn-in. Trees were visualized and exported using Archaeopteryx v0.99 (www.phylosoft.org/archaeopteryx) and FigTree v1.3.1 (http://tree.bio.ed.ac.uk/software/figtree/).
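The tree-summary step (discarding burn-in and deriving clade posterior probabilities from a majority-rule consensus) can be sketched in Python as follows. This is an illustration only and is not how MrBayes is implemented. It assumes that each sampled tree has already been reduced to a set of clades, with each clade represented as a frozenset of taxon names, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]


def clade_posteriors(sampled_trees: List[Set[Clade]],
                     burn_in: int) -> Dict[Clade, float]:
    """Fraction of post-burn-in trees that contain each clade."""
    kept = sampled_trees[burn_in:]
    if not kept:
        raise ValueError("no trees remain after burn-in")
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}


def majority_rule(posteriors: Dict[Clade, float],
                  threshold: float = 0.5) -> Dict[Clade, float]:
    """Keep only clades found in more than `threshold` of the sampled trees."""
    return {clade: p for clade, p in posteriors.items() if p > threshold}
```

The value of `burn_in` would be chosen from the log-likelihood trace, as described above, as the number of saved trees that precede the plateau.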

Blowfly Toxicity Assay.

Recombinant toxins were dissolved in water and injected into the ventrolateral thoracic region of sheep blowflies (Lucilia cuprina; mass 23.9 to 36.5 mg) as previously described (42). The extent of lethality and paralysis was scored at 0.5, 1, and 24 h postinjection.

Data Deposition.

Metadata and annotated nucleotide sequences were deposited in the European Nucleotide Archive under project accessions PRJEB6062 (ERA298588) and PRJEB35693. Atomic coordinates were deposited in the Protein Data Bank under accession codes 2N6N, 2N6R, 6BA3, and 2N8K while the corresponding NMR chemical shifts were deposited in BioMagResBank under accession codes BMRB25774, BMRB25778, BMRB25853, and BMRB30352. MS data were deposited to the ProteomeXchange Consortium (71) via the PRIDE (72) partner repository with the dataset identifier PXD016886.

Supplementary Material

Supplementary File
pnas.1914536117.sapp.pdf (12.9MB, pdf)
Supplementary File
Download video file (76.8MB, mp4)
Supplementary File
pnas.1914536117.sd01.xlsx (90.8KB, xlsx)
Supplementary File
pnas.1914536117.sd02.xlsx (21.6KB, xlsx)
Supplementary File
pnas.1914536117.sd03.txt (112.8KB, txt)

Acknowledgments

We thank David Wilson for collection of H. infensa; Philip Lawrence for curating MS data; Julie Klint, Niraj Bende, and Jessie Er for recombinant peptide production; Lars Ellgaard for alerting us to the mini-granulin fold; Geoff Brown (Department of Agriculture and Fisheries, Queensland, Australia) for supply of blowflies; and Michael Nuhn and Lien Li (European Molecular Biology Laboratory-Australia Bioinformatics Resource) for help with data submission to European Nucleotide Archive. This research was supported by Australian National Health & Medical Research Council Principal Research Fellowship APP1136889 (to G.F.K.), Australian Research Council Discovery Early Career Researcher Award Fellowship DE160101142 (to E.A.B.U.), Research Council of Norway Funding Scheme for Independent Projects–Young Research Talents Fellowship 287462, and a University of Queensland International Postgraduate Scholarship (to S.S.P.).

Footnotes

Competing interest statement: C.D. is affiliated with Thermo Fisher Scientific.

This article is a PNAS Direct Submission.

Data deposition: Atomic coordinates for protein structures determined in this study were deposited in the Protein Data Bank under accession codes 2N6N, 2N6R, 6BA3, and 2N8K while corresponding NMR chemical shifts were deposited in BioMagResBank under accessions BMRB25774, BMRB25778, BMRB25853, and BMRB30352. Metadata and annotated nucleotide sequences were deposited in the European Nucleotide Archive under project accessions PRJEB6062 (ERA298588) and PRJEB35693. Mass spectrometry data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD016886.

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1914536117/-/DCSupplemental.

References

  • 1.Lozano-Fernandez J. et al., A molecular palaeobiological exploration of arthropod terrestrialization. Philos. Trans. R. Soc. Lond. B Biol. Sci. 371, 20150133 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Mora C., Tittensor D. P., Adl S., Simpson A. G., Worm B., How many species are there on Earth and in the ocean? PLoS Biol. 9, e1001127 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Schroeder F. C. et al., NMR-spectroscopic screening of spider venom reveals sulfated nucleosides as major components for the brown recluse and related species. Proc. Natl. Acad. Sci. U.S.A. 105, 14283–14287 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Kuhn-Nentwig L., Stocklin R., Nentwig W., Venom composition and strategies in spiders: Is everything possible? Adv. In Insect Phys. 40, 1–86 (2011). [Google Scholar]
  • 5.King G. F., Hardy M. C., Spider-venom peptides: Structure, pharmacology, and potential for control of insect pests. Annu. Rev. Entomol. 58, 475–496 (2013). [DOI] [PubMed] [Google Scholar]
  • 6.Escoubas P., Sollod B., King G. F., Venom landscapes: Mining the complexity of spider venoms via a combined cDNA and mass spectrometric approach. Toxicon 47, 650–663 (2006). [DOI] [PubMed] [Google Scholar]
  • 7.Smith J. J. et al., ““Therapeutic applications of spider-venom peptides”” in Venoms to Drugs: Venom as a Source for the Development of Human Therapeutics, King G. F., Ed. (The Royal Society of Chemistry, 2015), pp. 221–244. [Google Scholar]
  • 8.Yuan C.-H. et al., Discovery of a distinct superfamily of Kunitz-type toxin (KTT) from tarantulas. PLoS One 3, e3414 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Morales R. A. V. et al., Chemical synthesis and structure of the prokineticin Bv8. ChemBioChem 11, 1882–1888 (2010). [DOI] [PubMed] [Google Scholar]
  • 10.Wang X. et al., Discovery and characterization of a family of insecticidal neurotoxins with a rare vicinal disulfide bridge. Nat. Struct. Biol. 7, 505–513 (2000). [DOI] [PubMed] [Google Scholar]
  • 11.Pallaghy P. K., Nielsen K. J., Craik D. J., Norton R. S., A common structural motif incorporating a cystine knot and a triple-stranded β-sheet in toxic and inhibitory polypeptides. Protein Sci. 3, 1833–1839 (1994). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.King G. F., Tedford H. W., Maggio F., Structure and function of insecticidal neurotoxins from australian funnel-web spiders. Toxin Rev. 21, 361–389 (2002). [Google Scholar]
  • 13.Postic G., Gracy J., Périn C., Chiche L., Gelly J. C., KNOTTIN: The database of inhibitor cystine knot scaffold after 10 years, toward a systematic structure modeling. Nucleic Acids Res. 46, D454–D458 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Kolmar H., Natural and engineered cystine knot miniproteins for diagnostic and therapeutic applications. Curr. Pharm. Des. 17, 4329–4336 (2011). [DOI] [PubMed] [Google Scholar]
  • 15.Herzig V., King G. F., The cystine knot is responsible for the exceptional stability of the insecticidal spider toxin ω-hexatoxin-Hv1a. Toxins (Basel) 7, 4366–4380 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.King G. F., Venoms as a platform for human drugs: Translating toxins into therapeutics. Expert Opin. Biol. Ther. 11, 1469–1484 (2011). [DOI] [PubMed] [Google Scholar]
  • 17.Fletcher J. I. et al., The structure of a novel insecticidal neurotoxin, ω-atracotoxin-HV1, from the venom of an Australian funnel web spider. Nat. Struct. Biol. 4, 559–566 (1997). [DOI] [PubMed] [Google Scholar]
  • 18.Bende N. S. et al., The insecticidal neurotoxin Aps III is an atypical knottin peptide that potently blocks insect voltage-gated sodium channels. Biochem. Pharmacol. 85, 1542–1554 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Bohlen C. J. et al., A bivalent tarantula toxin activates the capsaicin receptor, TRPV1, by targeting the outer pore domain. Cell 141, 834–845 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Chassagnon I. R. et al., Potent neuroprotection after stroke afforded by a double-knot spider-venom peptide that inhibits acid-sensing ion channel 1a. Proc. Natl. Acad. Sci. U.S.A. 114, 3750–3755 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Undheim E. A. B., Mobli M., King G. F., Toxin structures as evolutionary tools: Using conserved 3D folds to study the evolution of rapidly evolving peptides. BioEssays 38, 539–548 (2016). [DOI] [PubMed] [Google Scholar]
  • 22.Rodríguez de la Vega R. C., A note on the evolution of spider toxins containing the ICK-motif. Toxin Rev. 24, 383–395 (2005). [Google Scholar]
  • 23.Escoubas P., Quinton L., Nicholson G. M., Venomics: Unravelling the complexity of animal venoms with mass spectrometry. J. Mass Spectrom. 43, 279–295 (2008). [DOI] [PubMed] [Google Scholar]
  • 24.Dutertre S. et al., Deep venomics reveals the mechanism for expanded peptide diversity in cone snail venom. Mol. Cell. Proteomics 12, 312–329 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Palagi A. et al., Unravelling the complex venom landscapes of lethal Australian funnel-web spiders (Hexathelidae: Atracinae) using LC-MALDI-TOF mass spectrometry. J. Proteomics 80, 292–310 (2013). [DOI] [PubMed] [Google Scholar]
  • 26.Escoubas P., Rash L., Tarantulas: Eight-legged pharmacists and combinatorial chemists. Toxicon 43, 555–574 (2004). [DOI] [PubMed] [Google Scholar]
  • 27.Zobel-Thropp P. A., Correa S. M., Garb J. E., Binford G. J., Spit and venom from scytodes spiders: A diverse and distinct cocktail. J. Proteome Res. 13, 817–835 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Pineda S. S. et al., Diversification of a single ancestral gene into a successful toxin superfamily in highly venomous Australian funnel-web spiders. BMC Genomics 15, 177 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Chevreux B. et al., Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Res. 14, 1147–1159 (2004). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Conesa A. et al., Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21, 3674–3676 (2005). [DOI] [PubMed] [Google Scholar]
  • 31.Pineda S. S. et al., ArachnoServer 3.0: An online resource for automated discovery, analysis and annotation of spider toxins. Bioinformatics 34, 1074–1076 (2018). [DOI] [PubMed] [Google Scholar]
  • 32.Li B., Dewey C. N., RSEM: Accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Fry B. G. et al., The toxicogenomic multiverse: Convergent recruitment of proteins into animal venoms. Annu. Rev. Genomics Hum. Genet. 10, 483–511 (2009). [DOI] [PubMed] [Google Scholar]
  • 34.Biner O. et al., Isolation, N-glycosylations and function of a hyaluronidase-like enzyme from the venom of the spider Cupiennius salei. PLoS One 10, e0143963 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Kini R. M., Excitement ahead: Structure, function and mechanism of snake venom phospholipase A2 enzymes. Toxicon 42, 827–840 (2003). [DOI] [PubMed] [Google Scholar]
  • 36.de Graaf D. C. et al., Insights into the venom composition of the ectoparasitoid wasp Nasonia vitripennis from bioinformatic and proteomic studies. Insect Mol. Biol. 19 (suppl. 1), 11–26 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Vincent B. et al., The venom composition of the parasitic wasp Chelonus inanitus resolved by combined expressed sequence tags analysis and proteomic approach. BMC Genomics 11, 693 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Wen S. et al., Discovery of an MIT-like atracotoxin family: Spider venom peptides that share sequence homology but not pharmacological properties with AVIT family proteins. Peptides 26, 2412–2426 (2005). [DOI] [PubMed] [Google Scholar]
  • 39.Silva P. I. Jr., Daffre S., Bulet P., Isolation and characterization of gomesin, an 18-residue cysteine-rich defense peptide from the spider Acanthoscurria gomesiana hemocytes with sequence similarities to horseshoe crab antimicrobial peptides of the tachyplesin family. J. Biol. Chem. 275, 33464–33470 (2000). [DOI] [PubMed] [Google Scholar]
  • 40.Klint J. K. et al., Production of recombinant disulfide-rich venom peptides for structural and functional analysis via expression in the periplasm of E. coli. PLoS One 8, e63865 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Mobli M., Stern A. S., Bermel W., King G. F., Hoch J. C., A non-uniformly sampled 4D HCC(CO)NH-TOCSY experiment processed using maximum entropy for rapid protein sidechain assignment. J. Magn. Reson. 204, 160–164 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Bende N. S. et al., The insecticidal spider toxin SFI1 is a knottin peptide that blocks the pore of insect voltage-gated sodium channels via a large β-hairpin loop. FEBS J. 282, 904–920 (2015). [DOI] [PubMed] [Google Scholar]
  • 43.Jin A. H. et al., Conotoxin Φ-MiXXVIIA from the superfamily G2 employs a novel cysteine framework that mimics granulin and displays anti-apoptotic activity. Angew. Chem. Int. Ed. Engl. 56, 14973–14976 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Nielsen L. D. et al., The three-dimensional structure of an H-superfamily conotoxin reveals a granulin fold arising from a common ICK cysteine framework. J. Biol. Chem. 294, 8745–8759 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Bende N. S. et al., A distinct sodium channel voltage-sensor locus determines insect selectivity of the spider toxin Dc1a. Nat. Commun. 5, 4350 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Smith J. J., Undheim E. A. B., True lies: Using proteomics to assess the accuracy of transcriptome-based venomics in centipedes uncovers false positives and reveals startling intraspecific variation in Scolopendra subspinipes. Toxins (Basel) 10, 96 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Smith J. J. et al., Unique scorpion toxin with a putative ancestral fold provides insight into evolution of the inhibitor cystine knot motif. Proc. Natl. Acad. Sci. U.S.A. 108, 10478–10483 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Madio B., Undheim E. A. B., King G. F., Revisiting venom of the sea anemone Stichodactyla haddoni: Omics techniques reveal the complete toxin arsenal of a well-studied sea anemone genus. J. Proteomics 166, 83–92 (2017). [DOI] [PubMed] [Google Scholar]
  • 49.Perdigão N. et al., Unexpected features of the dark proteome. Proc. Natl. Acad. Sci. U.S.A. 112, 15898–15903 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Dauly C., Escoubas P., King G. F., Nicholson G. M., “Characterization of spider venom peptides by high-resolution LC-MS/MS analysis” (Application Note: 511, Thermo Fisher Scientific, 2011).
  • 51.Pandey R. V., Nolte V., Schlötterer C., CANGS: A user-friendly utility for processing and analyzing 454 GS-FLX data in biodiversity studies. BMC Res. Notes 3, 3 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Petersen T. N., Brunak S., von Heijne G., Nielsen H., SignalP 4.0: Discriminating signal peptides from transmembrane regions. Nat. Methods 8, 785–786 (2011). [DOI] [PubMed] [Google Scholar]
  • 53.Schneider T. D., Stephens R. M., Sequence logos: A new way to display consensus sequences. Nucleic Acids Res. 18, 6097–6100 (1990). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Fu L., Niu B., Zhu Z., Wu S., Li W., CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.King G. F., Gentz M. C., Escoubas P., Nicholson G. M., A rational nomenclature for naming peptide toxins from spiders and other venomous animals. Toxicon 52, 264–276 (2008). [DOI] [PubMed] [Google Scholar]
  • 56.Goddard T. D., Kneller D. G., SPARKY (Version 3.11.5, University of California, San Francisco, 2001).
  • 57.Güntert P., Automated NMR structure calculation with CYANA. Methods Mol. Biol. 278, 353–378 (2004).
  • 58.Shen Y., Delaglio F., Cornilescu G., Bax A., TALOS+: A hybrid method for predicting protein backbone torsion angles from NMR chemical shifts. J. Biomol. NMR 44, 213–223 (2009).
  • 59.Chen V. B. et al., MolProbity: All-atom structure validation for macromolecular crystallography. Acta Crystallogr. D Biol. Crystallogr. 66, 12–21 (2010).
  • 60.Sali A., Blundell T. L., Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779–815 (1993).
  • 61.Sharma P. P. et al., Phylogenomic interrogation of arachnida reveals systemic conflicts in phylogenetic signal. Mol. Biol. Evol. 31, 2963–2984 (2014).
  • 62.Bolger A. M., Lohse M., Usadel B., Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
  • 63.Grabherr M. G. et al., Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
  • 64.Afgan E. et al., Genomics virtual laboratory: A practical bioinformatics workbench for the cloud. PLoS One 10, e0140829 (2015).
  • 65.Katoh K., Standley D. M., MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
  • 66.Nguyen L. T., Schmidt H. A., von Haeseler A., Minh B. Q., IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
  • 67.Minh B. Q., Nguyen M. A. T., von Haeseler A., Ultrafast approximation for phylogenetic bootstrap. Mol. Biol. Evol. 30, 1188–1195 (2013).
  • 68.Ronquist F., Huelsenbeck J. P., MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574 (2003).
  • 69.Müller T., Vingron M., Modeling amino acid replacement. J. Comput. Biol. 7, 761–776 (2000).
  • 70.Soubrier J. et al., The influence of rate heterogeneity among sites on the time dependence of molecular rates. Mol. Biol. Evol. 29, 3345–3358 (2012).
  • 71.Deutsch E. W. et al., The ProteomeXchange consortium in 2017: Supporting the cultural change in proteomics public data deposition. Nucleic Acids Res. 45, D1100–D1106 (2017).
  • 72.Perez-Riverol Y. et al., The PRIDE database and related tools and resources in 2019: Improving support for quantification data. Nucleic Acids Res. 47, D442–D450 (2019).

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
