Targeted Sequencing of Venom Genes from Cone Snail Genomes Improves Understanding of Conotoxin Molecular Evolution

Mark A Phuong; Gusti N Mahardika

doi:10.1093/molbev/msy034

Mol Biol Evol. 2018 Mar 5;35(5):1210–1224. doi: 10.1093/molbev/msy034

Editor: Nicolas Vidal

PMCID: PMC5913681 PMID: 29514313

Abstract

To expand our capacity to discover venom sequences from the genomes of venomous organisms, we applied targeted sequencing techniques to selectively recover venom gene superfamilies and nontoxin loci from the genomes of 32 cone snail species (family Conidae), a diverse group of marine gastropods that capture their prey using a cocktail of neurotoxic peptides (conotoxins). We were able to successfully recover conotoxin gene superfamilies across all species with high confidence (> 100× coverage) and used these data to provide new insights into conotoxin evolution. First, we found that genes in conotoxin gene superfamilies are composed of one to six exons and that these exons are typically short (mean = ∼85 bp). Second, we expanded our understanding of the following genetic features of conotoxin evolution: 1) positive selection, where exons coding the mature toxin region were often three times more divergent than their adjacent noncoding regions, 2) expression regulation, with comparisons to transcriptome data showing that cone snails only express a fraction of the genes available in their genome (24–63%), and 3) extensive gene turnover, where Conidae species varied from 120 to 859 conotoxin gene copies. Finally, using comparative phylogenetic methods, we found that while diet specificity did not predict patterns of conotoxin evolution, dietary breadth was positively correlated with total conotoxin gene diversity. Overall, the targeted sequencing technique demonstrated here has the potential to radically increase the pace at which venom gene families are sequenced and studied, reshaping our ability to understand the impact of genetic changes on ecologically relevant phenotypes and subsequent diversification.

Keywords: Conidae, comparative phylogenetics, diversification

Introduction

Understanding the molecular basis for adaptation and speciation is a central goal in evolutionary biology. Past studies have described several genetic characteristics that seem to be associated with rapidly radiating clades or the evolution of novel phenotypes, including evidence for diversifying selection, gene gains and losses, and accelerated rates of sequence evolution (Floudas et al. 2012; Brawand et al. 2014; Guillén et al. 2014; Cornetti et al. 2015; Malmstrøm et al. 2016; Pease et al. 2016). Although large-scale comparative genomic studies have vastly increased our knowledge of the genetic changes associated with diversification, the link between genotype and ecologically relevant phenotypes frequently remains unclear. Often, the functional consequences of genetic patterns such as an excess of gene duplicates or regions under positive selection are unknown (Brawand et al. 2014; Cornetti et al. 2015; Pease et al. 2016), limiting our ability to understand how genetic changes shape the evolutionary trajectory of species.

Animal venoms provide an excellent opportunity to study the interplay between genetics and adaptation because of the relatively simple relationship between genotype, phenotype, and ecology. Venoms have evolved multiple times throughout the tree of life (e.g., spiders, snakes, and snails) and play a direct role in prey capture and survival (Barlow et al. 2009; Casewell et al. 2013). Venoms are composed of mixtures of toxic proteins and peptides that are usually encoded directly by a handful of known gene families (Kordis and Gubensek 2000; Fry et al. 2009; Casewell et al. 2013). Exceptionally high estimated rates of gene duplication and diversifying selection across these venom gene families are thought to contribute to the evolution of novel proteins and thus changes in venom composition (Duda and Palumbi 1999; Gibbs and Rossiter 2008; Chang and Duda 2012), allowing venomous taxa to specialize on and adapt to different prey species (Kohn 1959a; Daltry et al. 1996; Li et al. 2005; Barlow et al. 2009; Chang and Duda 2016; Phuong et al. 2016). Therefore, the study of venomous taxa can facilitate understanding of the genetic contributions to ecologically relevant traits and subsequent diversification.

A fundamental challenge associated with the study of venom evolution is the inability to rapidly obtain sequences from venom multigene families. Traditionally, venom genes were sequenced through cDNA cloning techniques, which can be labor intensive and time-consuming (Gibbs and Rossiter 2008; Chang et al. 2015). Although transcriptome sequencing has greatly increased the pace of venom gene sequencing and the discovery of previously undescribed gene families (Casewell et al. 2009; Phuong et al. 2016), transcriptome sequencing still requires fresh RNA extracts from venom organs, which may be difficult to obtain for rare and/or dangerous species. Targeted sequencing approaches have vastly improved the capacity to obtain thousands of markers across populations and species for ecological and evolutionary studies (Bi et al. 2012; Faircloth et al. 2012). Until now, these approaches have not been applied to selectively sequence venom-encoding genomic regions. This may be due in part to the extraordinary levels of sequence divergence exhibited by venom loci (Gibbs and Rossiter 2008; Chang and Duda 2012), potentially rendering probes designed from a single sequence from one gene family unable to recover any other sequences in the same family (fig. 1). However, past studies have shown that noncoding regions (i.e., introns, untranslated regions [UTRs]) adjacent to hypervariable mature toxin exons are conserved between species (Nakashima et al. 1993, 1995; Gibbs and Rossiter 2008; Wu et al. 2013), suggesting that these conserved regions can be used for probe design to potentially recover all venom genes across clades of venomous taxa.

Fig. 1. — A superfamily conotoxins from *Conus lividus* described in the transcriptome from Phuong et al. (2016). Protein alignment generated using Geneious. Amino acids are highlighted based on disagreement to a consensus sequence generated from the alignment, not shown. Cysteines are highlighted and bolded. Signal region and mature toxin coding region are annotated based on the presence of these functional regions at a particular position in the alignment in any of the sequences.

Here, we used a targeted sequencing approach to recover venom genes and study the evolution of venom gene families across 32 species of cone snails from the family Conidae. Cone snails are a hyperdiverse group of carnivorous marine gastropods (> 700 spp.) that capture their prey using a cocktail of venomous neurotoxins (Puillandre et al. 2014). Cone snail venom precursor peptides (conotoxins) are typically composed of three regions: the signal region that directs the protein into the secretory pathway, the prepro region that is cleaved during protein maturation, and the mature region that ultimately becomes the mature peptide (Robinson and Norton 2014). In some instances, there exists a “post” region of the peptide following the mature region that is also cleaved during protein processing (Robinson and Norton 2014). Conotoxins are classified into > 40 gene superfamilies (e.g., A superfamily and O1 superfamily) based on signal sequence identity, though some gene superfamilies were identified based on domain similarities to proteins from other venomous taxa (Robinson and Norton 2014). To examine the evolution of conotoxin gene superfamilies from genomic DNA, we designed probes targeting over 800 nonconotoxin genes for phylogenetic analyses and conotoxins from 12 previously sequenced Conidae transcriptomes (Phuong et al. 2016). With the recovered conotoxin loci, we describe several features of conotoxin genes, including their genetic architecture, molecular evolution, expression patterns, and changes in gene superfamily size. Finally, we use comparative phylogenetic methods to test whether diet specificity or dietary breadth can explain patterns of gene superfamily size evolution.

Results

Exon Capture Results

We used custom-designed 120-bp baits (custom MYbaits-1 kit, 20,000 bait sequences; Arbor Biosciences, Ann Arbor, MI) to selectively target phylogenetic markers and conotoxin genes from 32 Conidae species (table 1). We sequenced all samples on a single Illumina HiSeq2000 lane, producing an average of 12.8 million reads per sample (table 1). After redefining exon boundaries for the phylogenetic markers, we generated a reference that consisted of 5,883 loci. We recovered an average of 5,335 loci (90.7%) across all samples, representing ∼0.66 Mb (megabases) on average (table 1). For the conotoxin loci, given that conotoxin introns can be several kilobases in length (Wu et al. 2013) and the average insert size of the libraries was ∼350 bp, we were only able to assemble conotoxin exon fragments (conotoxin exons with any adjacent noncoding regions). The number of sequences we assembled containing a conotoxin exon ranged from 281 fragments in Conus papilliferus to 2,278 fragments in C. aristophanes (table 1). Approximately 48.8% of the reads mapped to either the phylogenetic markers or the venom genes, with 52.3% of these reads being marked as duplicates (table 1). Average coverage across the phylogenetic markers was 95.9×, whereas the average coverage for the conotoxin exons was 149.6× (table 1).

Table 1.

Sample Information and Exon Capture Statistics.

Species	No. of Reads	% Reads on Target	% Duplicates	No. of Bases Recovered	No. of Exons Recovered	Exon Coverage	No. of Conotoxin Fragments Recovered	Conotoxin Coverage	Collection Source	Year Collected
magus	10,348,142	48.17	56.18	652,113	5,258	79.0	332	136.9	Australian Museum (Catalog No. C487620)	2009
textile	8,130,024	47.86	51.79	649,193	5,236	64.8	764	109.9	Australian Museum (Catalog No. C487632)	2009
lischkeanus	8,942,888	45.30	52.64	627,806	5,024	64.2	834	127.1	Australian Museum (Catalog No. C480270)	2011
aristophanes	15,276,200	52.96	53.68	655,444	5,310	92.9	2,278	159.5	Australian Museum (Catalog No. C480269)	2011
papilliferus	9,733,028	46.97	46.97	659,001	5,318	82.5	281	114.1	Australian Museum (Catalog No. C469190)	2011
californicus	11,257,992	40.29	46.23	581,181	4,659	89.2	649	167.1	Monterey Bay, California (provided by WF Gilly)	2014
virgo	8,319,348	51.90	44.74	659,419	5,312	84.9	406	151.9	Bali, Indonesia (Field collection)	2013
quercinus	12,260,196	47.95	53.40	667,897	5,409	92.6	500	154.6	Bali, Indonesia (Field collection)	2013
ebraeus	16,311,454	49.57	55.37	669,852	5,427	120.7	849	203.9	Bali, Indonesia (Field collection)	2013
flavidus	12,175,022	46.25	41.37	671,106	5,446	110.0	614	174.8	Bali, Indonesia (Field collection)	2013
lividus	13,561,122	51.65	54.67	663,079	5,365	93.5	987	159.4	Bali, Indonesia (Field collection)	2013
miles	18,577,352	50.13	49.06	680,191	5,541	175.9	594	200.5	Bali, Indonesia (Field collection)	2013
rattus	16,948,836	52.08	55.04	677,626	5,497	139.7	727	183.5	Bali, Indonesia (Field collection)	2013
distans	12,706,016	50.85	57.47	658,791	5,411	100.0	384	124.1	Bali, Indonesia (Field collection)	2013
sponsalis	11,168,242	46.70	44.51	657,141	5,313	77.8	2,033	125.4	Bali, Indonesia (Field collection)	2013
imperialis	8,838,484	43.70	50.19	652,250	5,242	69.0	397	96.8	Bali, Indonesia (Field collection)	2013
marmoreus	11,494,484	52.20	50.27	669,298	5,430	103.8	524	195.7	Bali, Indonesia (Field collection)	2013
varius	15,035,404	50.78	52.57	666,670	5,387	113.6	743	209.0	Bali, Indonesia (Field collection)	2013
musicus	12,761,042	46.59	56.15	649,666	5,245	71.9	1,970	108.7	Bali, Indonesia (Field collection)	2013
planorbis	13,623,080	44.80	58.94	658,461	5,330	76.0	899	116.7	Bali, Indonesia (Field collection)	2013
miliaris	13,039,246	52.88	55.59	640,651	5,152	67.2	2,106	142.5	Bali, Indonesia (Field collection)	2013
vexillum	12,197,882	49.41	52.89	668,942	5,417	97.6	603	132.6	Bali, Indonesia (Field collection)	2013
arenatus	14,607,148	47.23	48.61	665,490	5,397	98.3	1,466	147.4	Bali, Indonesia (Field collection)	2013
muriculatus	17,328,206	47.25	59.85	662,812	5,359	102.3	1,072	137.6	Bali, Indonesia (Field collection)	2013
chaldaeus	11,778,640	50.84	55.12	661,691	5,371	91.3	704	131.8	Bali, Indonesia (Field collection)	2013
coronatus	14,873,018	49.92	45.09	670,729	5,442	100.8	2,158	179.3	Bali, Indonesia (Field collection)	2013
moreleti	10,843,692	49.41	57.47	653,436	5,282	78.8	485	142.8	Bali, Indonesia (Field collection)	2013
anemone	10,662,156	42.78	41.61	670,558	5,440	94.2	321	130.6	Museum Victoria (Registration No. F237402)	2014
mustelinus	15,317,812	50.86	58.11	668,295	5,416	115.5	679	153.8	Australian Museum (Catalog No. C487551)	2014
litteratus	13,002,630	51.11	56.64	673,310	5,451	99.8	538	148.4	Australian Museum (Catalog No. C487581)	2014
capitaneus	16,366,304	51.30	57.35	672,762	5,461	120.4	909	151.9	Australian Museum (Catalog No. C487552)	2014
emaciatus	13,391,964	51.38	54.88	665,699	5,387	101.7	554	168.3	Australian Museum (Catalog No. C487593)	2014

We recovered representative exons from all 49 conotoxin gene superfamilies targeted, plus exons from the Q gene superfamily which we did not explicitly target (supplementary fig. S1, Supplementary Material online). Of the 49 targeted gene superfamilies, “capture success” (defined in Materials and Methods) was 80% or above for 34 gene superfamilies, even though we did not explicitly target every single transcript (supplementary table S1, Supplementary Material online). For example, we only targeted one sequence of the A gene superfamily from C. varius, but we recovered sequences that showed high identity to every single transcript from the A gene superfamily discovered in the C. varius transcriptome (supplementary table S1, Supplementary Material online). We assessed the ability of targeted sequencing to recover conotoxins from species that were not explicitly targeted in the bait sequences by calculating the number of previously sequenced conotoxins (obtained via Genbank and Conoserver; Kaas et al. 2010) recovered in our data set (supplementary table S2, Supplementary Material online). We recovered a higher percentage of previously sequenced conotoxins if the species was included in the bait design (52.35%, supplementary table S2, Supplementary Material online) compared with species not included in the bait design (39.5%, supplementary table S2, Supplementary Material online).

Conotoxin Genetic Architecture

Through analyses of conotoxin genetic structure across species, we found that the number of exons that comprise a conotoxin transcript ranged from one to six exons and exon size ranged from 5 to 444 bp, with an average length of 85.2 bp (fig. 2 and supplementary table S3, Supplementary Material online). Whether or not UTRs were adjacent to terminal exons was dependent on the gene superfamily, with some gene superfamilies always having both 5′- and 3′-UTRs adjacent to terminal exons and some where the 5′- or 3′-UTRs cannot be found directly adjacent to the terminal exons (supplementary table S3, Supplementary Material online). Regions in conotoxin transcripts identified as the signal region, the mature region, or the postregion were most often confined to a single exon (fig. 3). In contrast, the prepro region was more frequently distributed across more than one exon (fig. 3).

Fig. 2. — Exon length distribution. Exon length distribution across all conotoxin gene superfamilies sequenced in this study. Analysis only includes sequences from species that had transcriptome data available.

Fig. 3. — Histograms showing the frequency of the largest proportion of each conotoxin precursor peptide region found on a single exon in Conidae genomes. Analysis only includes sequences from species that had transcriptome data available.

Conotoxin Molecular Evolution

To determine whether divergence differs depending on which conotoxin precursor peptide region each exon contains, we calculated uncorrected pairwise differences to quantify the level of sequence divergence between exons and immediately adjacent noncoding regions. Exons containing the signal region were more conserved than their adjacent noncoding regions (average relative ratio < 1, supplementary table S4, Supplementary Material online, fig. 4, and supplementary fig. S2, Supplementary Material online). In contrast, all other exon classifications generally showed the opposite pattern, where the exons were typically more divergent relative to their adjacent noncoding regions (average relative ratio > 1, supplementary table S4, Supplementary Material online, fig. 4, and supplementary fig. S2, Supplementary Material online). The largest contrast in divergence between exons and adjacent noncoding regions came from exons containing the mature region, where the coding region was on average 2.9 times more divergent than regions surrounding the exon (supplementary table S4, Supplementary Material online, fig. 4, and supplementary fig. S2, Supplementary Material online). For comparison, exons from nonconotoxin genes were more conserved than their adjacent noncoding regions (average relative ratio < 1, fig. 4).
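
The divergence comparison described above can be illustrated with a minimal Python sketch. It assumes pre-aligned sequences of equal length and skips alignment columns containing gaps or ambiguous bases (an assumption, since the handling used in the study is not described in this section). The function names p_distance and relative_ratio and the toy sequences are hypothetical and are not taken from the authors' pipeline.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected pairwise difference: mismatches / compared sites.

    Columns where either sequence has a gap ('-') or an 'N' are skipped;
    this gap handling is an assumption of the sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def relative_ratio(exon_a: str, exon_b: str, flank_a: str, flank_b: str) -> float:
    """Exon divergence divided by adjacent noncoding divergence."""
    flank_d = p_distance(flank_a, flank_b)
    if flank_d == 0:
        raise ZeroDivisionError("noncoding regions are identical")
    return p_distance(exon_a, exon_b) / flank_d


# Hypothetical toy alignment (not real conotoxin sequences).
exon_1 = "ATGCGTACGT"
exon_2 = "ATGAGTTCGA"              # 3 of 10 sites differ
flank_1 = "GGCTTAACCGGTTAACCGGT"
flank_2 = "GGCTTAACCGGTTAACCGGA"   # 1 of 20 sites differs

ratio = relative_ratio(exon_1, exon_2, flank_1, flank_2)
print(f"exon p-distance: {p_distance(exon_1, exon_2):.2f}")
print(f"relative ratio:  {ratio:.1f}")
```

A ratio above 1 corresponds to an exon that is more divergent than its flanking noncoding sequence, the pattern reported here for mature-region exons; a ratio below 1 corresponds to the pattern reported for signal-region exons and nonconotoxin loci.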

Fig. 4. — Scatterplot of uncorrected pairwise distances for select gene superfamilies and nonconotoxin loci between exons and adjacent noncoding regions. Each point on the graph represents a unique pairwise comparison and points are highlighted by conotoxin functional region. x = y line is shown.

Conotoxin Expression

To examine expression regulation across gene superfamilies and species, we compared transcriptomes we previously sequenced (Phuong et al. 2016) to the targeted sequencing data. The proportion of conotoxin genes expressed per gene superfamily was highly variable (supplementary table S5 and fig. S3, Supplementary Material online) and the exact proportion depended on the gene superfamily and the species. In several cases, none of the gene copies of a gene superfamily were expressed in the transcriptome (e.g., Conus ebraeus, A gene superfamily, 0/9 copies expressed, supplementary table S5 and fig. S3, Supplementary Material online), and in other cases, all copies were expressed in the transcriptome (e.g., C. californicus, O3 gene superfamily, 3/3 copies expressed, supplementary table S5 and fig. S3, Supplementary Material online). The average proportion of gene copies expressed per gene superfamily per species was 45% (range: 24–63%, supplementary table S5, Supplementary Material online).
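
As a small illustration of the quantity reported above, the following Python sketch computes the proportion of genomic gene copies that are expressed for the two cases quoted in the text (0/9 and 3/3). The function name proportion_expressed and the dictionary layout are hypothetical; the full per-superfamily counts are in supplementary table S5 and are not reproduced here.

```python
def proportion_expressed(expressed: int, genomic: int) -> float:
    """Fraction of genomic gene copies also found in the transcriptome."""
    if genomic <= 0:
        raise ValueError("genomic copy number must be positive")
    if not 0 <= expressed <= genomic:
        raise ValueError("expressed copies must lie between 0 and genomic")
    return expressed / genomic


# (species, gene superfamily) -> (expressed copies, genomic copies),
# limited to the two examples quoted in the text.
examples = {
    ("Conus ebraeus", "A"): (0, 9),
    ("C. californicus", "O3"): (3, 3),
}

proportions = {
    key: proportion_expressed(expressed, genomic)
    for key, (expressed, genomic) in examples.items()
}

for (species, superfamily), value in proportions.items():
    print(f"{species}, {superfamily} superfamily: {value:.0%} expressed")
```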

Conotoxin Gene Superfamily Size Evolution

With a concatenated alignment of 4,441 exons representing 573,854 bp, we recovered a highly supported phylogeny with all but four nodes having ≥ 95% bootstrap support (fig. 5). Total conotoxin gene diversity ranged from as low as 120 in C. papilliferus to as high as 859 in C. coronatus (fig. 5). Twenty-five gene superfamilies showed evidence of phylogenetic signal in gene superfamily size, such that closely related species tended to have similar gene superfamily sizes (supplementary table S6, Supplementary Material online). For example, a clade consisting of C. coronatus, C. aristophanes, and C. miliaris contains nearly 5 times more gene copies of the O1 superfamily than their immediate sister clade (fig. 5). CAFE v3.1 (Han et al. 2013) estimates of net gene gains and losses showed that species-specific net conotoxin expansions and contractions are scattered throughout the phylogeny (fig. 5 and supplementary fig. S4, Supplementary Material online).

Fig. 5. — Diet and conotoxin evolution in a phylogenetic context. Time-calibrated maximum likelihood phylogeny of 32 Conidae species generated from concatenated alignment of 4,441 exons. Phylogeny is rooted with *Californiconus californicus*. Branches are colored based on net gains or losses in total conotoxin diversity based on CAFE v3.1 analyses. Recognized subgenera are alternately colored pink. Total conotoxin diversity, the number of expressed precursors for species with transcriptomes, size estimates for commonly studied gene superfamilies, and dietary breadth displayed next to tip names. Recorded observations of each species preying on each of the 27 represented prey families shown in the matrix adjacent to the phylogeny, with cells colored based on whether or not a species has been observed to feed on that prey family (gray = no, blue = yes). Phylum level classifications are shown at the top of the diet matrix and family level classifications are shown at the bottom of the diet matrix.

Diet and Conotoxin Gene Superfamily Evolution

We used comparative phylogenetic methods and extensive prey information from the literature to examine the impact of diet specificity (i.e., what prey a cone snail feeds upon) and dietary breadth (i.e., how many prey species a cone snail feeds upon) on two measures of conotoxin composition: 1) gene superfamily size and 2) total conotoxin diversity. Neither diet specificity nor dietary breadth was correlated with changes in gene superfamily size (distance-based phylogenetic generalized least squares [D-PGLS], P > 0.05). Although diet specificity did not predict changes in total conotoxin diversity (PGLS, P > 0.05), we found a significant positive relationship between dietary breadth and total conotoxin diversity in both the full conotoxin data set (PGLS, P < 0.05, fig. 6) and the conotoxin data set containing gene superfamilies that had > 80% capture success (PGLS, P < 0.001).
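
For readers unfamiliar with PGLS, the following is a minimal sketch of the generalized least squares estimator that underlies it, written in plain Python. It assumes a Brownian motion model in which the covariance between two species equals their shared branch length. The four-species tree, the covariance matrix V, and the trait values are hypothetical toy data, and the code does not reproduce the software, the D-PGLS variant, or the significance tests used in this study.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pgls(y, x, V):
    """Return (intercept, slope) from GLS with covariance matrix V.

    Solves (X' V^-1 X) beta = X' V^-1 y for the design matrix X = [1, x].
    """
    ones = [1.0] * len(y)
    vi_ones = solve(V, ones)   # V^-1 * 1
    vi_x = solve(V, x)         # V^-1 * x
    vi_y = solve(V, y)         # V^-1 * y
    normal = [[dot(ones, vi_ones), dot(ones, vi_x)],
              [dot(x, vi_ones), dot(x, vi_x)]]
    rhs = [dot(ones, vi_y), dot(x, vi_y)]
    intercept, slope = solve(normal, rhs)
    return intercept, slope


# Hypothetical tree ((A,B),(C,D)) with root-to-tip height 1.0 and
# sister species sharing 0.5 units of branch length.
V = [[1.0, 0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.5],
     [0.0, 0.0, 0.5, 1.0]]
breadth = [1.0, 2.0, 4.0, 7.0]                 # toy predictor values
diversity = [2.0 + 3.0 * b for b in breadth]   # noise-free toy response

intercept, slope = pgls(diversity, breadth, V)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

Because the toy response is an exact linear function of the predictor, the estimator recovers the generating intercept and slope; with real data, V would be derived from the time-calibrated phylogeny and the fit would include residual error.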

Fig. 6. — Scatterplot of total conotoxin gene diversity and dietary breadth. Each point represents a unique species. Graphs are labeled with a regression line and Pearson’s correlation coefficient generated from a phylogenetic generalized least squares (PGLS) analysis. Asterisk denotes significant correlation from the PGLS analysis.

Discussion

Targeted Sequencing and Conotoxin Discovery

Through targeted sequencing of conotoxins in cone snails, we demonstrate the potential to rapidly obtain venom sequences at high coverage (> 100×, table 1) from species for which no venom information is available and without the need for RNA from the venom duct. This is remarkable, given that alignments in amino acid sequences between mature regions of a single gene superfamily within a single individual can be incomprehensible (fig. 1) due to the rapid evolution of the mature region (Duda and Palumbi 1999). Effective capture of conotoxin gene superfamilies was possible in part because conotoxin exons were often directly adjacent to conserved UTRs, which were targeted in the design (supplementary table S3, Supplementary Material online). Although it has been recognized for decades that cone snails collectively harbor tens of thousands of biologically relevant proteins for fields such as molecular biology and pharmacology in their venoms (Olivera and Teichert 2007; Lewis 2009), traditional techniques for conotoxin sequencing (e.g., cDNA cloning) have barely begun to uncover and characterize the full breadth of conotoxin diversity. The targeted sequencing technique presented in this study provides an additional tool to increase the speed at which conotoxins are discovered. Specifically, this technique will allow the rapid toxin sequencing of genetic samples housed in museum collections, which often contain a large proportion of the species diversity for venomous taxonomic groups that have been amassed through several decades of intensive field expeditions. In addition, targeted sequencing approaches are not limited by the venom transcripts that are expressed at any one particular time point. As demonstrated here, on average, over half of the conotoxin genes available in the genome are not expressed in adult individuals (supplementary table S5 and fig. S3, Supplementary Material online). Therefore, sequencing conotoxins from genomic samples effectively doubles the number of conotoxin sequences that can be recovered.

Overall, the proportion of reads that mapped to our targeted sequences (mean = 48.8%, table 1) falls within the range reported by studies that employed similar techniques in anuran frogs (mean = 60.2%; Portik et al. 2016) and salamanders (mean = 18.21%; McCartney-Melstad et al. 2016). The proportion of reads marked as duplicates was higher than in previous studies (mean = 24.5%; McCartney-Melstad et al. 2016, mean = 17.5%; Portik et al. 2016). The high duplication levels in our data set may have been a function of our small target size (∼0.8 Mb) or a sign that we overamplified our posthybridization product. To reduce the duplication levels in the future, we may reduce the number of posthybridization PCR cycles. Given the high coverage on both the phylogenetic markers and conotoxin sequences (average coverage > 95×, table 1), future sequencing experiments should be able to multiplex more than 32 samples on a single lane.
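
The multiplexing argument above can be made concrete with a rough Python sketch. It assumes that per-sample coverage scales inversely with the number of samples pooled on a lane (equal pooling, unchanged on-target and duplication rates), which is a simplification. The 30× minimum coverage threshold and the function names are hypothetical; only the 95.9× average coverage and the 32-sample design come from this study.

```python
import math


def expected_coverage(current_cov: float, current_samples: int,
                      new_samples: int) -> float:
    """Per-sample coverage after changing how many samples share a lane.

    Assumes coverage is inversely proportional to the number of samples.
    """
    if current_samples <= 0 or new_samples <= 0:
        raise ValueError("sample counts must be positive")
    return current_cov * current_samples / new_samples


def max_samples(current_cov: float, current_samples: int,
                min_cov: float) -> int:
    """Largest number of samples that keeps coverage at or above min_cov."""
    if min_cov <= 0:
        raise ValueError("minimum coverage must be positive")
    return math.floor(current_cov * current_samples / min_cov)


# Average phylogenetic-marker coverage at 32 samples per lane (table 1).
cov_at_64 = expected_coverage(95.9, 32, 64)
# Hypothetical minimum acceptable coverage of 30x.
limit = max_samples(95.9, 32, 30.0)

print(f"expected coverage with 64 samples: {cov_at_64:.2f}x")
print(f"samples per lane at >= 30x: {limit}")
```

In practice, high duplication rates mean that additional sequencing depth yields fewer unique reads than this linear model implies, so the estimate is better read as an upper bound on how far multiplexing could be pushed.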

Although most of the gene superfamilies had high capture success, some gene superfamilies performed poorly (supplementary table S1, Supplementary Material online). Variation in overall capture success can be attributed to several factors. First, a lack of diversity in bait sequences for a particular gene superfamily may have impeded effective capture. For example, we only had bait sequences designed from two species for the divMKFPLLFISL gene superfamily and we were unable to recover full sequences from any of the other species (supplementary table S1, Supplementary Material online). Second, the genetic organization of gene superfamilies may hinder capture success. For example, the mature toxin exon for the T gene superfamily is not readily recoverable because it is not adjacent to a conserved UTR that is discoverable through transcriptome sequencing (supplementary tables S1 and S3 and fig. S1, Supplementary Material online). Finally, conotoxin sequence properties may hinder capture, as it has been documented that high or low GC content values can depress capture efficiency statistics (Gnirke et al. 2009). To increase capture success of gene superfamilies in the future, we recommend including a large diversity of sequences from several species for gene superfamilies that had low capture success. In addition, bait sequences should be redesigned for gene superfamilies in which the prepro region or the mature region was not immediately adjacent to the conserved UTRs. We recovered intron sequences in this study that can be used in future bait designs to effectively recover the entire coding region because adjacent noncoding regions are often evolving at a much slower rate than the coding region containing the prepro or mature region (supplementary table S4, Supplementary Material online, fig. 4, and supplementary fig. S2, Supplementary Material online).

Although targeted sequencing can increase the speed at which conotoxins are sequenced, we note several limitations with this approach. First, venom sequencing is limited to the sequences used in the bait design: only known venom gene families can be recovered. Therefore, approaches such as RNAseq are still necessary to identify undiscovered venom gene families. However, broad-scale discovery of venom gene superfamilies through transcriptome sequencing can be performed prior to applying target capture techniques (as done in this study) to obtain target sequences for most of the major venom components. Second, for some gene families, annotation of the exact mature toxin coding region may be difficult if the mature toxin is separated across multiple exons and if there is no closely related reference to accurately define the mature protein. Thus, expression data are still necessary in some cases to study the mature toxin. Finally, depending on the level of sample multiplexing, targeted sequencing approaches can be more expensive than using RNAseq for venom discovery ($230.65 per RNAseq sample vs. $285.62 per targeted sequencing sample, supplementary table S7, Supplementary Material online). Therefore, the choice between the two sequencing strategies will depend on the overall goal of the research project, as no one next-generation sequencing method is suited for all research applications (Jones and Good 2016).

When compared with conotoxin sequences available on Genbank and ConoServer, we found that we were able to recover a larger proportion of previously sequenced conotoxins if species were explicitly targeted with the baits (supplementary table S2, Supplementary Material online). Although this comparison is biased by unequal conotoxin discovery effort across species, nearly half of previously sequenced conotoxins were not recovered in this study. We performed a coarse investigation of database conotoxins and determined potential reasons for why we were not able to recover certain previously sequenced conotoxins. These reasons include: 1) the species in the database was misidentified, which was extensively documented in Phuong et al. (2016), 2) the database conotoxin had no reliable reference in the literature, 3) the conotoxin was present in our species, but we could not recover it with the current bait design or the conotoxin was filtered during the bioinformatics processing of the data, and 4) the conotoxin was recoverable (i.e., high sequence similarity to bait sequences designed in this study), but the gene was not present in the genome. Future work integrating both population level RNAseq and targeted sequencing data may account for the large proportion of unrecovered conotoxins.