Skip to main content
Proceedings of the National Academy of Sciences of the United States of America logoLink to Proceedings of the National Academy of Sciences of the United States of America
. 2015 Jul 6;112(29):E3782–E3791. doi: 10.1073/pnas.1501334112

Optimized deep-targeted proteotranscriptomic profiling reveals unexplored Conus toxin diversity and novel cysteine frameworks

Vincent Lavergne a, Ivon Harliwong b, Alun Jones a, David Miller b, Ryan J Taft b, Paul F Alewood a,1
PMCID: PMC4517256  PMID: 26150494

Significance

Venomous marine cone snails have evolved complex mixtures of fast-acting paralytic cysteine-rich peptides for prey capture and defense able to modulate specific heterologous membrane receptors, ion channels, or transporters. In contrast to earlier studies in which the richness and sequence hypervariability of lowly expressed toxins were overlooked, we now describe a comprehensive deep-targeted proteotranscriptomic approach that provides, to our knowledge, the first high-definition snapshot of the toxin arsenal of a venomous animal, Conus episcopatus. The thousands of newly identified conotoxins include peptides with cysteine motifs present in FDA-approved molecules or currently undergoing clinical trials. Further highlights include novel cysteine scaffolds likely to unveil unique protein structure and pharmacology, as well as a new category of conotoxins with odd numbers of cysteine residues.

Keywords: cysteine-rich peptides, conotoxin, transcriptomic, proteomic, bioinformatic

Abstract

Cone snails are predatory marine gastropods characterized by a sophisticated venom apparatus responsible for the biosynthesis and delivery of complex mixtures of cysteine-rich toxin peptides. These conotoxins fold into small highly structured frameworks, allowing them to potently and selectively interact with heterologous ion channels and receptors. Approximately 2,000 toxins from an estimated number of >70,000 bioactive peptides have been identified in the genus Conus to date. Here, we describe a high-resolution interrogation of the transcriptomes (available at www.ddbj.nig.ac.jp) and proteomes of the diverse compartments of the Conus episcopatus venom apparatus. Using biochemical and bioinformatic tools, we found the highest number of conopeptides yet discovered in a single Conus specimen, with 3,305 novel precursor toxin sequences classified into 9 known superfamilies (A, I1, I2, M, O1, O2, S, T, Z), and identified 16 new superfamilies showing unique signal peptide signatures. We were also able to depict the largest population of venom peptides containing the pharmacologically active C-C-CC-C-C inhibitor cystine knot and CC-C-C motifs (168 and 44 toxins, respectively), as well as 208 new conotoxins displaying odd numbers of cysteine residues derived from known conotoxin motifs. Importantly, six novel cysteine-rich frameworks were revealed which may have novel pharmacology. Finally, analyses of codon usage bias and RNA-editing processes of the conotoxin transcripts demonstrate a specific conservation of the cysteine skeleton at the nucleic acid level and provide new insights about the origin of sequence hypervariablity in mature toxin regions.


Cone snails are venomous marine gastropod molluscs from the genus Conus (family Conidae), with 706 valid species currently recognized (on April 29, 2015) in the World Register of Marine Species (1). Over the last ∼30 million years, these species have evolved sophisticated predatory and defense strategies, with the elaboration of a highly organized envenomation machinery (2). Their venom apparatus is responsible for the biosynthesis and maturation of short peptide neurotoxins called conotoxins (occasionally referred to as conopeptides) that, once injected in the prey or predator (fish, molluscs, or worms), act as fast-acting paralytics. When the cone snail senses waterborne chemical signals via a specialized chemoreceptory organ (called a siphon or osphradium), searching behavior begins with the release and extension of the proboscis where, in its lumen, a single dart-like radula tooth loaded from the radular sac (RS) is tightly held by circular muscles and filled with venom (Fig. 1 A and B) (35). When the tip of the proboscis comes in contact with the target, the radula is rapidly propelled into the prey and acts like a hypodermic needle to inject the venom (6). This radula tooth then serves as a harpoon to bring the captured prey back to the mouth of the snail (Fig. 1C). The biochemical and cellular mechanisms of toxin synthesis, including their processing and packaging in secretory granules, are poorly described. Nevertheless, epithelial cells bordering the venom duct (VD) are most likely the site of conotoxin production, which may then be released into the duct’s lumen through a holocrine secretion process (Fig. 1B) (7). The muscular venom bulb triggers burst contractions for the circulation of the venom inside the duct up to the pharynx, where conotoxins may undergo sorting and maturation (8). In addition, it has been suggested that certain conotoxins could, to a much lesser extent, be specifically expressed by the salivary gland (SG) (9).

Fig. 1.

Fig. 1.

Macroscopic anatomy of a cone snail (A), its venom apparatus (B), and a radula tooth (C).

In compensation for their limited mobility, cone snails have developed a vast library of structurally diverse bioactive peptides for prey capture and defense (10). As a result of speciation, a high rate of hypermutations, and a remarkable number of posttranslational modifications, little overlap of conopeptides between Conus species has been observed (11, 12), which has led to an estimation of >70,000 pharmacologically active conopeptides although fewer than 1% have been characterized to date (13). The precursor form of conotoxins is composed of three distinct regions: a highly conserved N-terminal endoplasmic reticulum (ER) signal region (used to classify the toxins into gene superfamilies), a central proregion, and a hypervariable mature region, typically between 10 and 35 amino acids long, characterized by conserved cysteine patterns and connectivities (1416). Mature conotoxins are able to selectively modulate specific subtypes of voltage- or ligand-gated transporters, receptors, and ion channels, expressed in organisms broadly distributed along the phylogenetic spectrum (10), and are thus considered a rich source of molecular templates with diagnostic and therapeutic interests for the management of human neuropathic pain, epilepsy, cardiac infarction, and neurological diseases (10). As described in Table 1, 37 conotoxin cysteine patterns have been reported to date (8 of which have known disulfide bond connectivity) (15). Although cysteine bridges always improve toxin stability and provide resistance to enzymatic degradation, some cysteine frameworks combined to particular loop lengths are more pharmacologically relevant. For instance, ω-conotoxin MVIIA [C(6)C(6)CC(3)C(4)C] (the only FDA-approved venom-derived synthetic peptide, marketed under the name Prialt) (53) and ω-conotoxin CVID [C(6)C(6)CC(3)C(6)C] (phase II clinical trials) (54) both contain the inhibitor cystine knot (ICK) motif where cysteine residues are disposed in a C-C-CC-C-C pattern with a I–IV, II–V, III–VI connectivity. Also, conotoxins such as χ-conotoxin MrIA [(3)CC(4)C(2)C; I–III, II–IV; phase II] are important drug leads (55). Despite a weak correlation between gene superfamilies and pharmacological properties, some functional redundancy among members of a same superfamily exists (56). To date, 16 empirical gene superfamilies (designated as A, D, I1, I2, I3, J, L, M, O1, O2, O3, P, S, T, V, Y) have been annotated (57), plus 31 novel superfamilies have been discovered during the past two years (38, 39, 46, 5760).

Table 1.

Cysteine frameworks of mature conotoxins

Name Cysteine pattern Cysteine connectivity Refs.
I CC-C-C I-III, II-IV (17)
II CCC-C-C-C (18)
III CC-C-C-CC (19)
IV CC-C-C-C-C I-V, II-III, IV-VI (20)
V CC-CC I-III, II-IV (21)
VI/VII C-C-CC-C-C I-IV, II-V, III-VI (22)
VIII C-C-C-C-C-C-C-C-C-C (23)
IX C-C-C-C-C-C I-IV, II-V, III-VI (24)
X CC-C.[PO]C I-IV, II-III (25)
XI C-C-CC-CC-C-C I-IV, II-VI, III-VII, V-VIII (26)
XII C-C-C-C-CC-C-C (27)
XIII C-C-C-CC-C-C-C (28)
XIV C-C-C-C I-III, II-IV (29)
XV C-C-CC-C-C-C-C (30)
XVI C-C-CC (31)
XVII C-C-CC-C-CC-C (32)
XVIII C-C-CC-CC (33)
XIX C-C-C-CCC-C-C-C-C (34)
XX C-CC-C-CC-C-C-C-C (35)
XXI CC-C-C-C-CC-C-C-C (36)
XXII C-C-C-C-C-C-C-C (37)
XXIII C-C-C-CC-C (38)
XXIV C-CC-C (39)
XXV C-C-C-C-CC (40)
XXVI C-C-C-C-CC-CC (41)
C-CC-C-C-C (42)
C-C-C-C-C-CC-C (43)
C-C-CCC-C-C-C (44)
CCC-C-CC-C-C (45)
C-C-C-CCC-C-C (46)
CC-C-C-C-CC-C (47)
CC-C-C-CC-C-C (48)
C-C-C-CC-C-C-C-C-C *
C-C-C-C-C-C-C-C-C-C-C-C (49)
C-C-C-C-C-C-C-C-C-CC-C (50)
CC-CC-C-C-C-CC-C-C-C-C (51)
C-C-C-CC-C-C-C-C-C-CC-C-C (52)

The name, pattern, and connectivity of cysteine frameworks (“—” when unknown) are reported.

*GenBank accession no. HM003926.

Here we describe a deep-targeted pipeline used to analyze the transcriptomes and proteomes of the three main venom apparatus compartments (VD, RS, and SG) of the Bishop’s molluscivorous Conus episcopatus. A comprehensive investigation of the cysteine patterns of several thousands of newly identified conotoxin sequences, classified into known and novel gene superfamilies, led to the characterization of numerous peptides containing the ICK and CC-C-C motifs, as well as six novel cysteine scaffolds. We also bring additional insights to explain the hypervariability of mature conotoxin sequences by showing the existence of a specific codon usage bias at the gene level.

Results

RNA Preparation and cDNA Library Sequencing.

The lysis of the venom duct, radular sac, and salivary gland of a single C. episcopatus specimen provided 401 ng/μL, 314 ng/μL, and 73 ng/μL of total RNA, respectively. The initial qualitative controls of these samples revealed a lack of ribosomal 28S peak along with a strong and sharp 18S band, suggesting that RNA integrity was suitable for library preparation. Lack of 28S rRNA was originally called “the hidden break” by H. Ishikawa (61) and has since been observed in the sea slugs Aplysia (62), insects (63), or nematode parasites (64). To our knowledge, this is the first time the existence of a hidden break has been reported in cone snails. After mRNA isolation and generation of cDNA libraries, we obtained inserts at concentrations of 12,461.6 pM (average of 445 bp; VD), 2,119.9 pM (462 bp; RS), and 803.6 pM (431 bp; SG). Next-generation paired-end sequencing gave rise to average numbers of reads of 20,885,730 (VD), 29,187,419 (RS), and 31,725,853 (SG) (Table 2) (read datasets are freely available at www.ddbj.nig.ac.jp). Filtering of sequences showing an average Phred+33 quality score of >30, and merging of paired-end reads led to a decrease in number of 15.45%, 21.94%, and 36.94% for VD, RS, and SG, respectively.

Table 2.

Description of the raw, merged, and assembled reads from the venom duct, radular sac, and salivary gland

Sample
Read R1 Read R2 Merged R1/R2 Unmerged R1 Unmerged R2 CLC contigs SOAP contigs Oases contigs Trinity contigs
Venom duct
 Total Sequences 20,890,920 20,880,539 17,659,352 2,864,734 511,428 132,719 46,926 30,354 114,771
 Length Interval 35–301 35–301 35–592 25–301 37–301 100–7,392 102–5,385 101–29,853 101–5,747
 Avg Length 209 211 215 256 281 341 408 761 370
 N50 244 251 230 289 301 418 424 1,038 478
 N90 301 301 438 301 301 1,379 851 3,060 1,466
 %GC 37.44 38.03 37.60 37.68 36.99 38.89 38.41 38.63 38.96
 %N 0 0 0 0 0 0 0 0.53 0
 Reads to contigs mapping 90.83% 76.28% 69.61% 87.35%
 Known toxins 6 6 2 1 2 2 1
 New toxins (score 3) 2,061 2,629 1,117 28 20 16 9
 New toxins (score 2) 822 1,323 158 6 1 3 2
Radular sac
 Total sequences 29,442,459 28,932,379 22,782,581 6,166,372 1,324,521 138,284 150,900 269,328 129,509
 Length interval 35–301 35–301 35–592 35–301 35–301 100–5,386 100–5,385 100–21,487 101–5,386
 Avg length 206 209 204 262 288 258 225 504 267
 N50 248 293 215 286 301 301 246 830 315
 N90 301 301 441 301 301 979 680 2,091 1,046
 %GC 32.90 33.47 32.60 33.11 33.95 36.67 36.69 34.27 36.92
 %N 0 0 0 0 0 0 0 0 0
 Reads to contigs mapping 91.93% 89.88% 45.47% 96.48%
 Known toxins 1 1 0 1 1 1 1
 New toxins (score 3) 10 13 6 8 3 7 3
 New toxins (score 2) 1 1 0 2 2 2 0
Salivary gland
 Total sequences 32,701,009 30,750,697 20,005,743 2,454,298 2,174,305 147,817 166,152 276,069 141,353
 Length interval 35–301 35–301 35–592 51–301 45–301 100–5,386 100–3,646 100–28,906 101–6,374
 Avg length 200 205 164 299 299 249 208 472 260
 N50 239 300 165 300 301 305 215 755 325
 N90 301 301 358 301 301 928 668 2,044 1,071
 %GC 33.08 34.58 32.84 33.33 34.21 36.30 36.40 34.79 36.50
 %N 0 0 0 0 0 0 0 0 0
 Reads to contigs mapping 90.70% 87.84% 40.61% 95.71%
 Known toxins 1 1 0 1 0 1 0
 New toxins (score 3) 0 0 0 3 2 3 4
 New toxins (score 2) 0 0 0 0 0 0 0

The length interval, average length, N50 and N90 values, and %GC, as well as %N, of the different sets of sequences, are reported in the table. The overall rates of merged reads aligned back to the contigs have been calculated after using four different de novo assemblers for each compartment of the venom apparatus. The number of sequences annotated as known and new conotoxins (≥40 amino acids long; containing ≥60% hydrophobic residues in their N-ter region) and classified into Conus gene superfamilies [based on conopeptide sequence signatures in both signal/pro/mature regions (score 3) or pro/mature regions only (score 2)] are also listed (calculations not performed on read datasets are represented by “—”).

Tissue-specific sets of concatenated merged and unmerged reads were independently submitted to four de novo assemblers that produced contigs with consistent length ranges and read-vs.-contig mapping rates (Table 2). However, the number of conopeptides identified from the contigs remained very low compared with the direct analysis of the reads (24 score-3 and 6 score-2 contigs displaying signal and proregion cleavage sites were annotated as conotoxins). Also, when using a different assembly approach by pooling together all of the merged and unmerged reads from the three tissues, then by mapping back each tissue-specific set of reads to the contigs generated, fewer toxins were detected (2 score-3 and 3 score-2 toxins previously found with the first strategy). Moreover, their tissue origin couldn’t be retrieved precisely because the number of conotoxins identified tended to be uniformly distributed across the three different compartments.

Protein Fractionation.

To confirm the presence at the protein level of conotoxin sequences identified in the transcriptomes, we investigated in parallel the proteomes of the venom duct, radular sac and salivary gland. Total protein samples were fractionated by HPLC and 1D-, and 2D-PAGE, giving rise to a total of 300 fractions that have been analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) (Fig. 2). Reversed-phase HPLC revealed a complex protein mixture in the venom gland of C. episcopatus, compared with the radular sac and salivary gland samples (Fig. 2 AD). From a quantitative point of view, the VD sample contains mainly small proteins and peptides (<28 kDa) (Fig. 2 E and F) whereas the major RS and SG components have masses of >28 kDa.

Fig. 2.

Fig. 2.

Fractionation methods used to purify protein samples. Reverse-phase HPLC traces (214 nm) of VD [<30 kDa (A); >30 kDa (B)], as well as RS (C) and SG (D) protein extracts are shown. Raw and reduced protein extracts have also been separated by 1D-PAGE and revealed with Coomassie blue (E). Total VD proteins have been separated by 2D-PAGE and revealed with Coomassie blue (F). The fraction (HPLC) and spot (gels) numbers that refer to MS fragments (Dataset S1) are also mentioned.

New Precursor Conopeptides.

ConoSorter was able to identify two of the four full-length conopeptide precursors currently known in C. episcopatus [EpI, patent US 6797808/GenBank accession no. AR584835; and Ep11.1 (65) precursors]. The program also identified Pn10.1 and TxMMSK-02 precursors previously isolated from the related molluscivorous species Conus pennaceus and Conus textile, respectively (66). We were also able to detect 132 novel precursor forms of known mature toxin regions. Indeed, 84 new precursor sequences from Conus magnificus μO-MfVIA (67), 26 from C. episcopatus Ep6.1 (patent US 20020173449), 10 from C. pennaceus Pn5.1 precursor conotoxin (66), 7 from Conus omaria Om6.5 toxin (also called PnVIB in C. pennaceus) (68), and 5 from C. pennaceus PnMRCL-012 (66) mature conotoxin regions have been deciphered (Fig. S1 and Dataset S1A).

Fig. S1.

Fig. S1.

New precursor forms of known mature conotoxins. Alignments of known (framed) and novel sequences are shown in the figure (low to high amino acid conservation are represented from light to darker colors; names with a “-2” suffix designate novel toxins containing an unknown signal peptide along with known cone snail sequence signatures in their proregion and mature regions).

In addition, ConoSorter annotated at the highest score (i.e., the signal, pro, and mature regions simultaneously) 3,303 (99.19%) novel precursor conopeptide sequences in VD, as well as 22 (0.66%) in RS and 5 (0.15%) in SG (≥40 amino acids in length, with ≥60% hydrophobic residues in their N-terminal region and containing a signal/proregion cleavage site) that were retained for further analysis (nucleotide and amino acid sequences are available in the DNA Data Bank of Japan and UniProtKB, respectively). A majority of VD conopeptides belong to the T (2,356 toxins; 76.78%), O1 (333; 10.08%), and S (199; 6.02%) superfamilies (Fig. 3). We observed that all of the RS and SG precursor conopeptides were also found in the VD, except for 2 lowly expressed RS-specific sequences (Ep21rs is 96.88% identical to Ep3057; Ep22rs shares 98.44% identity with Ep1412) (Table S1). Also, we noticed that these VD conotoxins belonged to the top 0.70% of the most expressed toxin transcripts.

Fig. 3.

Fig. 3.

Number of new precursor conopeptides per gene superfamily and compartment. These toxin precursors all contain a signal peptide and have been classified with ConoSorter at the highest score (matching Conus signal, pro, and mature signatures without conflicts).

Table S1.

Precursor conopeptides expressed in VD, RS, and SG

graphic file with name pnas.1501334112sfx01.jpg

Their names, sequences (signal peptide in red, and mature toxin in blue with cysteines in bold), frequency in their respective dataset, superfamily and cysteine pattern (known cone snail cysteine framework between parentheses) are also listed. The bottom part of the table shows the two precursor conotoxins Ep21rs and Ep22rs found only in RS, along with the most identical VD conopeptides.

A detailed examination of the similarity between these new VD precursor sequences revealed 401 “parent” toxins (defined as the longest protein present in a cluster of similar sequences) with a ratio of 1 parent for 7.24 “variant” sequences, when a minimum identity threshold between parent/variant of 96% was applied (Fig. S2). Among multiple cutoffs tested (from 93% to 100%), only this optimal identity limit of 96% produced clusters of sequences all reflecting the characteristics of precursor conotoxins. Indeed, every single cluster contained conotoxins belonging to the same superfamily (as opposed to clusters created at identity thresholds of <96%) and sharing a moderate number of amino acid substitutions in their proregions, as well as high rates of sequence variability in their mature regions (at thresholds of >96% identity, the majority of clusters contained sequences with identical proregion and/or mature regions due to low average numbers of variant per parent toxin: ≤2.74). Moreover, 40 (9.98%) of these parent conotoxin transcripts were retrieved in the VD proteome (at a confidence ≥99%) (Dataset S1B).

Fig. S2.

Fig. S2.

Average number of parent and variant precursors (n = 3,303) as a function of their similarity rate in the venom gland. The line plot in red shows the variation of the number of parent conopeptide precursors when different similarity thresholds were applied. The bar chart in green represents the average number of variants per parent sequence (and the corresponding SEM represented by gray vertical bars) for different similarity thresholds.

Cysteine Motifs in Mature Conotoxins.

We identified 1,448 unique cysteine-rich mature toxins (with ≥4 cysteines), among which 1,240 (85.64%) contained an even number of cysteine residues (4 cysteines, 881 sequences; 6 cysteines, 197 sequences; 8 cysteines, 95 sequences; 10 cysteines, 67 sequences) and 208 (14.36%) contained an odd number of cysteine residues (5:145; 7:35; 9:27; and 11:1). Among toxins with an even number of cysteines (104 retrieved at protein level) (Dataset S1C), 9 cone snail cysteine frameworks were represented (Fig. 4A): 44 mature toxins with framework I (CC-C-C), 834 with framework V (CC-CC), 3 with framework XIV (C-C-C-C), 6 with framework III (CC-C-C-CC), 168 with framework VI/VII (C-C-CC-C-C), 6 with framework IX (C-C-C-C-C-C), 77 with framework XI (C-C-CC-CC-C-C), 15 with framework XXII (C-C-C-C-C-C-C-C), and 66 with framework VIII (C-C-C-C-C-C-C-C-C-C).

Fig. 4.

Fig. 4.

Distribution of mature toxins per superfamily and known cysteine pattern (A). (Bottom) The number of mature toxins containing cysteine framework VI/VII (C-C-CC-C-C) and I (CC-C-C) classified per loop formula in (B) and (C), respectively.

We then focused specifically on mature toxins containing the VI/VII pattern, which, when complemented by a I–IV, II–V, III–VI cysteine connectivity, forms the well-known ICK fold present in numerous drug leads, including the FDA-approved Ziconotide (53). A total of 166 (98.81%) mature peptides belong to the O1 superfamily, compared with 2 sequences (1.19%) classified in the O2 superfamily [a total of 563 mature conopeptides with framework VI/VII are currently known, among which the majority belong to the O1 (68.56%), O2 (10.66%), and O3 (5.68%) superfamilies]. All these new toxins share the general loop formula (0–16)C(6)C(5–9)CC(2–4)C(3–4)C(0–43) (Fig. 4B, Fig. S3, and Dataset S1C). Interestingly, we observed that several of these new toxins also share high similarity rates with the following: C. magnificus MfVIA μO-conotoxin, a modulator of the pain target Nav 1.8 voltage-gated sodium channels (67) (97% identity with Ep298, Ep299, Ep301, Ep311, Ep323, Ep325, and Ep615); C. pennaceus PnVIB ω-conotoxin, which is able to block dihydropyridine-insensitive high voltage-activated calcium channels (68) (97% identity with Ep584, Ep587, and Ep589); and Conus amadis Am2766 δ-conotoxin, targeting the rNav 1.2a sodium channel (69) (59% with Ep386).

Fig. S3.

Fig. S3.

New mature conopeptides containing the cysteine pattern VI/VII (C-C-CC-C-C). Sequences are grouped according to their internal loop formula and were aligned using the BLOSUM62 cost matrix. Conserved cysteine amino acids are framed in red.

We also investigated mature conotoxins containing the cysteine framework I, which is found in the norepinephrine transporter inhibitor Conus marmoreus χ-conotoxin MrIA [(3)CC(4)C(2)C (phase IIb clinical trial)] (70). Among them, 30 (68.18%) and 14 (31.81%) belong to the A and T superfamilies, respectively [309 mature conotoxins with framework I are known to date, the most populated groups being the A (54.05%), M (3.24%), and T (1.29%) superfamilies]. The general loop formula for these toxins is (1–24)CC(4–5)C(2–13)C(0–35) (Fig. 4C, Fig. S4, and Dataset S1C). We also observed that the highest similarity rate (73%) is shared between the sequence Ep3055 and the patented TxId conotoxin isolated from C. textile (71).

Fig. S4.

Fig. S4.

New mature conopeptides containing the cysteine pattern I (CC-C-C). Sequences are grouped according to their internal loop formula and were aligned using the BLOSUM62 cost matrix. Conserved cysteine amino acids are framed in red.

In addition, we found 6 new cone snail cysteine motifs in 21 lowly expressed mature conotoxins [7 (33.33%) have been identified by MS) (Table 3 and Dataset S1D). After scanning the UniProtKB/Swiss-Prot database (541,954 entries), we observed that 4 of these patterns (CC-C-CC-C, CC-CC-C-C, CC-CC-CC, and CC-CC-C-C-C-C) are present in 558 mature proteins isolated from 292 different species. A fifth motif (CC-C-C-C-C-C-C) has been found in 19 snake heteromeric C-type lectins. Intriguingly, we also identified a novel 10-cysteine pattern that has never been observed in any organism: CC-CC-CC-CC-C-C.

Table 3.

New precursor conopeptide sequences with mature regions containing new cone snail cysteine patterns

Cysteine pattern Name Precursor sequence Frequency Superfamily
Six cysteines
 CC-C-CC-C Arylsulfatase A (component C) (H. sapiens - P15289) (I–V, II–VI, III–IV) MGAPRSLLLALAAGLAVARPPNIVLIFADDLGYGDLGCYGHPSSTTPNLDQLAAGGLRFTDFYVPVSLCTPSRAALLTGRLPVRMGMYPGVLVPSSRGGLPLEEVTVAEVLAARGYLTGMAGKWHLGVGPEGAFLPPHQGFHRFLGIPYSHDQGPCQNLTCFPPATPCDGGCDQGLVPIPLLANLSVEAQPPWLPGLEARYMAFAHDLMADAQRQDRPFFLYYASHHTHYPQFSGQSFAERSGRGPFGDSLMELDAAVGTLMTAIGDLGLLEETLVIFTADNGPETMRMSRGGCSGLLRCGKGTTYEGGVREPALAFWPGHIAPGVTHELASSLDLLPTLAALAGAPLPNVTLDGFDLSPLLLGTGKSPRQSLFFYPSYPDEVRGVFAVRTGKYKAHFFTQGSAHSDTTADPACHASSSLTAHEPPLLYDLSKDPGENYNLLGGVAGATPEVLQALKQLQLLKAQLDAAVTFGPSQVARGEDPALQICCHPGCTPRPACCHCPDPHA
Ep214 MMSKLGVLLTICLLLFSLTAVPLDGDQHADQPAERLQGDILSEKHPLFNPVKRCCPAAACAMGCCPFICGTV 1 M
 CC-CC-C-C Heat-stable enterotoxin ST-IA/ST-P (E. coli - P01559) (I–IV, II–V, III–VI) MKKLMLAIFISVLSFPSFSQSTESLDSSKEKITLETKKCDVVKNNSEKKSENMNNTFYCCELCCNPACAGCY
Ep1802 MRCLPVFVILLLLIASAPSVDARPKTKDDIPQASFQDNAKRILQVLKSKRNCCRLQVCCGLQAAVSLSFHLWNCMIKQLKCHRNFSVDKHYDHVASNYIIWTF 1 T
Ep2291 MRCLPVFVILLLLIASTPNVDARPKTKDDMPLASFHDDAKRILQILQDRNGCCIAGDCCGGSEIKENEFGCKPCKLSLDVKFGKQTVPFARVRRISNGR 1 T
Ep1646 MRCLPVFVILLLLIASAPSVDARPKTKDDIPQASFQDNAKRILQVLESKRNCCRLQVCCGFHLWNCMIKQLKCHRNFSVDKHYDHVASNYIIWTF 1 T
Ep2642 MRCLPVFVILLLLIASTPSVDALLKTKDDMPLASFRDDVKRTLQTLLNKRFCCPYFECCKLLDERLKTCICVWLYTGIPDNRKTGDPFQT 1 T
Ep2653 MRCLPVFVILLLLIASTPSVDALLKTKDDMPLASFRDDVKRTLQTLLNKRFCCPYFECCVVGGDQLCYRGLIKCIMNK 1 T
Ep2036 MRCLPVFVILLLLIASAPSVDVRPKAKDDMPLASFHDNPLQIRLVDTSCCPSQPCCRFGYREMTLDETPTKCPCMYT 1 T
Ep1629 MRCLPVFVILLLLIASAPSVDARPKTKDDIPQASFQDNAKRILQVLESKRNCCRLQVCCGCFEVKENVRTDFC 1 T
Ep1609 MRCLPVFVILLLLIASAPSVDARPKTKDDIPQASFQDNAKRILQVLESKRNCCRLQVCCGCFEIKENVHADCG 1 T
Ep1092 MRCLPVFIVLLLLIASAPCLDALPKTEGDVPLSSFHDNLKRTRRTHLNIRECCPDGWCCPAGCPTEKVQLCS 1 T
Ep1100 MRCLPVFIVLLLLIASAPCLDALPKTEGDVPLSSFHDNLKRTRRTHLNIRECCSDGRCCPAGCSTENVHLCP 1 T
Ep1109 MRCLPVFIVLLLLIASAPCLDALPKTEGDVPLSSFHDNLKRTRRTHLNIRECCSDGWCCPAGCLTENVHLCP 1 T
Ep1111 MRCLPVFIVLLLLIASAPCLDALPKTEGDVPLSSFHDNLKRTRRTHLNIRECCSDGWCCPAGCSTENEHLCP 1 T
Ep1112 MRCLPVFIVLLLLIASAPCLDALPKTEGDVPLSSFHDNLKRTRRTHLNIRECCSDGWCCPAGCSTENVHLCP 2 T
Ep1110 MRCLPVFIVLLLLIASAPCLDALPKTEGDVPLSSFHDNLKRTRRTHLNIRECCSDGWCCPAGCSTEHVHVCP 1 T
 CC-CC-CC Protein YqcK (B. subtilis - P45945) MKYVHVGVNVVSLEKSINFYEKVFGVKAVKVKTDYAKFLLETPGLNFTLNVADEVKGNQVNHFGFQVDSLEEVLKHKKRLEKEGFFAREEMDTTCCYAVQDKFWITDPDGNEWEFFYTKSNSEVQKQDSSSCCVTPPSDITTNSCC
Ep1587 MRCLPVFVILLLLIASAPSVDARPKTKDDIPQASFQDNAKRILQVLESKRNCCRLQALASFHDNPLQIRLVDTRCCPSQPCCRFG 1 T
Ep2695 MRCLPVFVILLLLIASTPSVDALLKTKDDMPLASFRDDVKRTLQTLLNKRFCCQYFDAKRALQTLMDIRECCMGTPGCCPWG 1 T
Eight cysteines
 CC-C-C-C-C-C-C Snaclec 4 (C-type lectin-like 4) (D. siamensis - Q4PRC9) (II–III, IV–VIII, V–inter, VI–VII) MGRFISISFGLLVVFLSLSGTEAAFCCPSGWSAYDQNCYKVFTEEMNWADAEKFCTEQKKGSHLVSLHSREEEKFVVNLISENLEYPATWIGLGNMWKDCRMEWSDRGNVKYKALAEESYCLIMITHEKVWKSMTCNFIAPVVCKF
Ep1738 MRCLPVFVILLLLIASAPSVDARPKTKDDIPQASFQDNAKRILQVLESKRNCCRLWLRPLQTVPGCEIWKADCSFRTCSWNFEWSLTTRCHLQATISLSFHLWNCMIKQLKCHRHY 1 T
Ep2668 MRCLPVFVILLLLIASTPSVDALLKTKDDMPLASFRDDVKRTLQTLLNKRFCCPYFECWKADCSFRTCSWNFEWSLTTRCHLQATISLSFHLWNCMIKQLKCHRHY 1 T
 CC-CC-C-C-C-C SPRY domain-containing protein 7 (H. sapiens - Q5W111) MATSVLCCLRCCRDGGTGHIPLKEMPAVQLDTQHMGTDVVIVKNGRRICGTGGCLASAPLHQNKSYFEFKIQSTGIWGIGVATQKVNLNQIPLGRDMHSLVMRNDGALYHNNEEKNRLPANSLPQEGDVVGITYDHVELNVYLNGKNMHCPASGIRGTVYPVVYVDDSAILDCQFSEFYHTPPPGFEKILFEQQIF
Ep1702 MRCLPVFVILLLLIASAPSVDARPKTKDDIPQASFQDNAKRILQVLESKRNCCRLQVCCGSTQNWYGPGESDCLIKTKHCDGHHSVLTQCDFCPVL 1 T
Ten cysteines
 CC-CC-CC-CC-C-C Ep1647 MRCLPVFVILLLLIASAPSVDARPKTKDDIPQASFQDNAKRILQVLESKRNCCRLQVCCGFQGNRLCCVSLPTHTQTVHFCCILNTTCFTTCDSSQ 1 T

Cysteine patterns found in proteins from non-Conus organisms (name, species, UniProtKB accession number and cysteine connectivity in bold) are illustrated with a representative sequence (signal peptide in italic; mature region in bold; data not applicable to the reference proteins are represented by “—”).

Although protein sequences containing an odd number of cysteines are usually considered as forming only dimeric structures through the formation of an interchain disulfide bridge, it has been observed that the unpaired cysteine residue can also undergo other types of posttranslational modification, such as ADP ribosylation (72), lipidations (e.g., S-acylation or S-prenylation) (73), nitrosylation (74), or cysteinylation (75), which provide additional functionality to the molecule. Here, we reveal the presence of 208 new conotoxins containing an odd number of cysteines (5, 7, 9, or 11) that form 21 distinct cysteine patterns (Table S2). Although all these patterns have been observed in larger proteins isolated from other species (75,795 UniProtKB/Swiss-Prot entries), only 5 of them are present in 14 known conotoxins (C-C-C-C-C; C-CC-C-C; CC-CCC; C-C-C-C-C-C-C; C-C-C-C-C-C-C). Moreover, 13 of the cysteine patterns described in Table S2 could be derived from known Conus scaffolds containing even numbers of cysteine residues (Fig. S5). New frameworks can have thus been created by insertion of an extra cysteine residue either at the N-terminal end, C-terminal end, or in the core of the sequence, or by deletion of a single cysteine amino acid. Further investigation of these new conotoxin sequences could reveal novel posttranslational modifications at their unpaired cysteine residue, conferring upon them new functional advantages, as recently demonstrated with Conus geographus μO§-conotoxin GVIIJ, which is able to stabilize the sodium channel blockade (75).

Table S2.

A total of 21 patterns containing odd numbers of cysteine residues (from 5 to 11) with diverse loop formulas were isolated from the mature regions of 208 newly identified precursor conopeptides

Cysteine pattern Conotoxins Loop formulas Also known in other sources
Proteins Taxon Human? PTM
Five cysteines
 C-C-C-C-C 6 (4-9)C(3-6)C(5-6)C(3-5)C(1-3)C(1-24) 35,216 Fungi, bacteria, virus, plants, animals Yes 2-(S-cysteinyl)pyruvic acid O-phosphothioketal
ADP ribosylcysteine
Cysteine methyl ester
Cysteine persulfide
Cysteine sulfenic acid (-SOH)
S-nitrosocysteine
S-glutathionyl cysteine
S-(dipyrrolylmethanemethyl)cysteine
S-8alpha-FAD cysteine
N-acetylcysteine
S-4a-FMN cysteine
S-4a-FMN cysteine
S-8alpha-FAD cysteine
S-methylcysteine
 C-CC-C-C 1 (16)C(5)CC(3)C(3)C(3) 585 Fungi, bacteria, virus, plants, animals Yes Cysteine methyl ester
S-(dipyrrolylmethanemethyl)cysteine
S-glutathionyl cysteine
S-nitrosocysteine
 C-CC-CC 6 (8-15)C(1-32)CC(4-5)CC(0-3) 19 Bacteria, virus, plants, animals Yes
 C-CCCC 1 (9)C(19)CCCC(3) 1 Animals Yes
 CC-C-C-C 1 (10)CC(4)C(11)C(14)C(2) 924 Fungi, bacteria, virus, plants, animals Yes 2-(S-cysteinyl)pyruvic acid O-phosphothioketal
Cysteine methyl ester
Cysteine sulfenic acid (-SOH)
S-(dipyrrolylmethanemethyl)cysteine
 CC-C-CC 3 (10)CC(2-4)C(1-7)CC(0-17) 30 Fungi, bacteria, plants, animals Yes
 CC-CC-C 118 (0-29)CC(4-5)CC(1-17)C(0-61) 14 Bacteria, virus, plants, animals No
 CC-CCC 4 (0-10)CC(4-5)CCC(0-3) 5 Virus, animals No
 CCC-CC 5 (37)CCC(78)CC (7) 1 Bacteria No
Seven cysteines
 C-C-C-C-C-C-C 12 (4)C(3)C(6)C(3)C(1)C(5)C(1)C(4-12) 20,572 Fungi, bacteria, virus, plants, animals Yes 2-(S-cysteinyl)pyruvic acid O-phosphothioketal
Cysteine methyl ester
Cysteine persulfide
Cysteine sulfenic acid (-SOH)
Cysteine sulfinic acid (-SO2H)
S-glutathionyl cysteine
S-nitrosocysteine
S-8alpha-FAD cysteine.
S-(dipyrrolylmethanemethyl)cysteine
S-4a-FMN cysteine
S-8alpha-FAD cysteine
 C-C-C-CC-C-C 2 C(6)C(1-3)C(2-4)CC(2-4)C(4)C(1) 537 Fungi, bacteria, virus, plants, animals Yes 2-(S-cysteinyl)pyruvic acid O-phosphothioketal
Cysteine sulfenic acid (-SOH)
S-(dipyrrolylmethanemethyl)cysteine
S-8alpha-FAD cysteine
 C-C-CC-C-C-C 7 (0-3)C(6)C(6-9)CC(1-4)C(1-4)C(3-12)C(1-43) 445 Fungi, bacteria, virus, plants, animals Yes
 C-C-CC-CC-C 3 (0-3)C(6)C(6-8)CC(2-3)CC(2-3)C(1-5) 22 Fungi, bacteria, virus, animals Yes
 C-CC-CC-C-C 1 (4)C(5)CC(3)CC(3)C(3)C(3) 12 Fungi, bacteria, virus, plants, animals Yes
 CC-C-CC-CC 3 (10)CC(4)C(15)CC(4)CC 1 Bacteria No
 CC-CC-C-C-C 6 (1)CC(4)CC(1-9)C(1-19)C(1-13)C(1-22) 11 Bacteria, virus, plants, animals Yes Cysteine methyl ester.
 CC-CC-CC-C 1 (1)CC(4)CC(13)CC(5)C(3) 1 Fungi No
Nine cysteines
 C-C-C-C-C-C-C-C-C 25 (2-17)C(1-10)C(3-12)C(1-6)C(1-7)C(1-11)C(1-10)C(1-10)C(1-6)C(1-4) 11,061 Fungi, bacteria, virus, plants, animals Yes 2-(S-cysteinyl)pyruvic acid O-phosphothioketal
ADP ribosylcysteine
Cysteine methyl ester
Cysteine sulfenic acid (-SOH)
Cysteine persulfide
S-nitrosocysteine
S-(dipyrrolylmethanemethyl)cysteine.
N-acetylcysteine
S-(dipyrrolylmethanemethyl)cysteine
S-methylcysteine
 C-C-C-CC-CC-C-C 1 (10)C(2)C(6)C(5)CC(3)CC(4)C(6)C(2) 12 Bacteria, animals Yes
 CC-C-C-C-C-C-C-C 1 (1)CC(12)C(6)C(4)C(11)C(6)C(7)C(6)C(4) 228 Fungi, bacteria, virus, plants, animals Yes
Eleven cysteines
 C-C-C-C-C-C-C-C-C-C-C 1 (4)C(3)C(6)C(3)C(1)C(5)C(1)C(2)C(5)C(1)C(1)C(2) 6,098 Fungi, bacteria, virus, plants, animals Yes 2-(S-cysteinyl)pyruvic acid O-phosphothioketal
Cysteine methyl ester
Cysteine persulfide
Cysteine sulfenic acid (-SOH)
S-(dipyrrolylmethanemethyl)cysteine
S-4a-FMN cysteine
S-8alpha-FAD cysteine
S-nitrosocysteine

The columns under the heading “Also known in other sources” indicate other sources where identical cysteine patterns have been found [the number of proteins, taxa of organisms, their presence in human, and the posttranslational modifications encountered at the cysteine residue(s) of these motifs are mentioned].

Fig. S5.

Fig. S5.

Speculation on the origin of mature conotoxin scaffolds containing an odd number of cysteine residues. The mature regions of 208 new conotoxin sequences containing an odd number of cysteines were aligned with known mature conopeptides (n = 2,258 sequences obtained from the ConoServer website) using the CD-HIT algorithm. Clusters of new (name in regular font) and known (name in italic) sequences sharing >70% identity show the position of the insertion (bold red) or deletion (bold blue) of cysteines compared with the conserved residues (green frame).

Definition of New Cone Snail Gene Superfamilies.

We have recently reported new gene superfamilies in C. marmoreus (57). Briefly, unclassified signal peptides, isolated from precursor conotoxin transcripts containing proregions and mature Conus regions, were grouped into clusters sharing ≥75.00% identity. Batch pairwise alignments between unclassified signal peptides and ones for which a superfamily has been previously assigned were performed. When the entire cluster of novel signal sequences shared ≤53.30% identity with the signal sequences of any empirical superfamilies, the cluster was considered a putative novel superfamily. Here, the same strategy applied to 1,311 novel signal peptide signatures grouped into 105 clusters led to the identification of 39 precursor sequences classified into 16 categories of toxins that we propose to define as new superfamilies SF-Epi 1–16 (Fig. S6).

Fig. S6.

Fig. S6.

New conopeptide gene superfamilies. Sequences were aligned using the BLOSUM62 cost matrix. The number of sequences present in the original cDNA library (second column), their maximal identity percentage (third column) compared with the closest known conopeptide (name and superfamily between parentheses), as well as their signal region (delimited by red lines), are mentioned.

Codon Usage Bias and RNA Editing.

The relative synonymous codon usage (RSCU) of transcripts encoding each group of parent precursor conotoxins and their corresponding isoforms (2,750 sequences distributed into 99 clusters) has been analyzed according to (i) their superfamilies, (ii) their individual signal, proregions and mature regions, and (iii) their cysteine frameworks. Although we were unable to find any correlations between codon usage and superfamilies, we observed that codons encoding cysteines in the mature region are highly conserved, but also specific to the position of the residue in the motif, and specific to the framework considered (Fig. 5A).

Fig. 5.

Fig. 5.

Relative synonymous codon usage (RSCU) of cone snail cysteine frameworks (A) and profiles of point nucleotide substitutions or indels in the signal (red), proregion (green), and mature (blue) conotoxin regions (B).

We also focused on mutations and potential RNA-editing processes among the clusters of parent and their variant sequences sharing at least 96% identity as described previously (Fig. 5B). We observed that point base substitutions occurred more frequently in the mature part of the toxins than in their signal and proregions (except for C to T, and G to T permutations that seemed to be more or equally abundant in the signal peptide) (Fig. 5B). However, the rate of indels is much higher than base substitutions (∼3–10 times in number). Interestingly, whereas insertions mainly take place in the mature part of conotoxins, the majority of deletions have been observed in the proregions.

Discussion

With their extreme chemical, thermal, and proteolytic stability, small globular cysteine-rich peptides produced by predatory marine cone snails have been considered promising pharmacological alternatives that share the advantages of both small molecules (potential oral delivery, high tissue penetration, cellular internalization, weak immunogenicity) and large protein “biologics,” like antibodies (high affinity and specificity to clinical targets) (76). However, using traditional protein-centric drug discovery approaches has been a tedious and time-consuming task that allows only superficial mining of the huge chemical diversity of natural products and that usually leads to the identification of only a few bioactive peptides per experiment (77).

During the past decade, several studies focusing solely on cone snail venom duct (43, 44, 49, 7880) or salivary gland (9, 43, 44, 49, 7880) transcriptomes, and later complemented by proteome profiling (46, 47, 50, 59, 81), have allowed the report of no more than only a hundred (47 on average) full-length precursor conotoxins each. The great majority of these studies used the ROCHE 454 next-generation sequencing platform because it produced low amounts of long reads that were possible to annotate by performing simple homology BLAST searches. However, the sequences produced were often analyzed without applying quality filtering first (or using low thresholds) although single-base call and homopolymer-associated errors are frequent with this platform (82). Moreover, the weak accuracy of global BLAST searches to identify and classify conotoxin transcripts, compared with purpose-built algorithms, favors the discovery only of toxins closely related to known ones and is not suitable for large datasets containing numerous sequence isoforms.

In this article, we used state-of-the-art Illumina 2 × 300 paired-end chemistry and LC-MS/MS protein sequencing integrated in a dedicated bioinformatics pipeline that allowed capturing, to our knowledge, the first high-definition snapshot of the toxin arsenal isolated from a single venom apparatus and supported by accurate annotations. We were able to (i) identify 3,303 novel full-length conotoxin precursors belonging to 9 empirical and 16 new gene superfamilies, as well as displaying 9 Conus cysteine frameworks; (ii) identify 212 conotoxins containing the pharmacologically active ICK and CC-C-C motifs; (iii) identify six novel cysteine frameworks anticipated to support novel pharmacology; and (iv) highlight the specific conservation of codons encoding the cysteine skeleton of the mature conotoxins.

The high rate of nucleotide substitutions and insertions observed in the intercysteine loops of the mature toxin region, amplified by potential RNA-editing processes, could explain the extensive number of conotoxin isoforms. Indeed, nucleoside modifications such as cytidine (C) to uridine (U) or adenosine (A) to inosine (I) deaminations have been observed in both eukaryotic and prokaryotic tRNAs, rRNAs, microRNAs, and mRNAs (83, 84). Although posttranscriptional editing of mRNAs is far less common than other RNA-processing events, such as alternative splicing, 5′-capping, or 3′-polyadenylation, it could be a source of preference for certain codons to be translated more accurately or efficiently, leading to sequence variability and variations of protein expression levels (85). Sequencing the Conus genome might help address these questions and shed light on the organization, expression, and regulation mechanisms of gene-encoding conotoxins.

This exceptional sequence diversity, coupled here with the discovery of new conotoxins with cysteine patterns encountered in other organisms (such as integrin receptor antagonist snake C-type lectins, which have provided lead structures for the design of antimetastatic and antiangiogenic drugs) (86), confirms the extraordinary potential of small Conus peptides to unveil novel pharmacology. Moreover, improvement of de novo assembly programs dedicated to the treatment of datasets with numerous conserved sequences and repeats would open the way to the identification of new classes of longer polypeptides with original modes of action (81). However, de novo transcriptome assembly still remains a challenging task (8791) requiring dedicated high-depth sequencing strategies and extensive optimization steps (92) especially when performed in nonmodel organisms expressing highly similar transcripts. Indeed, most modern de novo assemblers based on de Bruijn graphs (93) still lack the efficiency to treat repetitive sequence regions, often leading to the abortion of the graph resolution after a few cycles and production of chimera sequences.

This chemical diversity could also be expanded with better identification of toxin posttranslational modifications (PTMs) and the amelioration of transcriptome/proteome mapping. Although the pipeline described here has allowed matching more peptide fragments with conotoxin transcripts than the only two comparable Conus studies [144 peptides mapped to 3,303 full-length precursor transcripts (4.4%); Dutertre et al. (59), 43 vs. 75–57%; Jin et al. (46), 29 vs. 48–60%], the task still remains delicate. Indeed, the difference of sequencing depth between Illumina and current mass spectrometers necessitates enriching protein samples to detect low expressed proteins. In addition, bottom-up proteomic technologies can sequence only short fragments of proteins, leading to an enrichment of identical peptides when originating from similar protein isoforms, thus making difficult their precise assignment to their corresponding parent precursor transcripts. This limitation could be alleviated using top-down mass spectrometry where intact protein ions are introduced into a gas phase to be further fragmented and analyzed (94). Although this sequencing approach provides high sequence coverage and usually retains labile PTMs (95), the limited compatibility of the dissociation techniques used (electron capture dissociation or electron transfer dissociation for instance) with front-end separation methods and the difficulty to interpret complex fragmentation spectra generated by large multiply charged precursors limit this application to isolated proteins or simple mixtures (96, 97). We can also mention that a minor fraction of mature conotoxins lacking Arg and/or Lys [345 (10.44%) of the toxins reported here] or showing disadvantageous placement of these amino acids [64 (1.94%)] (98) would not be observable at protein level when using shotgun sequencing. Moreover, peptides not (or too strongly) retained on the LC column, as well as large peptide fragments containing amino acids that weakly protonate (99) and are able to generate multiply charged ions with m/z value above the mass spectrometer selection threshold, could not be considered for database searching.

Finally, it is noteworthy that the methodology described in this report can be applied to potentially any type of tissue or organisms. The high sensitivity of the sequencing platforms clearly demonstrates the possibility of working with small amounts of starting material, which makes this approach suitable for studying rare samples. Also, the type of sequences to analyze is not restrictive. ConoSorter, the annotation program used here, can be easily modulated to study other organisms by incorporating specific search models built from training sets of protein sequences that share conserved or unique primary structure signatures. Thus, this data-mining strategy offers a personalized tool for studying large sets of exome expression products that can be used for fundamental research purposes or applications such as diagnostic or drug discovery.

Materials and Methods

Collection of the Conus specimen, as well as the dissection of its venom duct, radular sac, and salivary glands are described in SI Materials and Methods. mRNA isolation from these compartments followed by the preparation and sequencing of the cDNA libraries with Illumina MiSeq sequencer are described in SI Materials and Methods. The bioinformatic processing of the transcriptome sequencing reads and their de novo assemblies allowing the discovery of new conotoxin sequences, cysteine frameworks, and gene superfamilies, as well as the analysis of codon usage and RNA editing are described in SI Materials and Methods. Finally, protein extraction and fractionation by PAGE and HPLC, followed by MS sequencing showing the existence of conotoxin transcripts at protein level, are detailed in SI Materials and Methods.

SI Materials and Methods

Specimen Collection, Total RNA, and Protein Isolation.

One C. episcopatus specimen measuring 4 cm in the antero-posterior axis was collected on Lady Musgrave island (Queensland, Australia). Dissections of the venom duct, radular sac, and salivary gland were performed on ice under sterile conditions. The DNA, total RNA, and proteins were isolated from the three organs using TRIzol reagent (Life Technologies) according to the manufacturer’s protocol.

Messenger RNA Isolation and cDNA Library Preparation.

CDNA (cDNA) libraries were prepared using a TruSeq Stranded mRNA Sample Preparation Kit (Illumina) following the manufacturer’s protocol. For each organ, 1 μg of total RNA extract was used. Poly-A mRNA molecules were purified using oligo-dT–coated magnetic beads and a 2-min fragmentation period at 94 °C. Priming of purified mRNA followed by first and second strand cDNA synthesis, 3′ adenylation, and adapter ligation, as well as enrichment of the obtained DNA fragments, were performed before controlling the quality and validating 1:50, 1:30 diluted, and undiluted libraries using a 2100 Bioanalyzer instrument (Agilent Technologies) and high sensitivity DNA chips. Libraries of 445, 462, and 431 average insert lengths (bp) for VD, RS, and SG, respectively, were then sequenced.

Transcriptome Sequencing.

Indexed cDNA libraries were sequenced with a next-generation MiSeq Benchtop Sequencer (Illumina) based on 2 × 300 bp strand-specific paired-end runs (301 sequencing cycles). Then, 4 nM (VD and RS), and 2 nM (SG) libraries of NaOH-denatured DNA were prepared with the corresponding MiSeq reagent kit by following the manufacturer’s guidelines to obtain final concentrations of 7 pM (VD) and 12 pM (RS and SG).

Bioinformatics Processing of Transcriptome Sequencing Data.

Raw reads were first analyzed with the FastQC v0.10.1 program (www.bioinformatics.babraham.ac.uk/projects/fastqc/) (100). Paired-end reads were trimmed from adaptor sequences with the Trimmomatic-0.30 algorithm (101) using the “palindrome” approach, allowing seed matches a maximum of two mismatches, as well as palindrome and simple clip thresholds of 30 (i.e., requiring a match of about 50 bases) and 10 (∼17 bases), respectively. An average quality score cutoff per read of 30 was applied according to the FastQC analysis report (probability of 1 incorrect base call over 1000, or 99.9% base call accuracy). Trimmed paired-end reads were then merged with FLASh (102) (minimum overlap, 10; maximum overlap, 65; maximum mismatch density, 0.25). To assess the accuracy of the merging process, the PANDAseq algorithm also was used and gave a similar number of merged paired-end reads (103). To decipher potential transcript sequences encoding unusually large Conus proteins, such as actinoporin- or hyaluronidase-like conotoxins (81), as well as to reconstruct the full-length precursor sequence of low-expressed conotoxins for which only fragments of their coding region were present in the reads, three tissue-specific sets of merged and unmerged reads obtained from VD, RS, and SG were concatenated and submitted independently to four different de novo assemblers: Trinity (version r2014-04-13) (flags used:–seqType fq–single <input.fastq>–JM 30G–CPU 8–min_contig_length 100) (104, 105); SOAPdenovo-Trans v1.03 (flags: -K 25, -p 8, -d 0, -e 2, -M 1, -L 100, -t 5, -G 50; config. file: max_rd_len = 592, rd_len_cutof = 592, reverse_seq = 1, asm_flags = 3, map_len = 32, q=<input.fastq>, as well as avg_ins = 222, avg_ins = 219 and avg-ins = 190 for VD, RS, and SG, respectively) (106); Oases 0.2.8 (flags: -m 15, -M 41, -d “-fastq <input.fastq>”, -g 27, -min_trans_lgth 100) (107); CLC Genomics Workbench 6.5.1 (www.clcbio.com) (settings: automatic word size, automatic bubble size, minimum contig length: 100). To assess the assembly quality, reads were mapped back to contigs using Bowtie2 (108). A different assembly approach was also performed by pooling together all of the merged and unmerged reads from the three compartments of the venom apparatus and then by mapping back each tissue-specific set of reads to the contigs generated.

Merged reads and contigs were then annotated with ConoSorter (57) to identify and classify transcript encoding precursor conotoxins into gene superfamilies. Sequences matching simultaneously ER signal, pro, and mature conopeptide signatures (score 3) or pro and mature motifs only (score 2) were dispatched into two separate groups. Only hits ≥40 amino acids (referred to the shortest Conus spurius Sr5.5 and SrVA precursors conotoxin; 44 amino acids long) (109), with an N-ter hydrophobicity rate of ≥60% (on average, it is of 74.56% with an SD σ = 6.59, and a minimum of 52.00%) (57), and a signal peptide cleavage site identified with SignalP 4.1 algorithm (110) were selected. These sequences were then submitted to ProP 1.0 (111) to detect potential proprotein convertase cleavage sites that delimit the boundaries between pro- and mature regions. We also searched the isolated mature peptides for two exopeptidase cleavage sites known in mollusks as carboxypeptidase E and peptidylglycine α-amidating monooxygenase that cleave C-terminal lysines and arginines or glycine, respectively (112, 113). Finally, their cysteine patterns were extracted and searched against ConoServer and UniProtKB/Swiss-Prot databases.

Protein Fractionation and Preparation for MS Analysis.

Proteins extracted from the venom duct (1,360 μg) were freeze-dried and resuspended in Milli-Q purified water to a concentration of 10 μg⋅μL−1. The solution was then filtered using Amicon Ultra 30-kDa spin filter tubes (Merck Millipore). VD protein solutions containing solutes >30 kDa and <30 kDa (with a relative precision) were independently fractionated using a Shimadzu Prominence HPLC system equipped with an analytical C18 2.1 × 250-mm Vydac Everest column (Grace). A 0–100% (vol/vol) buffer B/A linear gradient flowing at a rate of 1%⋅min−1 was applied (buffer A, 0.05% trifluoroacetic acid, H2O; buffer B, 0.043% trifluoroacetic acid/90% acetonitrile, H2O). Fractions were collected from the 214-nm trace. The same treatments were applied to RS and SG protein extracts except for the freeze-drying and size-filtering steps to limit the loss of proteins that were present in lower amounts (682 μg and 245 μg, respectively). HPLC fractions were dried and resuspended in 1 M ammonium carbonate, pH 11. Cysteine reduction and alkylation were carried out in a solution containing 97.5% (vol/vol) acetonitrile/2% iodoethanol (alkylating agent)/0.5% triethylphosphine (reducing agent) incubated at 37 °C for 60 min as described in the Hale et al. protocol (114). Dry protein samples were reconstituted in100 mM ammonium bicarbonate, pH 8.5, and digested overnight with trypsin (Sigma proteomics grade) at 37 °C. Before MS sequencing, pH adjustment was made using a 0.1% formic acid solution.

Protein extracted from the venom duct, radular sac, and salivary gland were also separated by 1D-PAGE under denaturing conditions with SDS. VD (80 μg), RS (50 μg), and SG (20 μg) raw and reduced protein extracts were loaded on a 4–12% acrylamide gel. VD proteins were also separated by 2D-PAGE using a linear pH 3–10 immobilized pH gradient strip (Bio-Rad) for the first dimension and a 4–12% acrylamide gel for the second dimension. Bands/spots revealed with Coomassie blue were excised, reduced with 10 mM DTT in a 50 mM ammonium bicarbonate solution for 1 h at 37 °C, and alkylated with 50 mM iodoacetamide in 50 mM ammonium bicarbonate for 1 h in the dark at room temperature. Reduced/alkylated proteins were then digested with trypsin (1:4, 0.02% trypsin in 1 mM HCl/40 mM ammonium bicarbonate, pH 8) on ice for 20 min and then overnight at 37 °C. Peptide fragments were extracted from the gel pieces with a 5% formic acid/50% acetonitrile solution, sonicated for 10 min, and incubated for 60 min at room temperature.

Protein Sequencing.

Reduced/alkylated trypsin digests were sequenced by shotgun liquid chromatography coupled to electrospray ionization tandem mass spectrometry (LC-ESI-MS/MS). Low-abundant fraction digests obtained from HPLC and 2D-PAGE were submitted to a Shimadzu nano HPLC system in series with the hybrid quadrupole time-of-flight (TOF) mass spectrometer AB Sciex TripleTOF 5600 system equipped with a NanoSpray ionization source. Then, 10 μL of solutions were injected for product ion MS/MS analysis using information-dependent acquisition (IDA) experiments. The LC separation was achieved using a nano C18 150 mm × 75 μm Agilent Zorbax column at a linear 2% B (90% acetonitrile/0.1% formic acid, H2O)⋅min−1 gradient with a flow rate of 0.3 μL⋅min−1 over 30 min. One full scan of the mass range (350–1,800 Da), followed by multiple tandem mass spectra, was applied using a rolling collision energy relative to the m/z ratio and charge state of the precursor ion up to a maximum of 80 eV. The duration of the full scan was set to 30 min with 2.3 s per cycle (total cycles, 783). A maximum number of 20 candidate ions were monitored per cycle, with an ion tolerance threshold set to 100 mDa. Former target ions were excluded for a period of 8 s to exclude isotopes within a ±4-Da window.

The same devices were used for the digests extracted from 1D SDS/PAGE, except a micro 1.8 μm 2.1 × 100-mm LC column was used (Agilent ZORBAX 300 SB-C18), with flow rate of 180 μL/min directly into the DuoSpray ionization source. For these samples, the MS/MS method was also adjusted accordingly (scan duration, 14 min; cycle time, 1.25 s, for a total of 672 cycles; maximum number of candidate ions per cycle, 10 spectra; no exclusion of former target ions was applied). Data were acquired with the proprietary Analyst TF 1.6 software.

Transcriptome/MSMS Data Matching.

Raw MSMS sequencing data were submitted to ProteinPilot 4.0 (AB Sciex) for mapping (confidence threshold of ≥99%) to the conotoxin transcripts identified with ConoSorter.

Codon Use Bias.

The relative synonymous codon use was measured with the GCUA program (115). The cDNA sequences of all ConoSorter hits classified at the highest score were used. After discarding duplicates, a sample of 8,715 unique cDNA sequences was analyzed for calculating the cumulative codon and amino acid uses.

RNA Editing.

Clusters of parent/variant amino acid sequences sharing at least 96% identity were independently studied at the cDNA level to detect mutations and RNA-editing mechanisms. Groups of closely related nucleotide sequences were aligned using ClustalW2. Base mismatches and indels were then compared in signal, pro, and mature regions separately.

Supplementary Material

Supplementary File
pnas.1501334112.sd01.docx (240.7KB, docx)

Acknowledgments

We thank Professor Sean M. Grimmond for allowing access to the transcriptome sequencing platform. V.L. acknowledges the provision of an Institute for Molecular Bioscience Postgraduate Award and support from National Health and Medical Research Council Program Grant 569927.

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The sequences reported in this paper have been deposited in the DNA Data Bank of Japan, www.ddbj.nig.ac.jp/ (accession nos. DRA003531, PRJDB3896, SAMD00029744, DRX030964, DRR034331, SAMD00029745, DRX030965, DRR034332, SAMD00029746, DRX030966, and DRR034333).

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1501334112/-/DCSupplemental.

References

  • 1.Bouchet P, Gofas S. 2014. Conus Linnaeus, 1758. World Register of Marine Species. Available at www.marinespecies.org. Accessed April 29, 2015.
  • 2.Duda TF, Jr, Kohn AJ. Species-level phylogeography and evolutionary history of the hyperdiverse marine gastropod genus Conus. Mol Phylogenet Evol. 2005;34(2):257–272. doi: 10.1016/j.ympev.2004.09.012. [DOI] [PubMed] [Google Scholar]
  • 3.Freeman SE, Turner RJ, Silva SR. The venom and venom apparatus of the marine gastropod Conus striatus Linne. Toxicon. 1974;12(6):587–592. doi: 10.1016/0041-0101(74)90191-3. [DOI] [PubMed] [Google Scholar]
  • 4.Kohn AJ. Piscivorous gastropods of the genus Conus. Proc Natl Acad Sci USA. 1956;42(3):168–171. doi: 10.1073/pnas.42.3.168. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Spengler HA, Kohn AJ. Comparative external morphology of the Conus osphradium (Mollusca: Gastropoda) J Zool. 1995;235(3):439–453. [Google Scholar]
  • 6.Schulz JR, Norton AG, Gilly WF. The projectile tooth of a fish-hunting cone snail: Conus catus injects venom into fish prey using a high-speed ballistic mechanism. Biol Bull. 2004;207(2):77–79. doi: 10.2307/1543581. [DOI] [PubMed] [Google Scholar]
  • 7.Marshall J, et al. Anatomical correlates of venom production in Conus californicus. Biol Bull. 2002;203(1):27–41. doi: 10.2307/1543455. [DOI] [PubMed] [Google Scholar]
  • 8.Safavi-Hemami H, Young ND, Williamson NA, Purcell AW. Proteomic interrogation of venom delivery in marine cone snails: Novel insights into the role of the venom bulb. J Proteome Res. 2010;9(11):5610–5619. doi: 10.1021/pr100431x. [DOI] [PubMed] [Google Scholar]
  • 9.Biggs JS, Olivera BM, Kantor YI. Alpha-conopeptides specifically expressed in the salivary gland of Conus pulicarius. Toxicon. 2008;52(1):101–105. doi: 10.1016/j.toxicon.2008.05.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Lewis RJ, Dutertre S, Vetter I, Christie MJ. Conus venom peptide pharmacology. Pharmacol Rev. 2012;64(2):259–298. doi: 10.1124/pr.111.005322. [DOI] [PubMed] [Google Scholar]
  • 11.Terlau H, Olivera BM. Conus venoms: A rich source of novel ion channel-targeted peptides. Physiol Rev. 2004;84(1):41–68. doi: 10.1152/physrev.00020.2003. [DOI] [PubMed] [Google Scholar]
  • 12.Craig AG, Bandyopadhyay P, Olivera BM. Post-translationally modified neuropeptides from Conus venoms. Eur J Biochem. 1999;264(2):271–275. doi: 10.1046/j.1432-1327.1999.00624.x. [DOI] [PubMed] [Google Scholar]
  • 13.Olivera BM. Conus peptides: Biodiversity-based discovery and exogenomics. J Biol Chem. 2006;281(42):31173–31177. doi: 10.1074/jbc.R600020200. [DOI] [PubMed] [Google Scholar]
  • 14.Espiritu DJ, et al. Venomous cone snails: Molecular phylogeny and the generation of toxin diversity. Toxicon. 2001;39(12):1899–1916. doi: 10.1016/s0041-0101(01)00175-1. [DOI] [PubMed] [Google Scholar]
  • 15.Kaas Q, Yu R, Jin AH, Dutertre S, Craik DJ. ConoServer: Updated content, knowledge, and discovery tools in the conopeptide database. Nucleic Acids Res. 2012;40(Database issue):D325–D330. doi: 10.1093/nar/gkr886. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Schroeder CI, Craik DJ. Therapeutic potential of conopeptides. Future Med Chem. 2012;4(10):1243–1255. doi: 10.4155/fmc.12.70. [DOI] [PubMed] [Google Scholar]
  • 17.Gray WR, Luque A, Olivera BM, Barrett J, Cruz LJ. Peptide toxins from Conus geographus venom. J Biol Chem. 1981;256(10):4734–4740. [PubMed] [Google Scholar]
  • 18.Ramilo CA, et al. Novel alpha- and omega-conotoxins from Conus striatus venom. Biochemistry. 1992;31(41):9919–9926. doi: 10.1021/bi00156a009. [DOI] [PubMed] [Google Scholar]
  • 19.Sato S, Nakamura H, Ohizumi Y, Kobayashi J, Hirata Y. The amino acid sequences of homologous hydroxyproline-containing myotoxins from the marine snail Conus geographus venom. FEBS Lett. 1983;155(2):277–280. doi: 10.1016/0014-5793(82)80620-0. [DOI] [PubMed] [Google Scholar]
  • 20.Fainzilber M, et al. A new cysteine framework in sodium channel blocking conotoxins. Biochemistry. 1995;34(27):8649–8656. doi: 10.1021/bi00027a014. [DOI] [PubMed] [Google Scholar]
  • 21.Walker CS, et al. The T-superfamily of conotoxins. J Biol Chem. 1999;274(43):30664–30671. doi: 10.1074/jbc.274.43.30664. [DOI] [PubMed] [Google Scholar]
  • 22.Olivera BM, McIntosh JM, Cruz LJ, Luque FA, Gray WR. Purification and sequence of a presynaptic peptide toxin from Conus geographus venom. Biochemistry. 1984;23(22):5087–5090. doi: 10.1021/bi00317a001. [DOI] [PubMed] [Google Scholar]
  • 23.England LJ, et al. Inactivation of a serotonin-gated ion channel by a polypeptide toxin from marine snails. Science. 1998;281(5376):575–578. doi: 10.1126/science.281.5376.575. [DOI] [PubMed] [Google Scholar]
  • 24.Lirazan MB, et al. The spasmodic peptide defines a new conotoxin superfamily. Biochemistry. 2000;39(7):1583–1588. doi: 10.1021/bi9923712. [DOI] [PubMed] [Google Scholar]
  • 25.Balaji RA, et al. Lambda-conotoxins, a new family of conotoxins with unique disulfide pattern and protein folding: Isolation and characterization from the venom of Conus marmoreus. J Biol Chem. 2000;275(50):39516–39522. doi: 10.1074/jbc.M006354200. [DOI] [PubMed] [Google Scholar]
  • 26.Jimenez EC, et al. Novel excitatory Conus peptides define a new conotoxin superfamily. J Neurochem. 2003;85(3):610–621. doi: 10.1046/j.1471-4159.2003.01685.x. [DOI] [PubMed] [Google Scholar]
  • 27.Brown MA, et al. Precursors of novel Gla-containing conotoxins contain a carboxy-terminal recognition site that directs gamma-carboxylation. Biochemistry. 2005;44(25):9150–9159. doi: 10.1021/bi0503293. [DOI] [PubMed] [Google Scholar]
  • 28.Aguilar MB, et al. A novel conotoxin from Conus delessertii with posttranslationally modified lysine residues. Biochemistry. 2005;44(33):11130–11136. doi: 10.1021/bi050518l. [DOI] [PubMed] [Google Scholar]
  • 29.Möller C, et al. A novel conotoxin framework with a helix-loop-helix (Cs alpha/alpha) fold. Biochemistry. 2005;44(49):15986–15996. doi: 10.1021/bi0511181. [DOI] [PubMed] [Google Scholar]
  • 30.Peng C, Liu L, Shao X, Chi C, Wang C. Identification of a novel class of conotoxins defined as V-conotoxins with a unique cysteine pattern and signal peptide sequence. Peptides. 2008;29(6):985–991. doi: 10.1016/j.peptides.2008.01.007. [DOI] [PubMed] [Google Scholar]
  • 31.Pi C, et al. Diversity and evolution of conotoxins based on gene expression profiling of Conus litteratus. Genomics. 2006;88(6):809–819. doi: 10.1016/j.ygeno.2006.06.014. [DOI] [PubMed] [Google Scholar]
  • 32.Yuan DD, et al. Isolation and cloning of a conotoxin with a novel cysteine pattern from Conus caracteristicus. Peptides. 2008;29(9):1521–1525. doi: 10.1016/j.peptides.2008.05.015. [DOI] [PubMed] [Google Scholar]
  • 33.Chen JS, Fan CX, Hu KP, Wei KH, Zhong MN. Studies on conotoxins of Conus betulinus. J Nat Toxins. 1999;8(3):341–349. [PubMed] [Google Scholar]
  • 34.Chen P, Garrett JE, Watkins M, Olivera BM. Purification and characterization of a novel excitatory peptide from Conus distans venom that defines a novel gene superfamily of conotoxins. Toxicon. 2008;52(1):139–145. doi: 10.1016/j.toxicon.2008.05.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Loughnan ML, Nicke A, Lawrence N, Lewis RJ. Novel alpha D-conopeptides and their precursors identified by cDNA cloning define the D-conotoxin superfamily. Biochemistry. 2009;48(17):3717–3729. doi: 10.1021/bi9000326. [DOI] [PubMed] [Google Scholar]
  • 36.Möller C, Marí F. 9.3 KDa components of the injected venom of Conus purpurascens define a new five-disulfide conotoxin framework. Biopolymers. 2011;96(2):158–165. doi: 10.1002/bip.21406. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Elliger CA, et al. Diversity of conotoxin types from Conus californicus reflects a diversity of prey types and a novel evolutionary history. Toxicon. 2011;57(2):311–322. doi: 10.1016/j.toxicon.2010.12.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Ye M, et al. A helical conotoxin from Conus imperialis has a novel cysteine framework and defines a new superfamily. J Biol Chem. 2012;287(18):14973–14983. doi: 10.1074/jbc.M111.334615. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Luo S, et al. A novel inhibitor of α9α10 nicotinic acetylcholine receptors from Conus vexillum delineates a new conotoxin superfamily. PLoS ONE. 2013;8(1):e54648. doi: 10.1371/journal.pone.0054648. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Aguilar MB, et al. A novel arrangement of Cys residues in a paralytic peptide of Conus cancellatus (jr. syn.: Conus austini), a worm-hunting snail from the Gulf of Mexico. Peptides. 2013;41:38–44. doi: 10.1016/j.peptides.2013.02.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Bernáldez J, et al. A Conus regularis conotoxin with a novel eight-cysteine framework inhibits CaV2.2 channels and displays an anti-nociceptive activity. Mar Drugs. 2013;11(4):1188–1202. doi: 10.3390/md11041188. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Liu Z, et al. Diversity and evolution of conotoxins in Conus virgo, Conus eburneus, Conus imperialis and Conus marmoreus from the South China Sea. Toxicon. 2012;60(6):982–989. doi: 10.1016/j.toxicon.2012.06.011. [DOI] [PubMed] [Google Scholar]
  • 43.Lluisma AO, Milash BA, Moore B, Olivera BM, Bandyopadhyay PK. Novel venom peptides from the cone snail Conus pulicarius discovered through next-generation sequencing of its venom duct transcriptome. Mar Genomics. 2012;5:43–51. doi: 10.1016/j.margen.2011.09.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Hu H, Bandyopadhyay PK, Olivera BM, Yandell M. Elucidation of the molecular envenomation strategy of the cone snail Conus geographus through transcriptome sequencing of its venom duct. BMC Genomics. 2012;13:284. doi: 10.1186/1471-2164-13-284. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Biggs JS, et al. Evolution of Conus peptide toxins: Analysis of Conus californicus Reeve, 1844. Mol Phylogenet Evol. 2010;56(1):1–12. doi: 10.1016/j.ympev.2010.03.029. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Jin AH, et al. Transcriptomic messiness in the venom duct of Conus miles contributes to conotoxin diversity. Mol Cell Proteomics. 2013;12(12):3824–3833. doi: 10.1074/mcp.M113.030353. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Safavi-Hemami H, et al. Combined proteomic and transcriptomic interrogation of the venom gland of Conus geographus uncovers novel components and functional compartmentalization. Mol Cell Proteomics. 2014;13(4):938–953. doi: 10.1074/mcp.M113.031351. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Zhou M, et al. Characterizing the evolution and functions of the M-superfamily conotoxins. Toxicon. 2013;76:150–159. doi: 10.1016/j.toxicon.2013.09.020. [DOI] [PubMed] [Google Scholar]
  • 49.Terrat Y, et al. High-resolution picture of a venom gland transcriptome: Case study with the marine snail Conus consors. Toxicon. 2012;59(1):34–46. doi: 10.1016/j.toxicon.2011.10.001. [DOI] [PubMed] [Google Scholar]
  • 50.Violette A, et al. Recruitment of glycosyl hydrolase proteins in a cone snail venomous arsenal: Further insights into biomolecular features of Conus venoms. Mar Drugs. 2012;10(2):258–280. doi: 10.3390/md10020258. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Walker CS, et al. A novel Conus snail polypeptide causes excitotoxicity by blocking desensitization of AMPA receptors. Curr Biol. 2009;19(11):900–908. doi: 10.1016/j.cub.2009.05.017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Lirazan M, Jimenez EC, Grey Craig A, Olivera BM, Cruz LJ. Conophysin-R, a Conus radiatus venom peptide belonging to the neurophysin family. Toxicon. 2002;40(7):901–908. doi: 10.1016/s0041-0101(02)00079-x. [DOI] [PubMed] [Google Scholar]
  • 53.Wermeling DP. Ziconotide, an intrathecally administered N-type calcium channel antagonist for the treatment of chronic pain. Pharmacotherapy. 2005;25(8):1084–1094. doi: 10.1592/phco.2005.25.8.1084. [DOI] [PubMed] [Google Scholar]
  • 54.Kolosov A, Aurini L, Williams ED, Cooke I, Goodchild CS. Intravenous injection of leconotide, an omega conotoxin: Synergistic antihyperalgesic effects with morphine in a rat model of bone cancer pain. Pain Med. 2011;12(6):923–941. doi: 10.1111/j.1526-4637.2011.01118.x. [DOI] [PubMed] [Google Scholar]
  • 55.Brust A, et al. chi-Conopeptide pharmacophore development: Toward a novel class of norepinephrine transporter inhibitor (Xen2174) for pain. J Med Chem. 2009;52(22):6991–7002. doi: 10.1021/jm9003413. [DOI] [PubMed] [Google Scholar]
  • 56.Kaas Q, Westermann JC, Craik DJ. Conopeptide characterization and classifications: An analysis using ConoServer. Toxicon. 2010;55(8):1491–1509. doi: 10.1016/j.toxicon.2010.03.002. [DOI] [PubMed] [Google Scholar]
  • 57.Lavergne V, et al. Systematic interrogation of the Conus marmoreus venom duct transcriptome with ConoSorter reveals 158 novel conotoxins and 13 new gene superfamilies. BMC Genomics. 2013;14(1):708. doi: 10.1186/1471-2164-14-708. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Aguilar MB, et al. Precursor De13.1 from Conus delessertii defines the novel G gene superfamily. Peptides. 2013;41:17–20. doi: 10.1016/j.peptides.2013.01.009. [DOI] [PubMed] [Google Scholar]
  • 59.Dutertre S, et al. Deep venomics reveals the mechanism for expanded peptide diversity in cone snail venom. Mol Cell Proteomics. 2013;12(2):312–329. doi: 10.1074/mcp.M112.021469. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Puillandre N, Koua D, Favreau P, Olivera BM, Stöcklin R. Molecular phylogeny, classification and evolution of conopeptides. J Mol Evol. 2012;74(5-6):297–309. doi: 10.1007/s00239-012-9507-2. [DOI] [PubMed] [Google Scholar]
  • 61.Ishikawa H. Evolution of ribosomal RNA. Comp Biochem Physiol B. 1977;58(1):1–7. doi: 10.1016/0305-0491(77)90116-x. [DOI] [PubMed] [Google Scholar]
  • 62.Hernández AI, et al. Poly-(ADP-ribose) polymerase-1 is necessary for long-term facilitation in Aplysia. J Neurosci. 2009;29(30):9553–9562. doi: 10.1523/JNEUROSCI.1512-09.2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Fujiwara H, Ishikawa H. Molecular mechanism of introduction of the hidden break into the 28S rRNA of insects: Implication based on structural studies. Nucleic Acids Res. 1986;14(16):6393–6401. doi: 10.1093/nar/14.16.6393. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Zarlenga DS, Dame JB. The identification and characterization of a break within the large subunit ribosomal RNA of Trichinella spiralis: Comparison of gap sequences within the genus. Mol Biochem Parasitol. 1992;51(2):281–289. doi: 10.1016/0166-6851(92)90078-x. [DOI] [PubMed] [Google Scholar]
  • 65.Buczek O, et al. Characterization of D-amino-acid-containing excitatory conotoxins and redefinition of the I-conotoxin superfamily. FEBS J. 2005;272(16):4178–4188. doi: 10.1111/j.1742-4658.2005.04830.x. [DOI] [PubMed] [Google Scholar]
  • 66.Conticello SG, et al. Mechanisms for evolving hypervariability: The case of conopeptides. Mol Biol Evol. 2001;18(2):120–131. doi: 10.1093/oxfordjournals.molbev.a003786. [DOI] [PubMed] [Google Scholar]
  • 67.Vetter I, et al. Isolation, characterization and total regioselective synthesis of the novel μO-conotoxin MfVIA from Conus magnificus that targets voltage-gated sodium channels. Biochem Pharmacol. 2012;84(4):540–548. doi: 10.1016/j.bcp.2012.05.008. [DOI] [PubMed] [Google Scholar]
  • 68.Kits KS, et al. Novel omega-conotoxins block dihydropyridine-insensitive high voltage-activated calcium channels in molluscan neurons. J Neurochem. 1996;67(5):2155–2163. doi: 10.1046/j.1471-4159.1996.67052155.x. [DOI] [PubMed] [Google Scholar]
  • 69.Sudarslal S, et al. Sodium channel modulating activity in a delta-conotoxin from an Indian marine snail. FEBS Lett. 2003;553(1-2):209–212. doi: 10.1016/s0014-5793(03)01016-0. [DOI] [PubMed] [Google Scholar]
  • 70.Lewis RJ. Discovery and development of the χ-conopeptide class of analgesic peptides. Toxicon. 2012;59(4):524–528. doi: 10.1016/j.toxicon.2011.07.012. [DOI] [PubMed] [Google Scholar]
  • 71.Bhatia S, et al. Constrained de novo sequencing of conotoxins. J Proteome Res. 2012;11(8):4191–4200. doi: 10.1021/pr300312h. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.McDonald LJ, Moss J. Enzymatic and nonenzymatic ADP-ribosylation of cysteine. Mol Cell Biochem. 1994;138(1-2):221–226. doi: 10.1007/BF00928465. [DOI] [PubMed] [Google Scholar]
  • 73.Tate EW, Kalesh KA, Lanyon-Hogg T, Storck EM, Thinon E. Global profiling of protein lipidation using chemical proteomic technologies. Curr Opin Chem Biol. 2015;24:48–57. doi: 10.1016/j.cbpa.2014.10.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Qin Y, Dey A, Daaka Y. Protein s-nitrosylation measurement. Methods Enzymol. 2013;522:409–425. doi: 10.1016/B978-0-12-407865-9.00019-4. [DOI] [PubMed] [Google Scholar]
  • 75.Gajewiak J, et al. A disulfide tether stabilizes the block of sodium channels by the conotoxin μO§-GVIIJ. Proc Natl Acad Sci USA. 2014;111(7):2758–2763. doi: 10.1073/pnas.1324189111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Fosgerau K, Hoffmann T. Peptide therapeutics: Current status and future directions. Drug Discov Today. 2015;20(1):122–128. doi: 10.1016/j.drudis.2014.10.003. [DOI] [PubMed] [Google Scholar]
  • 77.Vetter I, et al. Venomics: A new paradigm for natural products-based drug discovery. Amino Acids. 2011;40(1):15–28. doi: 10.1007/s00726-010-0516-4. [DOI] [PubMed] [Google Scholar]
  • 78.Hu H, Bandyopadhyay PK, Olivera BM, Yandell M. Characterization of the Conus bullatus genome and its venom-duct transcriptome. BMC Genomics. 2011;12:60. doi: 10.1186/1471-2164-12-60. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Remigio EA, Duda TF., Jr Evolution of ecological specialization and venom of a predatory marine gastropod. Mol Ecol. 2008;17(4):1156–1162. doi: 10.1111/j.1365-294X.2007.03627.x. [DOI] [PubMed] [Google Scholar]
  • 80.Robinson SD, et al. Diversity of conotoxin gene superfamilies in the venomous snail, Conus victoriae. PLoS ONE. 2014;9(2):e87648. doi: 10.1371/journal.pone.0087648. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Violette A, et al. Large-scale discovery of conopeptides and conoproteins in the injectable venom of a fish-hunting cone snail using a combined proteomic and transcriptomic approach. J Proteomics. 2012;75(17):5215–5225. doi: 10.1016/j.jprot.2012.06.001. [DOI] [PubMed] [Google Scholar]
  • 82.Luo C, Tsementzi D, Kyrpides N, Read T, Konstantinidis KT. Direct comparisons of Illumina vs. Roche 454 sequencing technologies on the same microbial community DNA sample. PLoS ONE. 2012;7(2):e30087. doi: 10.1371/journal.pone.0030087. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.Brennicke A, Marchfelder A, Binder S. RNA editing. FEMS Microbiol Rev. 1999;23(3):297–316. doi: 10.1111/j.1574-6976.1999.tb00401.x. [DOI] [PubMed] [Google Scholar]
  • 84.Su AA, Randau L. A-to-I and C-to-U editing within transfer RNAs. Biochemistry (Mosc) 2011;76(8):932–937. doi: 10.1134/S0006297911080098. [DOI] [PubMed] [Google Scholar]
  • 85.Tuller T, Waldman YY, Kupiec M, Ruppin E. Translation efficiency is determined by both codon bias and folding energy. Proc Natl Acad Sci USA. 2010;107(8):3645–3650. doi: 10.1073/pnas.0909910107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86.Arlinghaus FT, Eble JA. C-type lectin-like proteins from snake venoms. Toxicon. 2012;60(4):512–519. doi: 10.1016/j.toxicon.2012.03.001. [DOI] [PubMed] [Google Scholar]
  • 87.Feldmeyer B, Wheat CW, Krezdorn N, Rotter B, Pfenninger M. Short read Illumina data for the de novo assembly of a non-model snail species transcriptome (Radix balthica, Basommatophora, Pulmonata), and a comparison of assembler performance. BMC Genomics. 2011;12:317. doi: 10.1186/1471-2164-12-317. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Góngora-Castillo E, Buell CR. Bioinformatics challenges in de novo transcriptome assembly using short read sequences in the absence of a reference genome sequence. Nat Prod Rep. 2013;30(4):490–500. doi: 10.1039/c3np20099j. [DOI] [PubMed] [Google Scholar]
  • 89.Martin JA, Wang Z. Next-generation transcriptome assembly. Nat Rev Genet. 2011;12(10):671–682. doi: 10.1038/nrg3068. [DOI] [PubMed] [Google Scholar]
  • 90.Treangen TJ, Salzberg SL. Repetitive DNA and next-generation sequencing: Computational challenges and solutions. Nat Rev Genet. 2012;13(1):36–46. doi: 10.1038/nrg3117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 91.Vijay N, Poelstra JW, Künstner A, Wolf JB. Challenges and strategies in transcriptome assembly and differential gene expression quantification: A comprehensive in silico assessment of RNA-seq experiments. Mol Ecol. 2013;22(3):620–634. doi: 10.1111/mec.12014. [DOI] [PubMed] [Google Scholar]
  • 92.Zhao QY, et al. Optimizing de novo transcriptome assembly from short-read RNA-Seq data: A comparative study. BMC Bioinformatics. 2011;12(Suppl 14):S2. doi: 10.1186/1471-2105-12-S14-S2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93.Compeau PE, Pevzner PA, Tesler G. How to apply de Bruijn graphs to genome assembly. Nat Biotechnol. 2011;29(11):987–991. doi: 10.1038/nbt.2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94.Chait BT. Mass spectrometry: Bottom-up or top-down? Science. 2006;314(5796):65–66. doi: 10.1126/science.1133987. [DOI] [PubMed] [Google Scholar]
  • 95.Ueberheide BM, Fenyö D, Alewood PF, Chait BT. Rapid sensitive analysis of cysteine rich peptide venom components. Proc Natl Acad Sci USA. 2009;106(17):6910–6915. doi: 10.1073/pnas.0900745106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96.Wehr T. Top-down versus bottom-up approaches in proteomics. LCGC North America. 2006;24(9):1004–1011. [Google Scholar]
  • 97.Yates JR, Ruse CI, Nakorchevsky A. Proteomics by mass spectrometry: approaches, advances, and applications. Annu Rev Biomed Eng. 2009;11:49–79. doi: 10.1146/annurev-bioeng-061008-124934. [DOI] [PubMed] [Google Scholar]
  • 98.Keil B. Specificity of Proteolysis. Springer; Berlin: 1992. [Google Scholar]
  • 99.Harrison AG. The gas-phase basicities and proton affinities of amino acids and peptides. Mass Spectrom Rev. 1997;16(4):201–217. [Google Scholar]
  • 100.Andrews S. 2011. FastQC: A Quality Control Tool for High Throughput Sequence Data (Babraham Bioinformatics, Cambridge, UK)
  • 101.Lohse M, et al. 2012. RobiNA: A user-friendly, integrated software solution for RNA-Seq-based transcriptomics. Nucleic Acids Res 40(Web Server Issue):W622–W627.
  • 102.Magoč T, Salzberg SL. FLASH: Fast length adjustment of short reads to improve genome assemblies. Bioinformatics. 2011;27(21):2957–2963. doi: 10.1093/bioinformatics/btr507. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103.Masella AP, Bartram AK, Truszkowski JM, Brown DG, Neufeld JD. PANDAseq: Paired-end assembler for illumina sequences. BMC Bioinformatics. 2012;13:31. doi: 10.1186/1471-2105-13-31. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 104.Grabherr MG, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29(7):644–652. doi: 10.1038/nbt.1883. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105.Haas BJ, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc. 2013;8(8):1494–1512. doi: 10.1038/nprot.2013.084. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 106.Xie Y, et al. SOAPdenovo-Trans: De novo transcriptome assembly with short RNA-Seq reads. Bioinformatics. 2014;30(12):1660–1666. doi: 10.1093/bioinformatics/btu077. [DOI] [PubMed] [Google Scholar]
  • 107.Schulz MH, Zerbino DR, Vingron M, Birney E. Oases: Robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics. 2012;28(8):1086–1092. doi: 10.1093/bioinformatics/bts094. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108.Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357–359. doi: 10.1038/nmeth.1923. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 109.Zamora-Bustillos R, Aguilar MB, Falcón A, Heimer de la Cotera EP. Identification, by RT-PCR, of four novel T-1-superfamily conotoxins from the vermivorous snail Conus spurius from the Gulf of Mexico. Peptides. 2009;30(8):1396–1404. doi: 10.1016/j.peptides.2009.05.003. [DOI] [PubMed] [Google Scholar]
  • 110.Petersen TN, Brunak S, von Heijne G, Nielsen H. SignalP 4.0: Discriminating signal peptides from transmembrane regions. Nat Methods. 2011;8(10):785–786. doi: 10.1038/nmeth.1701. [DOI] [PubMed] [Google Scholar]
  • 111.Duckert P, Brunak S, Blom N. Prediction of proprotein convertase cleavage sites. Protein Eng Des Sel. 2004;17(1):107–112. doi: 10.1093/protein/gzh013. [DOI] [PubMed] [Google Scholar]
  • 112.Fan X, Nagle GT. Molecular cloning of Aplysia neuronal cDNAs that encode carboxypeptidases related to mammalian prohormone processing enzymes. DNA Cell Biol. 1996;15(11):937–945. doi: 10.1089/dna.1996.15.937. [DOI] [PubMed] [Google Scholar]
  • 113.Fan X, Spijker S, Akalal DB, Nagle GT. Neuropeptide amidation: Cloning of a bifunctional alpha-amidating enzyme from Aplysia. Brain Res Mol Brain Res. 2000;82(1-2):25–34. doi: 10.1016/s0169-328x(00)00173-x. [DOI] [PubMed] [Google Scholar]
  • 114.Hale JE, Butler JP, Gelfanova V, You JS, Knierman MD. A simplified procedure for the reduction and alkylation of cysteine residues in proteins prior to proteolytic digestion and mass spectral analysis. Anal Biochem. 2004;333(1):174–181. doi: 10.1016/j.ab.2004.04.013. [DOI] [PubMed] [Google Scholar]
  • 115.McInerney JO. GCUA: General codon usage analysis. Bioinformatics. 1998;14(4):372–373. doi: 10.1093/bioinformatics/14.4.372. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary File
pnas.1501334112.sd01.docx (240.7KB, docx)

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences

RESOURCES