Data in Brief. 2017 Sep 22;15:203–207. doi: 10.1016/j.dib.2017.09.020

Dataset for the proteomic and transcriptomic analyses of perivitelline fluid proteins in Pomacea snail eggs

Huawei Mu a, Jin Sun b, Horacio Heras c,d, Ka Hou Chu e, Jian-Wen Qiu a
PMCID: PMC5633349  PMID: 29021998

Abstract

This article describes how the proteomic and transcriptomic data were produced during a study of the reproductive proteins of Pomacea maculata, an aquatic apple snail that lays colorful aerial eggs, and provides public access to the data. The data are related to a research article titled ‘An integrated proteomic and transcriptomic analysis of perivitelline fluid proteins in a freshwater gastropod laying aerial eggs’ (Mu et al., 2017) [1]. RNA was extracted from the albumen gland and other tissues and sequenced on an Illumina HiSeq 2000. The assembled transcriptome was translated into protein sequences and then used for protein identification. Proteins from the perivitelline fluid of P. maculata were separated by SDS-PAGE and analyzed with an LTQ-Orbitrap Elite coupled to an Easy-nLC. The translated transcriptome data are provided in this article. Proteomic data (.raw file format) are available via ProteomeXchange with the identifier PXD006718.

Keywords: Apple snail, Perivitelline fluid, Proteome, Transcriptome


Specifications Table

Subject area Biology
More specific subject area Apple snail proteomics and transcriptomics
Type of data Table and .raw file
How data was acquired SDS-PAGE, strong cation exchange (SCX) chromatography, and an LTQ-Orbitrap Elite coupled to an Easy-nLC were used to acquire the proteomic data; Illumina HiSeq 2000 sequencing was applied to acquire the transcriptomic data.
Data format Analyzed
Experimental factors Perivitelline fluid of snail eggs laid within 12 h.
Experimental features Mass spectrometry was applied to determine the proteome profile of the egg perivitelline fluid, and transcriptome sequencing was used to determine differential gene expression in different tissues.
Data source location Department of Biology, Hong Kong Baptist University, Hong Kong, China
Data accessibility Data are available via ProteomeXchange with the identifier PXD006718
Related research article An integrated proteomic and transcriptomic analysis of perivitelline fluid proteins in a freshwater gastropod laying aerial eggs [1]

Value of the data

  • This dataset provides a comprehensive proteomic profile of the perivitelline fluid of the apple snail Pomacea maculata. The proteomic data, obtained by LTQ-Orbitrap mass spectrometry, can be used for protein identification, especially of reproductive proteins in gastropods.

  • This dataset also provides translated transcriptomic profiles of the albumen gland and other tissues of Pomacea maculata. The translated transcriptome can be used as the database to support protein identification in gastropods.

  • The data presented here can be used for studies of protein function and evolution in gastropods.

1. Data

Pomacea maculata is a freshwater snail native to South America that has invaded many regions of the world [2]. There is considerable interest in the reproductive biology of this species [3], [4], but a lack of genomic resources has hindered such studies at the molecular level. We extracted RNA from the albumen gland and other tissues and sequenced it on an Illumina HiSeq 2000 to generate a database to support protein identification. Table 1 shows the number of contigs and unigenes in the assembled transcriptome, as well as the quality of the data. Table S2 contains 44,350 protein sequences translated from the transcriptome. These sequences were used for protein identification as described below. Proteins were extracted from the perivitelline fluid of newly laid eggs, fractionated by SDS-PAGE, and analyzed with an LTQ-Orbitrap Elite coupled to an Easy-nLC. The data files (.raw) generated by mass spectrometry were converted into .mgf files using Proteome Discoverer 1.3.0.339 and searched against the protein database in Mascot 2.3.2. The raw files were deposited in ProteomeXchange.

Table 1.

Statistics of transcriptome assembly quality of albumen gland (AG) and other tissues (OT) in Pomacea maculata.

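| Type | Sample name | Total Number | Total Length (nt) | Mean Length (nt) | N50 | Total Consensus Sequences (a) | Distinct Clusters (b) | Distinct Singletons (c) |
|---|---|---|---|---|---|---|---|---|
| Contig | Pm_AG | 179,342 | 52,426,027 | 292 | 427 | | | |
| Contig | Pm_OT | 211,148 | 72,423,210 | 343 | 557 | | | |
| Unigene | Pm_AG | 92,567 | 52,283,278 | 565 | 878 | 92,567 | 13,896 | 78,671 |
| Unigene | Pm_OT | 130,305 | 81,931,694 | 629 | 1,109 | 130,305 | 27,438 | 102,867 |
| Unigene | Pm_AG&OT | 105,349 | 82,687,751 | 785 | 1,332 | 105,349 | 26,056 | 79,293 |
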
(a) All the assembled unigenes.

(b) Clustered unigenes: each cluster contains highly similar unigenes (more than 70% similarity) that may come from the same gene or from homologous genes.

(c) Unigenes that each come from a single gene.
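
The summary columns of Table 1 can be recomputed from a list of sequence lengths. The following is a minimal sketch, not the pipeline used in the study: the function name and the toy lengths are hypothetical, and N50 is taken in its usual sense (the length of the shortest sequence in the set of longest sequences that together cover at least half of the total length).

```python
def assembly_stats(lengths):
    """Return (total_number, total_length, mean_length, n50) for sequence lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    half = total / 2
    running = 0
    n50 = 0
    # Walk from the longest sequence down until half of the total length is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            n50 = length
            break
    return len(lengths), total, round(total / len(lengths)), n50


# Hypothetical toy assembly (lengths in nucleotides), not data from the study.
toy_lengths = [1500, 1200, 900, 600, 400, 300, 100]
stats = assembly_stats(toy_lengths)
print(stats)

# Consistency check on a reported row: total length / total number for Pm_AG unigenes.
pm_ag_mean = round(52_283_278 / 92_567)
print(pm_ag_mean)
```

The same arithmetic shows that, in each unigene row, the distinct clusters and distinct singletons sum to the total consensus sequences (for example, 13,896 + 78,671 = 92,567 for Pm_AG).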

2. Experimental design, materials and methods

2.1. Animal culture

The Pomacea maculata adults, originally collected from a river in San Pedro, Argentina (33°39′35.97″ S, 59°41′52.86″ W), were transported to Hong Kong Baptist University and cultured at 25±1 °C in dechlorinated tap water. The snails were fed fish food, lettuce and carrot. Egg clutches deposited by the snails on the walls of the aquaria were used for protein extraction.

2.2. RNA extraction and transcriptome sequencing

To establish a database for protein identification and to detect tissue-specific genes, the transcriptomes of the albumen gland (AG) and other tissues (OT; including foot, mantle and visceral mass) were sequenced. Total RNA of AG and OT was extracted using TRIzol reagent (Invitrogen, Carlsbad, USA) following the manufacturer's protocol, with two minor modifications: a mixed solution of 0.8 M Na3C6H5O7 and 1.2 M NaCl was added before the isopropanol step, and a LiCl solution (final concentration 2 M) was added after resuspension of the RNA pellets in RNase-free water. The messenger RNA was collected, reverse-transcribed into cDNA, and sequenced on an Illumina HiSeq 2000 to produce 100-bp paired-end reads. Clean reads were assembled using Trinity (release 20130225) [5]. The assembly statistics of the AG and OT transcriptomes are shown in Table 1. Assembled sequences were annotated using BLASTx by searching against public databases (NCBI nr, Swiss-Prot, COG and KEGG) with an E-value threshold of 1 × 10⁻⁵ [6], [7]. Amino acid sequences were translated from the assembled sequences and used as the database for protein identification (Table S2).
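
The annotation step applies an E-value cutoff to BLASTx hits. As an illustration only, the sketch below filters hits in the standard 12-column BLAST tabular layout (E-value in the eleventh column) and keeps the best hit per query. Several points are assumptions rather than details from the study: that the output was tabular, that the threshold is read as 1e-5 and applied inclusively, and that one best hit per unigene was retained. The sequence identifiers are hypothetical.

```python
E_VALUE_CUTOFF = 1e-5  # assumed reading of the threshold stated in the text


def best_hits(tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Keep the lowest-E-value hit per query among hits with E-value <= cutoff."""
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])  # 11th column of the standard tabular layout
        if evalue > cutoff:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical hits: two for unigene_1, one weak hit for unigene_2.
example = [
    "unigene_1\tsubject_A\t85.0\t120\t18\t0\t1\t360\t10\t129\t2e-40\t150",
    "unigene_1\tsubject_B\t60.0\t100\t40\t0\t1\t300\t5\t104\t1e-08\t55",
    "unigene_2\tsubject_C\t35.0\t50\t32\t1\t1\t150\t1\t50\t0.003\t30",
]
hits = best_hits(example)
print(hits)
```

Here unigene_2 receives no annotation because its only hit has an E-value above the cutoff.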

2.3. Egg mass collection, protein extraction and mass spectrometry

Egg masses were washed with MilliQ water and then air-dried. A sterile needle was used to gently crack the egg shells, and a fine-tipped pipette was used to collect the perivitelline fluid (PVF). PVF was stored in 8 M urea, homogenized, and centrifuged. The supernatant was collected and purified, and its protein concentration was determined using an RC-DC kit (Bio-Rad). Three biological replicates were collected from different egg masses.

The protein solutions were mixed with SDS-PAGE buffer (0.05% bromophenol blue, 50% glycerol, 10 mM dithiothreitol, 0.2 M Tris–HCl pH 6.8, and 10% SDS) at a ratio of 3:1 (v/v), heated at 105 °C for 5 min, and separated by SDS-PAGE. Sample gels were stained with Coomassie Brilliant Blue and destained with 1% acetic acid and MilliQ water. Each biological replicate was divided into 10 fractions. For each fraction, the gel was cut into small pieces and further destained with a mixed solution of 50% methanol and 50 mM NH4HCO3, and then washed with MilliQ water, 100% ACN, and 100 mM NH4HCO3 sequentially. Then 10 mM dithiothreitol was applied to reduce the disulfide bonds, and 55 mM iodoacetamide was used to alkylate the sulfhydryl groups. Each gel fraction was then digested with sequencing-grade trypsin in 50 mM NH4HCO3 for 16 h. The peptide solutions were recovered, desalted with Sep-Pak C18 cartridges, and dried in a vacuum concentrator.
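
As a quick check on the loading conditions, the final concentration of each buffer component after mixing can be worked out. This is a sketch under two assumptions that the text does not state explicitly: that the 3:1 (v/v) ratio means three parts protein solution to one part buffer, and that the protein solution contributes none of these components.

```python
# Buffer composition as listed in the text (Tris-HCl expressed in mM: 0.2 M = 200 mM).
BUFFER = {
    "bromophenol blue (%)": 0.05,
    "glycerol (%)": 50.0,
    "dithiothreitol (mM)": 10.0,
    "Tris-HCl (mM)": 200.0,
    "SDS (%)": 10.0,
}


def final_concentrations(buffer, sample_parts=3, buffer_parts=1):
    """Dilute each buffer component by the buffer's volume fraction in the mix."""
    fraction = buffer_parts / (sample_parts + buffer_parts)
    return {name: value * fraction for name, value in buffer.items()}


mixed = final_concentrations(BUFFER)
for name, value in mixed.items():
    print(f"{name}: {value:g}")
```

Under these assumptions the buffer acts as a fourfold concentrate, so every component ends up at one quarter of its listed concentration.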

Each fraction from the three biological samples was reconstituted in 0.1% formic acid and analyzed twice with an LTQ-Orbitrap Elite coupled to an Easy-nLC as described previously [8]. In short, peptides from each fraction were separated in a C18 capillary column. Mass spectrometry scans over a range of 350–1600 m/z were conducted at a resolution of 60,000 in positive ion mode. The five most abundant multiply charged ions with a minimum signal threshold of 500.0 were selected for fragmentation using collision-induced dissociation (CID) and higher-energy collisional dissociation (HCD). Both CID and HCD scanning strategies used an isolation width of 2.0 m/z. The CID fragmentation used an activation time of 10 ms and a normalized collision energy of 35%; the HCD fragmentation also used an activation time of 10 ms, but with a normalized collision energy of 45%.
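
The precursor selection rule described above (top five multiply charged ions above a signal threshold within the scan range) can be illustrated with a short sketch. On the instrument this selection is made by the acquisition software; the code below only mirrors the stated criteria, and the `Peak` type, the function name, and the example peaks are hypothetical.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    mz: float
    intensity: float
    charge: int


def select_precursors(peaks, top_n=5, min_intensity=500.0, mz_range=(350.0, 1600.0)):
    """Pick the top_n most intense multiply charged peaks that pass the filters."""
    low, high = mz_range
    eligible = [
        p for p in peaks
        if p.charge >= 2                  # multiply charged ions only
        and p.intensity >= min_intensity  # minimum signal threshold
        and low <= p.mz <= high           # inside the survey scan range
    ]
    return sorted(eligible, key=lambda p: p.intensity, reverse=True)[:top_n]


# Hypothetical survey scan.
scan = [
    Peak(400.2, 9000.0, 2),
    Peak(512.3, 450.0, 2),     # below the signal threshold
    Peak(620.8, 7000.0, 1),    # singly charged
    Peak(733.4, 3000.0, 3),
    Peak(845.9, 2500.0, 2),
    Peak(910.1, 1200.0, 2),
    Peak(1020.5, 800.0, 4),
    Peak(1250.7, 600.0, 2),    # eligible, but ranked sixth
    Peak(1700.0, 10000.0, 2),  # outside the scan range
]
selected = select_precursors(scan)
print([p.mz for p in selected])
```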

2.4. Protein identification

The raw MS/MS files (available via ProteomeXchange with identifier PXD006718) were converted into .mgf files using Proteome Discoverer 1.3.0.339 and searched with Mascot version 2.3.2 against the P. maculata database of 77,584 protein sequences, which contained both ‘target’ and ‘decoy’ sequences. The parameters were similar to those described in Mu et al. [9], except that the fixed modification was set as cysteine carbamidomethylation and the maximum number of missed trypsin cleavages was set to one. Peptides with an ion score ≥22 (corresponding to 95% confidence) were kept. Peptides with more than nine amino acids were retained, and a 1% false discovery rate threshold was adopted in the protein identification. Proteins that had at least three matched peptides and were detected in at least two replicates were kept.
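
The peptide- and protein-level filters above can be sketched as follows. This is an illustration, not the authors' script: the `PSM` record and function name are hypothetical, "three matched peptides" is interpreted as three distinct peptide sequences pooled across replicates, a protein is counted as detected in a replicate if at least one passing peptide was observed there, and the 1% false discovery rate control applied in the search is not reproduced here.

```python
from collections import defaultdict
from typing import NamedTuple


class PSM(NamedTuple):
    protein: str
    peptide: str
    ion_score: float
    replicate: int


def filter_proteins(psms, min_score=22.0, min_length=10, min_peptides=3, min_replicates=2):
    """Return proteins supported by enough passing peptides and replicates."""
    peptides = defaultdict(set)
    replicates = defaultdict(set)
    for psm in psms:
        # Peptide-level filters: ion score >= 22 and more than nine residues.
        if psm.ion_score >= min_score and len(psm.peptide) >= min_length:
            peptides[psm.protein].add(psm.peptide)
            replicates[psm.protein].add(psm.replicate)
    return sorted(
        protein for protein in peptides
        if len(peptides[protein]) >= min_peptides
        and len(replicates[protein]) >= min_replicates
    )


# Hypothetical matches for three proteins.
matches = [
    PSM("prot_A", "LVNELTEFAK", 30.0, 1),
    PSM("prot_A", "SLHTLFGDELCK", 25.0, 2),
    PSM("prot_A", "YICDNQDTISSK", 40.0, 1),
    PSM("prot_B", "LVNELTEFAK", 30.0, 1),     # three peptides, but one replicate only
    PSM("prot_B", "SLHTLFGDELCK", 35.0, 1),
    PSM("prot_B", "HLVDEPQNLIK", 50.0, 1),
    PSM("prot_C", "DAFLGSFLYEYSR", 45.0, 1),
    PSM("prot_C", "HLVDEPQNLIK", 15.0, 2),    # ion score below 22
    PSM("prot_C", "AEFVEVTK", 60.0, 3),       # fewer than ten residues
    PSM("prot_C", "YICDNQDTISSK", 28.0, 2),
]
kept = filter_proteins(matches)
print(kept)
```

Only prot_A passes: prot_B is confined to a single replicate, and prot_C retains just two peptides after the score and length filters.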

Acknowledgements

We would like to acknowledge the support of the Research Grants Council (Project no. HKBU 12301415) and Hong Kong Baptist University (Project no. FRG1/14-15/026), China, and of the Agencia Nacional de Promoción Científica y Tecnológica, Argentina (Project no. 0850).

Footnotes

Transparency document

Transparency document associated with this article can be found in the online version at doi:10.1016/j.dib.2017.09.020.

Appendix A

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.dib.2017.09.020.

Transparency document: mmc1.docx (14.7 KB, docx)

Appendix A. Supplementary material: mmc2.xlsx (5.6 MB, xlsx)

References

  • 1. Mu H., Sun J., Heras H., Chu K.H., Qiu J.-W. An integrated proteomic and transcriptomic analysis of perivitelline fluid proteins in a freshwater gastropod laying aerial eggs. J. Proteom. 2017;155:22–30. doi: 10.1016/j.jprot.2017.01.006.
  • 2. Hayes K.A., Burks R.L., Castro-Vazquez A., Darby P.C., Heras H., Martín P.R., Qiu J.-W., Thiengo S.C., Vega I.A., Wada T., Yusa Y., Burela S., Cadierno P., Cueto J.A., Dellagnola F.A., Dreon M.S., Frassa M.V., Giraud-Billoud M., Godoy M.S., Ituarte S., Koch E., Matsukura K., Pasquevich M.Y., Rodriguez C., Saveanu L., Seuffert M.E., Strong E.E., Sun J., Tamburi N.E., Tiecher M.J., Turner R.L., Valentine-Darby P.L., Cowie R.H. Insights from an integrated view of the biology of apple snails (Caenogastropoda: Ampullariidae). Malacologia. 2015;58:245–302.
  • 3. Giglio M.L., Ituarte S., Pasquevich M.Y., Heras H. The eggs of the apple snail Pomacea maculata are defended by indigestible polysaccharides and toxic proteins. Can. J. Zool. 2016;94:777–785.
  • 4. Pasquevich M.Y., Dreon M.S., Heras H. The major egg reserve protein from the invasive apple snail Pomacea maculata is a complex carotenoprotein related to those of Pomacea canaliculata and Pomacea scalaris. Comp. Biochem. Physiol. B. 2014;169:63–71. doi: 10.1016/j.cbpb.2013.11.008.
  • 5. Grabherr M.G., Haas B.J., Yassour M., Levin J.Z., Thompson D.A., Amit I., Adiconis X., Fan L., Raychowdhury R., Zeng Q., Chen Z., Mauceli E., Hacohen N., Gnirke A., Rhind N., di Palma F., Birren B.W., Nusbaum C., Lindblad-Toh K., Friedman N., Regev A. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 2011;29:644–652. doi: 10.1038/nbt.1883.
  • 6. Tse W.K.F., Sun J., Zhang H., Law A.Y.S., Yeung B.H.Y., Chou S.C., Qiu J.-W., Wong C.K.C. Transcriptomic and iTRAQ proteomic approaches reveal novel short-term hyperosmotic stress responsive proteins in the gill of the Japanese eel (Anguilla japonica). J. Proteom. 2013;89:81–94. doi: 10.1016/j.jprot.2013.05.026.
  • 7. Sun J., Wang M., Wang H., Zhang H., Zhang X., Thiyagarajan V., Qian P.-Y., Qiu J.-W. De novo assembly of the transcriptome of an invasive snail and its multiple ecological applications. Mol. Ecol. Resour. 2012;12:1133–1144. doi: 10.1111/1755-0998.12014.
  • 8. Sun J., Zhang H., Wang H., Heras H., Dreon M.S., Ituarte S., Ravasi T., Qian P.-Y., Qiu J.-W. First proteome of the egg perivitelline fluid of a freshwater gastropod with aerial oviposition. J. Proteome Res. 2012;11:4240–4248. doi: 10.1021/pr3003613.
  • 9. Mu H., Sun J., Fang L., Luan T., Williams G.A., Cheung S.G., Wong C.K.C., Qiu J.-W. Genetic basis of differential heat resistance between two species of congeneric freshwater snails: insights from quantitative proteomics and base substitution rate analysis. J. Proteome Res. 2015;14:4296–4308. doi: 10.1021/acs.jproteome.5b00462.
