Scientific Data. 2024 Nov 6;11:1201. doi: 10.1038/s41597-024-04001-9

Chromosome-level genome assembly of the ivory shell Babylonia areolata

Yu Zou 1,2,#, Jingqiang Fu 1,2,3,#, Yuan Liang 1,2, Xuan Luo 1,2, Minghui Shen 4, Miaoqin Huang 1,2, Yexin Chen 5, Weiwei You 1,2, Caihuan Ke 1,2
PMCID: PMC11542075  PMID: 39505919

Abstract

The ivory shell Babylonia areolata is an economically important marine benthic gastropod known for its rapid growth and high nutritional value. B. areolata is distributed in Southeast Asia and the southeast coastal areas of China. In this study, we constructed a high-quality genome for B. areolata using PacBio, Illumina, and Hi-C sequencing technologies. The genome assembly comprised 35 chromosomal sequences with a total length of 1.65 Gb. The scaffold and contig N50 lengths were 53.17 Mb and 2.64 Mb, respectively, with repeat sequences constituting 64.48% of the genome. Furthermore, 26,130 protein-coding genes and 96.75% of the genome’s BUSCOs were identified. This first report of a B. areolata genome provides crucial foundational information for further investigations into the biology, genomics, and genetic improvement of economic traits of this species.

Subject terms: Genome, Genomics

Background & Summary

Babylonia is a genus of carnivorous benthic shellfish (Fig. 1). The genus Babylonia (Schluter, 1838) belongs to the phylum Mollusca, class Gastropoda, subclass Caenogastropoda, order Neogastropoda, and was originally classified in the family Buccinidae1. However, the latest classification places the genus in a separate family, the Babyloniidae2. The 11 species in this family are restricted to the Indo-Pacific region1. In its natural environment, Babylonia is distributed in shallow subtidal waters at depths of several to tens of meters. The species differ in their substrate requirements, generally preferring a sandy, muddy, or silted sea bottom. The activity of Babylonia follows a circadian rhythm: during the day the animals lurk in the sand and mud with only the siphon exposed, moving slightly with the rise and fall of the tides, and at night they emerge with the help of gastropods to forage for food. All species of Babylonia are edible. At present, only Babylonia areolata and B. lutosa have been successfully cultivated3,4.

Fig. 1. A photo of the B. areolata specimen used for the genome sequencing.

The ivory shell (Babylonia areolata, Link 1807) is distributed from Sri Lanka and the Nicobar Islands, across the Gulf of Thailand, along the coast of Vietnam, and to the southeast coastal areas of China1,4–7. B. areolata is currently the main cultivated species, and China is the world's largest producer of B. areolata8. At present, B. areolata breeding in China occurs in the coastal areas south of Fujian Province, with Hainan Province being the most suitable area for the breeding of B. areolata seedlings due to its unique geographical environment and climatic conditions9. The ivory shell is popular with consumers, and the ivory shell aquaculture industry has developed rapidly over the past two decades. The annual output value of the ivory shell industry presently exceeds 10 billion yuan. Since 2011, a genetic improvement program for growth traits has been carried out in B. areolata10. Using traditional selective breeding techniques, the new variety “Haitai No.1” was developed through four successive generations of selection, with shell length and body weight as the target traits11. With the pollution of mariculture environments and climate change, disease outbreaks and large-scale mortality have occurred frequently, causing huge losses to the ivory shell breeding industry12. Therefore, it is important to carry out genetic improvement studies on B. areolata for additional traits, including stress and disease resistance. However, the lack of genome resources has limited the analysis of the genetic mechanisms underlying breeding traits and their application, as well as the study of other biological characters.

Neogastropoda comprises over 15,000 species and accounts for more than one-fifth of extant mollusks13, yet chromosome-level genome data are currently publicly available for only a few species14–18. Furthermore, the quality of genome annotation varies considerably among these species. In this study, we present the first chromosome-level genome assembly for B. areolata. The genome was assembled by integrating PacBio Continuous Long Reads (CLR) sequencing, Illumina sequencing, and high-throughput chromatin conformation capture (Hi-C) sequencing. The final assembled genome size was 1.65 Gb, with 96.7% completeness as analyzed by BUSCO19. The contig and scaffold N50 lengths for the genome were 2.64 and 53.17 Mb, respectively. The genome contained a large proportion of repeat sequences (64.48%), and 26,130 protein-coding genes were predicted, giving B. areolata the highest annotation quality among the currently publicly available Neogastropoda genomes. In summary, this high-quality chromosome-level genome establishes a foundation for investigating the biological characteristics of species in Neogastropoda and for facilitating the genetic improvement of B. areolata.

Methods

Sample collection

The experimental material for this study was a wild, adult female B. areolata obtained off the coast of Zhangzhou, Fujian, China. The specimen was transported to the laboratory at Xiamen University and provisionally accommodated in a plastic container (60 × 40 × 25 cm) filled with 5 cm sea sand. The temporary habitat utilized filtered natural seawater passed through a 0.45 μm membrane filter, with the salinity level maintained at 28–32 ppt (parts per thousand) and the water temperature regulated at 25 ± 1 °C.

Three days post-adaptation, the specimen was anesthetized in a 2% solution of absolute ethanol in seawater for 10 minutes. Following confirmation of complete muscular relaxation in the foot and the absence of tactile response, the shell was carefully opened to access and excise the epidermal layer of the foot muscle, thereby exposing the internal musculature for sampling. In addition to the foot muscle, 11 other tissues were harvested for analysis: the proboscis, the cephalic region, the ocular structures and tentacles, and siphonal, renal, intestinal, mantle, gill, heart, liver, and gonadal tissue. After collection, the tissues were rinsed in 1× PBS (phosphate-buffered saline) and promptly preserved in cryovials. The tissue samples were rapidly frozen in liquid nitrogen for 24 hours and then stored long-term at −80 °C before sample preparation.

Sample extraction and sequencing

The genomic DNA of B. areolata was extracted from the foot muscle tissue using the phenol/chloroform method. The Illumina genomic DNA library was constructed and sequenced following the manufacturer’s guidelines20. The process involved fragmenting the high-quality DNA to 300–350 bp by ultrasonication, followed by end repair, A-tailing, adapter ligation, purification, and subsequent library preparation. The resulting genomic DNA library was sequenced on the Illumina HiSeq X platform in 150PE (paired-end) mode, generating 172.46 Gb of raw data. After quality filtering with fastp21 (version 0.20.0), 154.85 Gb of clean data were retained for the B. areolata genome (Table 1), with a Q20 rate of 95.08%. Concurrently, a PacBio library comprising 20 kb fragments of genomic DNA from the same individual’s foot muscle was constructed and sequenced using three SMRT cells on the PacBio Sequel system22. The PacBio CLR sequencing yielded 185.00 Gb (115.62× coverage) of long reads for the genome assembly (Table 1).

Table 1. Sequencing data used for the B. areolata genome assembly.

| Types | Method | Library size (bp) | Clean data (Gb) | Average length (bp) | Coverage (×) |
|---|---|---|---|---|---|
| Genome | Illumina | 300–350 | 154.85 | 150 | 96.78 |
| Genome | PacBio | 20,000 | 185.00 | 11,788 | 115.62 |
| Genome | Hi-C | | 150.48 | 150 | 94.05 |
| Transcriptome | Illumina | 250–300 | 24.28 | 149 | |

The coverage was calculated using an estimated genome size of 1.6 Gb.
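
The coverage values in Table 1 follow from dividing each clean data volume by the estimated genome size. The short Python sketch below reproduces that arithmetic; the function and variable names are illustrative and do not come from the original analysis.

```python
# Clean data volumes (Gb) as reported in Table 1.
CLEAN_DATA_GB = {"Illumina": 154.85, "PacBio": 185.00, "Hi-C": 150.48}

# Estimated genome size (Gb) stated in the Table 1 footnote.
ESTIMATED_GENOME_SIZE_GB = 1.6


def coverage(clean_data_gb, genome_size_gb=ESTIMATED_GENOME_SIZE_GB):
    """Return sequencing depth as data volume divided by genome size."""
    return clean_data_gb / genome_size_gb


for method, gigabases in CLEAN_DATA_GB.items():
    print(f"{method}: {coverage(gigabases):.2f}x")
```

Using the later k-mer estimate (1583.04 Mb) as the denominator instead of 1.6 Gb would give slightly higher depths.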

A Hi-C library was constructed following the methods of previous studies23. The procedure encompassed cross-linking the chromatin in the foot muscle cell nucleus with a 1% formaldehyde solution, enzymatically cleaving the chromatin using the restriction enzyme MboI, labeling the repaired sticky ends with biotin, joining the fragments with T3 ligase, fragmenting the DNA through ultrasonication, and isolating biotin-labeled DNA fragments. The Hi-C library was then sequenced on the Illumina HiSeq X platform, producing 150.48 Gb of Hi-C clean data after quality control (Table 1).

RNA was extracted from all 12 tissues of B. areolata. Tissue samples were collected as described in the previous step, and the TRIZOL Reagent (Invitrogen, USA) was used for RNA extraction. The extracted RNA from homogenized tissues was mixed in equal amounts. The qualified RNA was then eluted, purified, and sequenced on the Illumina HiSeq X platform in 150PE mode. Finally, 24.28 Gb RNA-seq clean data were obtained (Table 1).

Genome survey and contig assembly

To characterize the genome of B. areolata, the genome size, repetitive sequence content, and heterozygosity were assessed with a k-mer-based method using the Illumina sequencing data24. The clean data were analyzed with GCE25 (version 1.0.0) using a k-mer size of 17 (K = 17). From the k-mer frequency distribution of the B. areolata genome (Fig. 2), we estimated a revised genome size of 1583.04 Mb, with a heterozygosity rate of 1.25%. The genome contained an estimated 60.81% repetitive sequences.

Fig. 2. K-mer frequency of B. areolata. The first and second peaks indicated the homozygous and heterozygous k-mers, respectively.
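
The basic idea behind k-mer-based genome size estimation can be sketched as follows: count every k-mer in the reads, build a depth histogram, and divide the total number of k-mers by the depth of the main peak. This is a deliberately simplified illustration, not a reimplementation of GCE, which additionally models sequencing errors, heterozygosity, and repeats. The function names and the `min_depth` cutoff are assumptions made for the example, and reverse-complement (canonical) k-mers are not handled.

```python
from collections import Counter


def count_kmers(reads, k=17):
    """Count every k-mer of length k across an iterable of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def depth_histogram(kmer_counts):
    """Map each depth to the number of distinct k-mers seen at that depth."""
    return Counter(kmer_counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Estimate genome size as total k-mers divided by the main peak depth.

    Depths below min_depth are dropped as a crude stand-in for removing
    k-mers created by sequencing errors (an assumption of this sketch).
    """
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth, peak_depth
```

In a heterozygous genome such as this one, the histogram has two peaks, and choosing the wrong one as the main peak would roughly double or halve the estimate, which is one reason dedicated tools are used in practice.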

In this study, the contig assembly of the B. areolata genome employed a hybrid approach that leveraged the high accuracy of Canu26 (version 2.1.1) and the high contiguity of wtdbg2 (ref. 27; version 2.5). The Canu assembly, run with the primary parameters 'minReadLength = 2000 minOverlapLength = 500 corOutCoverage = 40 corMinCoverage = 2', comprised three steps: correction, trimming, and assembly. The original CLR subreads were then aligned to the preliminary contig sequences using pbmm2 (version 1.2.0), followed by one round of polishing with the Arrow program. Subsequently, BWA (version 0.7.17) was used to align the Illumina short reads to the contig sequences, followed by one round of polishing with Pilon, yielding 1.81 Gb of Canu-assembled contig sequences. The wtdbg2 assembly, run with the parameters '-p 19 -AS 2 -s 0.05 -L 5000', was based on the fast global alignment and fuzzy-Bruijn graph (FBG) algorithm. The preliminary wtdbg2 contigs underwent one round of polishing with Arrow and two rounds with Pilon, yielding 1.82 Gb of wtdbg2-assembled contig sequences. The contig sequences assembled by Canu and wtdbg2 were first filtered to remove shorter heterozygous and repetitive sequences using Purge Haplotigs28 (version 1.1.0) with the user-defined parameter '-a 99'. The filtered contig sequences from the Canu and wtdbg2 assemblies were then split into sequences with window lengths of 20 kb and 50 kb and step lengths of 5 kb and 10 kb, respectively. These split sequences served as input for a further Canu assembly run, which produced 3.13 Gb of merged raw contig sequences. After two iterations of heterozygosity filtering of the raw sequences using Purge Haplotigs, a final contig assembly of 1.65 Gb was obtained, comprising 1,693 contigs with a contig N50 length of 2.69 Mb (Table 2).
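
The splitting step described above cuts each contig into overlapping windows (for example, 20 kb windows with a 5 kb step). The sketch below shows one plausible way to do this in Python. How the original workflow handled contigs shorter than the window, or the final partial window, is not stated, so the behavior here (keep short contigs whole, keep a shorter last piece) is an assumption.

```python
def split_into_windows(sequence, window, step):
    """Split a sequence into overlapping windows of the given size and step.

    Sequences no longer than one window are returned unchanged; the last
    piece may be shorter than the window (both choices are assumptions).
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if len(sequence) <= window:
        return [sequence]
    pieces = []
    start = 0
    while True:
        pieces.append(sequence[start:start + window])
        if start + window >= len(sequence):
            break
        start += step
    return pieces


# Toy example with a 32-base sequence, window 20, step 5.
toy_contig = "".join("ACGT"[i % 4] for i in range(32))
for piece in split_into_windows(toy_contig, window=20, step=5):
    print(len(piece), piece)
```

With the parameters from the text, the calls would be `split_into_windows(contig, 20_000, 5_000)` for the Canu contigs and `split_into_windows(contig, 50_000, 10_000)` for the wtdbg2 contigs.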

Table 2. Assembly statistics of B. areolata.

| Sequence type | Contig length (bp) | Contig number | Scaffold length (bp) | Scaffold number |
|---|---|---|---|---|
| Total | 1,651,840,551 | 1,738 | 1,651,990,951 | 234 |
| Max | 12,580,597 | | 102,942,266 | |
| N50 | 2,638,925 | 182 | 53,174,447 | 13 |
| N60 | 1,996,305 | 255 | 47,247,324 | 17 |
| N70 | 1,514,876 | 350 | 43,008,941 | 20 |
| N80 | 1,037,548 | 480 | 40,459,595 | 24 |
| N90 | 561,334 | 691 | 28,280,870 | 29 |
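
For readers unfamiliar with the N50 to N90 rows in Table 2, the following Python sketch shows the standard definition: sort the sequences from longest to shortest and report the length (and the count) at which the running total first reaches the chosen fraction of the assembly. The function name `nx` is illustrative; the toy lengths are not from the assembly.

```python
def nx(lengths, fraction=0.5):
    """Return (Nx length, number of sequences needed) for a fraction.

    fraction=0.5 gives N50 (and the matching count, often called L50).
    """
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("lengths must be non-empty")


toy_lengths = [2, 3, 4, 5, 6]  # total length 20
print(nx(toy_lengths, 0.5))
print(nx(toy_lengths, 0.8))
```

Read this way, the N50 row of Table 2 says that the 182 longest contigs, each at least 2,638,925 bp long, together cover half of the contig assembly.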

Chromosome-level genome assembly using Hi-C data

To obtain a chromosome-level genome sequence of B. areolata, Juicer29 (version 1.6) was used to align the 150.48 Gb of filtered Hi-C reads to the assembled contig sequences. This step filtered out reads that were excessively distant from a restriction site, reads that aligned at only one end, and duplicate reads. The retained reads were then used for Hi-C-assisted assembly. 3D-DNA30 (version 180419) clustered, ordered, and phased the contigs, preliminarily linking them into chromosomal structures. Juicebox31 was then used for manual correction. Finally, we assembled 1,469 contigs into 35 chromosomes representing 99.54% of the total genome length. The contig and scaffold N50 values of the final chromosome assembly were 2.64 Mb and 53.17 Mb, respectively, indicating a high level of continuity and completeness (Table 2, Fig. 3).

Fig. 3. The Hi-C interaction heatmap for the B. areolata genome. Over 99% of the genome sequences were anchored into 35 chromosomes.

Gene prediction and functional annotation

The annotation of the B. areolata genome was divided into two primary sections: annotation of repeat sequences and annotation of genes. Gene annotation was further categorized into the prediction of gene structure and gene function annotation.

The annotation of repeat elements employed both ab initio and homology-based prediction. For the homology-based prediction, RepeatProteinMask and RepeatMasker32 (version 4.1.2-p1) were employed to align the genome sequence to the Repbase33 database (version 20181026) that contains known eukaryotic DNA repeat sequences. For the ab initio prediction, LTR_retriever34 (version 2.9.0) and RepeatModeler35 (version 2.0.1) were used to analyze the repeat sequences in the B. areolata genome and create a custom repeat sequence library. RepeatMasker then utilized this library to identify repeat sequences in the genome. Tandem Repeats Finder36 (TRF) was used to annotate tandem repeat sequences. Ultimately, we identified 1,065.16 Mb of non-redundant repetitive sequences, accounting for 64.48% of the genome sequence length. Among these, transposable element sequences were employed for repeat masking during the gene prediction process (Table 3).

Table 3. General statistics of predicted repeat sequences.

| Type | Repeats length (bp) | Percent in genome (%) |
|---|---|---|
| Tandem repeat | 423,258,187 | 25.61 |
| DNA | 329,732,316 | 19.95 |
| LINE | 246,207,857 | 14.90 |
| SINE | 76,756,240 | 4.64 |
| LTR | 186,740,905 | 11.30 |
| Satellite | 47,998,532 | 2.90 |
| Simple repeat | 7,566,655 | 0.46 |
| Other | 38,980 | 0.00 |
| Unknown | 197,689,235 | 11.96 |
| Total | 1,065,155,833 | 64.48 |

In this research, gene structure prediction was performed using three methodologies: de novo, homology-based, and transcriptome-based prediction. For de novo prediction, the Augustus program (version 3.5.0) was employed to identify protein-coding regions within the genome, explicitly excluding transposable elements (TEs). For homology-based prediction, protein sequences from six species, Aplysia californica (GCF_000002075.1), Biomphalaria glabrata (GCF_000457365.1), Crassostrea gigas (GCF_902806645.1), Lottia gigantea (GCF_000327385.1), Octopus bimaculoides (GCF_001194135.1), and Pomacea canaliculata (GCF_003073045.1), were retrieved from the National Center for Biotechnology Information (NCBI). These sequences were aligned against the genome using tblastn37 (version 2.12.0). The alignment results were consolidated using Solar38 (version 0.9.6), and gene structures were inferred using Exonerate39 (version 2.4.0) for genes homologous to those of the reference species. For transcriptome-based prediction, next-generation sequencing (NGS) transcriptome data were mapped to the genome with HISAT2 (ref. 40; version 2.1.0), and potential exonic regions were assembled using StringTie41 (version 2.0.4). These exon structures were used to predict protein-coding gene structures with TransDecoder (https://github.com/TransDecoder/TransDecoder, version 5.5.0). The gene sets derived from the three methods were integrated using MAKER42 (version 3.01.03), producing a comprehensive, non-redundant gene set of 26,130 protein-coding genes (Table 4). The predicted protein sequences were subsequently aligned with entries from several protein databases, including NCBI non-redundant protein, SwissProt43, TrEMBL43, eggNOG44, and InterPro45, to determine protein functions. Ultimately, 22,216 (85.02%) genes showed at least one alignment with these databases (Table 5).

Table 4. General statistics of predicted protein-coding genes.

| Gene set | Number | Average gene length (bp) | Average CDS length (bp) | Average exons per gene | Average exon length (bp) | Average intron length (bp) |
|---|---|---|---|---|---|---|
| De novo: Augustus | 94,011 | 3,326.70 | 957.83 | 2.54 | 376.97 | 1,537.39 |
| Homolog: A. californica | 35,090 | 10,853.12 | 753.54 | 3.85 | 195.84 | 3,546.50 |
| Homolog: B. glabrata | 62,376 | 5,098.33 | 594.69 | 2.43 | 244.83 | 3,151.64 |
| Homolog: C. gigas | 34,513 | 10,836.62 | 812.54 | 3.66 | 221.95 | 3,767.16 |
| Homolog: L. gigantea | 68,001 | 4,711.13 | 549.92 | 2.35 | 233.94 | 3,080.75 |
| Homolog: O. bimaculoides | 25,588 | 10,381.59 | 712.54 | 3.59 | 198.61 | 3,736.71 |
| Homolog: P. canaliculata | 35,789 | 14,391.41 | 922.98 | 4.79 | 192.56 | 3,550.67 |
| RNAseq | 15,978 | 26,694.50 | 1,319.14 | 8.36 | 452.32 | 3,113.32 |
| MAKER | 26,130 | 23,306.82 | 1,528.20 | 7.92 | 370.78 | 2,945.81 |

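Table 5. General statistics of gene function annotation.

| Type | Number | Percent (%) |
|---|---|---|
| Total | 26,130 | 100.00 |
| Annotated: InterPro | 17,386 | 66.54 |
| Annotated: GO | 12,055 | 46.13 |
| Annotated: KEGG | 12,769 | 48.87 |
| Annotated: Swiss-Prot | 17,496 | 66.96 |
| Annotated: TrEMBL | 21,221 | 81.21 |
| Annotated: NR | 21,672 | 82.94 |
| Annotated (at least one database) | 22,216 | 85.02 |
| Unannotated | 3,914 | 14.98 |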
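
The percentages in Table 5 are each database's gene count divided by the 26,130 genes in the MAKER set. As a quick consistency check, the Python sketch below recomputes them from the counts in the table; the dictionary layout and function name are illustrative only.

```python
TOTAL_GENES = 26_130

# Gene counts per annotation source, copied from Table 5.
ANNOTATED_COUNTS = {
    "InterPro": 17_386,
    "GO": 12_055,
    "KEGG": 12_769,
    "Swiss-Prot": 17_496,
    "TrEMBL": 21_221,
    "NR": 21_672,
    "At least one database": 22_216,
}


def percent_of_total(count, total=TOTAL_GENES):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


for source, count in ANNOTATED_COUNTS.items():
    print(f"{source}: {percent_of_total(count):.2f}%")

unannotated = TOTAL_GENES - ANNOTATED_COUNTS["At least one database"]
print(f"Unannotated: {unannotated} ({percent_of_total(unannotated):.2f}%)")
```

The per-database percentages sum to far more than 100% because a single gene can have hits in several databases.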

Data Records

The genomic Illumina sequencing data were submitted to NCBI with the accession number SRR29950555 (ref. 46).

The genomic PacBio sequencing data were submitted to NCBI with the accession number SRR29950554 (ref. 47).

The Hi-C sequencing data were submitted to NCBI with the accession number SRR29950556 (ref. 48).

The transcriptome Illumina sequencing data were submitted to NCBI with the accession number SRR29950557 (ref. 49).

The chromosome assembly of B. areolata genome sequences was submitted to NCBI with the accession number JBFRHL000000000 (ref. 50).

The annotation of the B. areolata genome was submitted to figshare51.

Technical Validation

Assessment of genome assembly and annotation quality

The DNA extracted from the foot muscle and the RNA extracted from the 12 tissue samples passed quality control benchmarks. DNA quality was confirmed by a distinct band around the 20 kb region and an absorbance ratio (A260/A280) of at least 1.8. RNA purity was determined using a Nanodrop ND-1000 spectrophotometer (LabTech, USA), requiring an absorbance ratio (A260/A280) greater than 1.7, indicative of satisfactory RNA purity and low protein contamination. RNA integrity was further assessed using a 2100 Bioanalyzer (Agilent Technologies, USA), with samples deemed acceptable if they exhibited an RNA Integrity Number (RIN) exceeding 8.0.

Multiple datasets and bioinformatic tools were used to evaluate the integrity and accuracy of the genome assembly. We aligned the PacBio CLR data to the chromosome-level genome using minimap2 (ref. 52; version 2.19); 99.88% of the genome was covered at least once, and 99.56% was covered at more than 10× depth. Assembly completeness was further verified using BUSCO (version 5.2.2) with the metazoan dataset (metazoa_odb10), which identified 95.4% complete genes, including 857 (89.8%) single-copy and 53 (5.6%) duplicated genes, demonstrating a comprehensive representation of the B. areolata genome. Illumina genome sequencing data from the same individual were aligned to the genome using BWA53, and unaligned and duplicate reads were filtered out. The aligned Illumina data showed an insert size of 308 bp, consistent with the expected insert size, and subsequent SNP calling with the Genome Analysis Toolkit54 (GATK, version 4.2) identified 13,806,591 heterozygous and 757,120 homozygous SNPs. The accuracy of gene annotation was evaluated using transcriptome data: 94.86% of transcriptome reads aligned to the genome using STAR55 (version 2.7.9), and 73.94% mapped to annotated exonic regions. The completeness of the predicted protein sequences was assessed using the BUSCO metazoan dataset, which identified 854 single-copy and 69 duplicated genes, corresponding to 96.7% of the genes in the metazoan database. Taken together, these results affirmed the accuracy and completeness of the B. areolata genome and its annotation.
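
The BUSCO percentages above can be related to the reported gene counts with a few lines of Python. The sketch assumes that the metazoa_odb10 dataset contains 954 BUSCO groups; this total is not stated in the text and is used here because it is consistent with the reported counts and percentages. The function name is illustrative.

```python
# Assumption: metazoa_odb10 contains 954 BUSCO groups (not stated in the text).
METAZOA_ODB10_TOTAL = 954


def busco_summary(single_copy, duplicated, total=METAZOA_ODB10_TOTAL):
    """Return complete, single-copy, and duplicated BUSCO percentages."""
    complete = single_copy + duplicated
    return {
        "complete": 100 * complete / total,
        "single_copy": 100 * single_copy / total,
        "duplicated": 100 * duplicated / total,
    }


genome = busco_summary(single_copy=857, duplicated=53)
proteins = busco_summary(single_copy=854, duplicated=69)

print({name: round(value, 2) for name, value in genome.items()})
print({name: round(value, 2) for name, value in proteins.items()})
```

Under this assumption, the protein-set completeness comes out at about 96.75%, which matches the value quoted in the Abstract and suggests that the 96.7% figure is the same number reported to one decimal place.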

The phylogenetic position of B. areolata in Neogastropoda

We performed phylogenetic analyses with the B. areolata genome to further validate our genome assembly and annotation. The longest coding sequence (CDS) of each protein-coding gene from 13 species with publicly available genomes was included in the phylogeny. These species were A. californica, B. glabrata, C. gigas, L. gigantea, Monoplex corrugatus16, O. bimaculoides, P. canaliculata, Lanistes nyassanus56, Stramonita haemastoma16, Rapana venosa18, Batillaria attramentaria57, Conus ventricosus15, and Achatina fulica58. Gene family clustering was performed using OrthoFinder59 (version 2.5.5), and the second codon positions of 148 single-copy gene families were concatenated to create a supergene. This supergene was refined with Gblocks60 (version 0.91b) to isolate conserved sequences, which were then used to construct a phylogenetic tree with RAxML61 (version 8.2.12). Species divergence times were estimated using MCMCTree in Phylogenetic Analysis by Maximum Likelihood62 (PAML, version 4.10.7), with burnin, sampfreq, and nsample set to 5,000,000, 100, and 100,000, respectively. Fossil calibration points followed previous studies63,64, including the first appearance of Mollusca between 549 and 532 million years ago65 (Mya), the divergence between Aplysia californica and Lottia gigantea (531.5–470.2 Mya)66, the emergence of Pteriomorpha (at least 465 Mya)67, the divergence between A. californica and B. glabrata (473.4–168.6 Mya)66, the divergence between Caenogastropoda and Heterobranchia (no earlier than 390 Mya)68, and the split between L. nyassanus and P. canaliculata (no later than 150 Mya)69. As a result, we estimated that the divergence between B. areolata and its nearest relative in the Conidae family, C. ventricosus, occurred around 143 Mya. Furthermore, the simultaneous divergence of four newly sequenced gastropod species was placed at approximately 155.9 Mya (Fig. 4).

Additionally, chromosomal collinearity analysis with JCVI70 revealed significant consistency between B. areolata and C. ventricosus15 (Fig. 5), reinforcing the reliability of our genome assembly and annotation for B. areolata. This comprehensive approach supported the accuracy of our genomic data and enriched our understanding of the evolutionary relationships within these molluscan lineages.
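
The supergene construction described above, in which second codon positions from single-copy gene families are concatenated per species, can be illustrated with a short Python sketch. It assumes that each family is already a codon-aware alignment in reading frame (every species has a sequence of the same length, starting at codon position 1, with gaps inserted in multiples of three). The function names and toy sequences are illustrative, and this is not the authors' pipeline.

```python
def second_codon_positions(cds):
    """Return the second base of every codon (0-based indices 1, 4, 7, ...)."""
    return cds[1::3]


def build_supergene(families, species):
    """Concatenate second codon positions across gene families per species.

    families: list of dicts mapping species name to an in-frame aligned CDS.
    species:  list of species names that must be present in every family.
    """
    parts = {name: [] for name in species}
    for family in families:
        for name in species:
            parts[name].append(second_codon_positions(family[name]))
    return {name: "".join(chunks) for name, chunks in parts.items()}


# Toy example with two gene families and two species.
toy_families = [
    {"sp_a": "ATGGCC", "sp_b": "ATGGTC"},
    {"sp_a": "AAA", "sp_b": "ACA"},
]
print(build_supergene(toy_families, ["sp_a", "sp_b"]))
```

Second codon positions tend to evolve more slowly than first and, especially, third positions, which is a common reason for using them when the species compared are deeply diverged.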

Fig. 4. The divergence time of B. areolata within the Mollusca. The times in parentheses represent the 95% confidence intervals.

Fig. 5. The chromosomal collinearity between B. areolata, C. ventricosus, and P. canaliculata. Compared to P. canaliculata, B. areolata and C. ventricosus exhibited significant chromosomal doubling.

Acknowledgements

This work was supported by the Key Research and Development Program of Hainan Province (ZDYF2022XDNY234), the National Natural Science Foundation of China (32202900), and the Outstanding Postdoctoral Scholarship of the State Key Laboratory of Marine Environmental Science at Xiamen University.

Author contributions

Caihuan Ke and Weiwei You conceived the study; Jingqiang Fu, Xuan Luo, Minghui Shen, Miaoqin Huang, and Yexin Chen collected the samples; Yu Zou performed sequencing and Hi-C experiments; Yu Zou and Jingqiang Fu estimated the genome size, assembled the genome, and assessed the assembly quality; Yu Zou and Yuan Liang carried out the genome annotation and phylogenetic analysis. Yu Zou, Jingqiang Fu, Yuan Liang, Xuan Luo, Minghui Shen, Miaoqin Huang, Yexin Chen, Caihuan Ke, and Weiwei You wrote the manuscript. All authors read, edited, and approved the final manuscript.

Code availability

No specific code was developed in this work. The data analysis was performed according to the manuals and protocols provided by the developers of the corresponding bioinformatic tools.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Yu Zou, Jingqiang Fu.

Contributor Information

Weiwei You, Email: wwyou@xmu.edu.cn.

Caihuan Ke, Email: chke@xmu.edu.cn.

References

  • 1.Altena, C. O. V. R. & Gittenberger, E. The genus Babylonia (Prosobranchia, Buccinidae). Zoologische Verhandelingen188, 1–57 (1981). [Google Scholar]
  • 2.Harasewych, M. G. & Kantor, Y. I. On the morphology and taxonomic position of Babylonia (Neogastropoda: Babyloniidae). Bollettino Malacologico, 19–36 (2002).
  • 3.Lü, W. et al. Comparison and Optimal Prediction of Goptimal prediction of growth of Babylonia areolata and B. lutosa. Aquaculture Reports1810.1016/j.aqrep.2020.100425 (2020).
  • 4.Guilan, D. et al. Spermatozoan morphology of the snails Babylonia lutosa, Babylonia areolata from parental lines of populations in Hainan and Thailand and hybrid lines. Aquaculture Research52, 952–965, 10.1111/are.14951 (2021). [Google Scholar]
  • 5.Ruangsri, J., Thawonsuwan, J., Wanlem, S. & Withyachumnarnkul, B. Effect of body size and sub-optimal water quality on some hemato-immunological parameters of spotted babylon snail Babylonia areolata. Fisheries Science84, 513–522, 10.1007/s12562-018-1191-8 (2018). [Google Scholar]
  • 6.Dobson, G. T., Duy, N. D. Q., Paul, N. A. & Southgate, P. C. Assessing potential for integrating sea grape (Caulerpa lentillifera) culture with sandfish (Holothuria scabra) and Babylon snail (Babylonia areolata) co-culture. Aquaculture522, 10.1016/j.aquaculture.2020.735153 (2020).
  • 7.Chiu, T.-H., Kuo, C.-W., Lin, H.-C., Huang, D.-S. & Wu, P.-L. Genetic diversity of ivory shell (Babylonia areolata) in Taiwan and identification of species using DNA-based assays. Food Control48, 108–116, 10.1016/j.foodcont.2014.05.032 (2015). [Google Scholar]
  • 8.Lü, W. et al. Evaluation of crosses between two geographic populations of native Chinese and introduced Thai spotted ivory shell, Babylonia areolata, in southern China. Journal of the World Aquaculture Society47, 544–554 (2016). [Google Scholar]
  • 9.Lü, W. et al. Combined effects of temperature, salinity and rearing density on growth and survival of juvenile ivory shell, Babylonia areolata (Link 1807) population in Thailand. Aquaculture Research48, 1648–1665, 10.1111/are.13000 (2017). [Google Scholar]
  • 10.Fu, J. et al. Comparative assessment of the genetic variation in selectively bred generations from two geographic populations of ivory shell (Babylonia areolata). Aquaculture Research48, 4205–4218, 10.1111/are.13241 (2017). [Google Scholar]
  • 11.Fu, J. et al. Changes in low salinity and hypoxia tolerance in F1 hybrids of the ivory shell, Babylonia areolata. Aquaculture Reports36, 10.1016/j.aqrep.2024.102131 (2024).
  • 12.Fu, J. et al. Survival and immune responses of two populations of Babylonia areolata and their hybrids under pathogenic Vibrio challenge. Aquaculture584, 10.1016/j.aquaculture.2024.740646 (2024).
  • 13.Fedosov, A. E. et al. Phylogenomics of Neogastropoda: the backbone hidden in the bush. Systematic Biology, syae010 (2024). [DOI] [PMC free article] [PubMed]
  • 14.Herráez-Pérez, A., Pardos-Blas, J. R., Afonso, C. M., Tenorio, M. J. & Zardoya, R. Chromosome-level genome of the venomous snail Kalloconus canariensis: a valuable model for venomics and comparative genomics. GigaScience12, giad075 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Pardos-Blas, J. R. et al. The genome of the venomous snail Lautoconus ventricosus sheds light on the origin of conotoxin diversity. Gigascience10, giab037 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Farhat, S., Modica, M. V. & Puillandre, N. Whole genome duplication and gene evolution in the hyperdiverse venomous gastropods. Molecular Biology and Evolution40, msad171 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Liu, Z. et al. Chromosome-level genome assembly of the deep-sea snail Phymorhynchus buccinoides provides insights into the adaptation to the cold seep habitat. BMC genomics24, 679 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Song, H. et al. Chromosome-level genome assembly of the caenogastropod snail Rapana venosa. Scientific Data10, 539 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Manni, M., Berkeley, M. R., Seppey, M. & Zdobnov, E. M. BUSCO: assessing genomic data quality and beyond. Current Protocols1, e323 (2021). [DOI] [PubMed] [Google Scholar]
  • 20.Xiao, S. et al. Whole-genome single-nucleotide polymorphism (SNP) marker discovery and association analysis with the eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) content in Larimichthys crocea. PeerJ4, e2664, 10.7717/peerj.2664 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Chen, S. Ultrafast one‐pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta, e107 (2023). [DOI] [PMC free article] [PubMed]
  • 22.Rhoads, A. & Au, K. F. PacBio sequencing and its applications. Genomics, Proteomics and Bioinformatics13, 278–289 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell159, 1665–1680, 10.1016/j.cell.2014.11.021 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Cai, M. et al. Chromosome assembly of Collichthys lucidus, a fish of Sciaenidae with a multiple sex chromosome system. Scientific Data6, 132 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Liu, B. et al. Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. arXiv preprint arXiv1308, 2012 (2013).
  • 26.Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome research27, 722–736 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Ruan, J. & Li, H. Fast and accurate long-read assembly with wtdbg2. Nature methods17, 155–158 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Roach, M. J., Schmidt, S. A. & Borneman, A. R. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC bioinformatics19, 1–10 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell systems3, 95–98 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science356, 92–95 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell systems3, 99–101 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Tarailo‐Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Current protocols in bioinformatics25, 4.10. 11-14.10. 14 (2009). [DOI] [PubMed]
  • 33. Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mobile DNA 6, 1–6 (2015).
  • 34. Ou, S. & Jiang, N. LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiology 176, 1410–1422 (2018).
  • 35. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proceedings of the National Academy of Sciences 117, 9451–9457 (2020).
  • 36. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research 27, 573–580 (1999).
  • 37. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. Journal of Molecular Biology 215, 403–410 (1990).
  • 38. Yu, X. J., Zheng, H. K., Wang, J., Wang, W. & Su, B. Detecting lineage-specific adaptive evolution of brain-expressed genes in human using rhesus macaque as outgroup. Genomics 88, 745–751, 10.1016/j.ygeno.2006.05.008 (2006).
  • 39. Slater, G. S. C. & Birney, E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 1–11 (2005).
  • 40. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology 37, 907–915 (2019).
  • 41. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology 33, 290–295 (2015).
  • 42. Campbell, M. S., Holt, C., Moore, B. & Yandell, M. Genome annotation and curation using MAKER and MAKER-P. Current Protocols in Bioinformatics 48, 4.11.1–4.11.39 (2014).
  • 43. Boeckmann, B. et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Research 31, 365–370 (2003).
  • 44. Huerta-Cepas, J. et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Research 47, D309–D314 (2019).
  • 45. Hunter, S. et al. InterPro: the integrative protein signature database. Nucleic Acids Research 37, D211–D215 (2009).
  • 46. NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRR29950555 (2024).
  • 47. NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRR29950554 (2024).
  • 48. NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRR29950556 (2024).
  • 49. NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRR29950557 (2024).
  • 50. Ke, C. et al. Babylonia areolata Genome sequencing and assembly. GenBank. https://identifiers.org/ncbi/insdc:JBFRHL000000000 (2024).
  • 51. Zou, Y. Genes annotation of Babylonia areolata genome. figshare. 10.6084/m9.figshare.26933008 (2024).
  • 52. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
  • 53. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997 (2013).
  • 54. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research 20, 1297–1303 (2010).
  • 55. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
  • 56. Sun, J. et al. Signatures of divergence, invasiveness, and terrestrialization revealed by four apple snail genomes. Molecular Biology and Evolution 36, 1507–1520 (2019).
  • 57. Patra, A. K. et al. Genome assembly of the Korean intertidal mud-creeper Batillaria attramentaria. Scientific Data 10, 498 (2023).
  • 58. Guo, Y. et al. A chromosomal-level genome assembly for the giant African snail Achatina fulica. GigaScience 8, giz124 (2019).
  • 59. Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biology 20, 1–14 (2019).
  • 60. Castresana, J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Molecular Biology and Evolution 17, 540–552 (2000).
  • 61. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
  • 62. Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution 24, 1586–1591 (2007).
  • 63. Sun, J. et al. The Scaly-foot Snail genome and implications for the origins of biomineralised armour. Nature Communications 11, 1657 (2020).
  • 64. Lan, Y. et al. Hologenome analysis reveals dual symbiosis in the deep-sea hydrothermal vent snail Gigantopelta aegis. Nature Communications 12, 1165 (2021).
  • 65. Benton, M. J. et al. Constraints on the timescale of animal evolutionary history. (2015).
  • 66. Hedges, S. B. & Kumar, S. The timetree of life. (OUP Oxford, 2009).
  • 67. Stöger, I. et al. The continuing debate on deep molluscan phylogeny: evidence for Serialia (Mollusca, Monoplacophora + Polyplacophora). BioMed Research International 2013, 407072 (2013).
  • 68. Jörger, K. M. et al. On the origin of Acochlidia and other enigmatic euthyneuran gastropods, with implications for the systematics of Heterobranchia. BMC Evolutionary Biology 10, 1–20 (2010).
  • 69. Hayes, K. A. et al. Molluscan models in evolutionary biology: apple snails (Gastropoda: Ampullariidae) as a system for addressing fundamental questions. American Malacological Bulletin 27, 47–58 (2009).
  • 70. Tang, H. et al. JCVI: A versatile toolkit for comparative genomics analysis. iMeta, e211 (2024).

Associated Data

Data Availability Statement

No specific code was developed in this work. The data analysis was performed according to the manuals and protocols provided by the developers of the corresponding bioinformatic tools.


Articles from Scientific Data are provided here courtesy of Nature Publishing Group
