Abstract
The semislug Megaustenia siamensis, commonly found in Thailand, is notable for its exceptional capacity to produce biological adhesives, enabling it to adhere to tree leaves even during heavy rainfall. In this study, we generated the first reference genome for M. siamensis using a combination of three sequencing technologies: Illumina’s short-read, PacBio’s HiFi long-read, and Hi-C. The assembled genome size was 2.59 billion base pairs (bp), containing 34,882 protein-coding genes. Our analysis revealed positive selection in pathways associated with the ubiquitin–proteasome system. Furthermore, RNA sequencing of foot and mantle tissues unveiled the primary constituents of the adhesive, including lectin-like proteins (C-lectin, H-lectin, and C1q) and matrilin-like proteins (VWA and EGF). Additionally, antimicrobial peptides were identified. The comprehensive M. siamensis genome and tissue-specific transcriptomic data provided here offer valuable resources for understanding its biology and exploring potential medical applications.
Subject terms: Evolution, Genetics
Introduction
Terrestrial gastropods originated from marine ancestors and evolved to occupy diverse terrestrial habitats over time. Among these, pulmonates stand out; they possess a pallial lung for gas exchange instead of gills and are divided into three main groups: snails, semislugs, and slugs1. To thrive on land, these gastropods secrete slime or mucus with multiple functions, including adhesion, emollience, moisturization, lubrication, and defense1,2. This mucus contains various bioactive components, such as antibacterial and antioxidant compounds3,4. Consequently, there is growing interest in exploring terrestrial gastropod mucus for medical, pharmaceutical, and cosmetic applications5–8. Studies have examined its protein components, glycosylation, ion content, and mechanical properties2,4, and advances in genomic and transcriptomic data enable the reconstruction of snail mucus biosynthetic pathways9. However, most studies of the genomic structure and differential gene expression of terrestrial gastropods have focused on snails and slugs10–12, neglecting semislugs entirely.
Megaustenia siamensis, commonly known as the Siamese semislug, is abundant and inhabits various environments across Thailand. When disturbed, this creature secretes highly adhesive mucus, aiding in its firm attachment to surfaces. During the dry season, it can retract into its shell and secrete mucus to form an epiphragm over the shell aperture, protecting against desiccation13,14. Consequently, the mucus of the semislug exhibits distinctive properties compared to that of snails and slugs. Leveraging high-throughput sequencing technologies, we integrate genome and transcriptome data of M. siamensis to unveil its genetic characteristics, aiming to understand the properties of its mucus. This research is pivotal for advancing our understanding of semislug evolution and investigating potential medicinal applications of terrestrial gastropod mucus.
Results
M. siamensis genome assembly and annotation
The estimated genome size of M. siamensis is approximately 2.2 Gb, determined through k-mer analysis with short reads (Supplementary Fig. 1). We obtained 234 Gb of raw PacBio long reads, of which 196 Gb were retained as clean reads after filtering. The initial draft assembly, generated using Canu software with default parameters, was around 3.07 Gb in size and comprised 5246 contigs with an N50 contig length of 1.5 Mb. Subsequently, Illumina short reads were employed for polishing and error correction, resulting in a refined assembly of approximately 2.6 Gb, with an N50 of 1.8 Mb. The final draft genome assembly involved genomic scaffolding using Hi-C data. A total of 161 scaffolds were anchored into 32 pseudo-chromosomes (2n = 64), with a combined length of 2.59 Gb and an N50 of 84.3 Mb (Supplementary Fig. 2). The number of chromosome-scale scaffolds is consistent with other terrestrial pulmonates, which typically range from 2n = 16 to 2n = 66 based on cytogenetic studies15. Genome assembly statistics are presented in Fig. 1A and Table 1. Evaluation of the genome completeness of M. siamensis, conducted by searching for 954 single-copy metazoan genes using Benchmarking Universal Single-Copy Orthologs (BUSCO), revealed a completeness level of 85.9% (Table 1), indicating its adequacy as a genomic reference resource.
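The N50 values quoted above follow the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal Python sketch of that calculation is given below; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the assembly or from the tools used in this study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half_total = sum(ordered) / 2            # half of the assembly size
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:            # half of the assembly is covered
            return length


# Hypothetical toy lengths for illustration only.
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))  # prints 80
```

In the toy example the total length is 300, so the running sum first reaches 150 at the second-longest sequence, giving an N50 of 80.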
Figure 1.
Summary statistics of the M. siamensis genome (A). Bar chart with BUSCO summary assessments of the proportion of genes present in each assembly. The summary assessment shows the percentage of complete and single-copy genes (light blue), complete and duplicated genes (dark blue), fragmented genes (yellow), and missing genes (red) (B). Transposable element divergence landscape within the M. siamensis genome (different classes represented by different colors) (C).
Table 1.
Genome statistics of Megaustenia siamensis.
Assembly | Megaustenia siamensis |
---|---|
Genome size (bp) | 2,593,626,580 |
Number of scaffolds | 161 |
Scaffold N50 (bp) | 84,301,762 |
Number of protein-coding genes | 34,882 |
Repeat content (%) | 60.69 |
GC content (%) | 38.24 |
Complete BUSCO (%) | 85.9 |
Complete and Single-copy BUSCO (%) | 79.9 |
Complete and Duplicated BUSCO (%) | 6.0 |
Fragmented BUSCO (%) | 1.8 |
Missing BUSCO (%) | 12.3 |
Total number of metazoa_odb10 | 954 |
We further compared the proportion of genes present in 13 mollusk genomes (Supplementary Table 1 and Fig. 1B). The genome assembly size of the semislug M. siamensis is notably larger than that of other terrestrial snails and slugs (approximately 1.8 Gb for Lissachatina fulica, 1.5 Gb for Arion vulgaris, and 1.2 Gb for Candidula unifasciata)10–12. The GC content of M. siamensis is 38.24%, closely resembling that of A. vulgaris (38.46%). Repetitive elements constitute 60.69% of the genome (Fig. 1C and Supplementary Table 2). Annotation of the M. siamensis genome revealed a total of 34,882 protein-coding genes. Among these predicted protein-coding genes, approximately 58% could be annotated through at least one of the following protein-related databases: the EggNOG database (20,197; 57.90%), the Swiss-Prot protein database (14,181; 40.65%), the protein families (Pfam) database (18,863; 54.08%), and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (11,103; 31.83%).
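The annotation percentages above are simple ratios of annotated genes to the 34,882 predicted protein-coding genes. The short Python sketch below recomputes them from the counts given in the text; the variable and function names are illustrative assumptions.

```python
TOTAL_GENES = 34_882  # predicted protein-coding genes, from the text

# Number of genes annotated per database, from the text.
annotated = {
    "EggNOG": 20_197,
    "Swiss-Prot": 14_181,
    "Pfam": 18_863,
    "KEGG": 11_103,
}


def percent_annotated(count, total=TOTAL_GENES):
    """Percentage of genes annotated, rounded to two decimals."""
    return round(100 * count / total, 2)


for database, count in annotated.items():
    print(f"{database}: {percent_annotated(count):.2f}%")
```

The recomputed values agree with the reported 57.90%, 40.65%, 54.08%, and 31.83%.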
Gene family clustering
A gene family is a set of genes with similar structures or functions, and changes in gene family composition can indicate adaptive evolution. We therefore investigated the adaptive evolution of M. siamensis by examining gene family relationships among M. siamensis and 13 other species (10 gastropods and 3 bivalves). We identified 47,370 orthologous gene families containing a total of 668,284 genes. Among these, 1673 orthogroups were shared by all 14 species, with 26 of them represented as single-copy orthogroups (Fig. 2A). The Venn diagram of orthogroups among terrestrial snails and slugs showed that 5326 gene families were shared among the slug A. vulgaris, the snail C. unifasciata, and M. siamensis (Fig. 2B). In this comparison, 731 gene families were specific to M. siamensis, with 15 enriched gene ontology (GO) terms (Supplementary Table 3). The top three significantly enriched terms were the cellular component nucleus (GO:0005634, P = 3.61e-7, 8 genes), the biological process xenobiotic metabolic process (GO:0006805, P = 5.58e-7, 7 genes), and transposition, DNA-mediated (GO:0006313, P = 7.92e-7, 5 genes). Additionally, two terms concerning bacteria were significantly enriched: response to bacterium (GO:0009617, P = 3.68e-4, 4 genes), and negative regulation of defense response to bacterium, incompatible interaction (GO:1902478, P = 4.05e-4, 2 genes), along with one concerning the regulation of innate immune response (GO:0045088, P = 1.32e-3, 2 genes). Three significant GO enrichments were found among the gene families shared between M. siamensis and C. unifasciata (Supplementary Table 4). The most strongly enriched term was the molecular function G-protein-coupled receptor activity (GO:0004930, P = 2.02e-13, 20 genes), followed by positive regulation of cytosolic calcium ion concentration (GO:0007204, P = 3.22e-06, 11 genes), and outer dynein arm assembly (GO:0036158, P = 7.34e-05, 8 genes). However, no significant GO enrichment was observed between M. siamensis and A. vulgaris.
Figure 2.
Orthologous gene cluster shared among the fourteen species (A). Venn diagram showing the distribution of gene families (orthologous clusters) among M. siamensis, A. vulgaris, and C. unifasciata (B).
Phylogenetic construction and divergence time estimation
To understand the genomic evolution of M. siamensis, we analyzed a set of 26 single-copy orthologous genes shared among 10 gastropod and 3 bivalve species, with Octopus bimaculoides chosen as the outgroup species. The resulting phylogenetic tree revealed that M. siamensis is most closely related to A. vulgaris, with an estimated divergence time of approximately 63 million years ago (MYA). Within the remaining stylommatophoran snails, C. unifasciata emerged as sister to the clade containing M. siamensis and A. vulgaris, while L. fulica forms a sister clade to these three stylommatophoran species. Furthermore, the analysis indicated that terrestrial pulmonates diverged from mollusks inhabiting other habitats, such as freshwater and marine snails, approximately 254 MYA. Additionally, the split between Gastropoda and Bivalvia occurred around 439 MYA (Fig. 3).
Figure 3.
Phylogenetic relationship of M. siamensis with 4 freshwater, 6 marine, and 3 land mollusk species. Octopus bimaculoides is the outgroup. Divergence times (million years ago, MYA) are shown, with blue bars representing 95% confidence intervals.
Positive selection
Our further analysis of genes under positive selection (dN/dS > 1) revealed that 17 of the 26 single-copy orthologous groups exhibited evidence of positive selection in M. siamensis (Supplementary Table 5). The GO enrichment of genes under positive selection included 3 ontologies of biological processes, 7 cellular components, and 4 molecular functions (Supplementary Table 6). Additionally, we found evidence of pathway enrichment in the ubiquitin–proteasome pathway, specifically in the PSMD1 gene, with a raw P-value of 0.004. This finding supports the hypothesis of protein binding in M. siamensis through positive selection.
Transcriptome assembly and functional annotation
The comparative transcriptome analysis of M. siamensis foot and mantle was conducted using the Illumina HiSeq sequencing platform. Approximately 30 Gb of data were assembled, resulting in 716,193 transcripts with an average length of 570 bp. The assembly comprised 403,122 unigenes, with a GC content of 39.19%. These unigenes were annotated in several databases, including Refseq (70,286, 17.44%), Pfam (69,212, 17.17%), clusters of orthologous groups (COG) (68,487, 16.99%), GO (64,830, 16.08%), and KEGG (5981, 1.48%). Notably, a total of 7830 genes were identified encoding putative antimicrobial peptides (AMPs) in the M. siamensis transcriptomes.
According to the GO annotation, a total of 64,830 unigenes were assigned to 113 GO terms covering biological processes, cellular components, and molecular functions. In the biological process category, the majority of unigenes were associated with metabolism (4342, 6.70%), development (3097, 4.77%), and cell organization and biogenesis (2102, 3.24%). Among the cellular component category, the most represented categories included cell (1667, 2.57%), intracellular (1281, 1.98%), and cytoplasm (630, 0.97%). In the molecular function category, the matched sequences were distributed across catalytic activity (2016, 3.11%), binding (1130, 1.74%), and transferase activity (715, 1.10%) (Fig. 4A).
Figure 4.
Function classification in Gene ontology (A) and Clusters of Orthologous Groups of proteins (COG) (B).
Regarding COG annotation, a total of 68,487 unigenes were classified into 24 functional categories (Fig. 4B). The categories with the highest proportion of unigenes were unknown function (20,195, 29.49%), signal transduction mechanisms (8636, 12.61%), and translation, ribosomal structure, and biogenesis (6715, 9.80%). Additionally, 5981 unigenes were significantly matched in the KEGG database and assigned to 419 KEGG pathways. The metabolic pathways emerged as the pathway with the highest proportion of unigenes, followed by biosynthesis of secondary metabolites.
Identification of differentially expressed genes
In the transcriptome analysis of M. siamensis foot and mantle, we obtained 390,055 assembled transcripts with an average length of 636 bp and identified 285,630 unigenes from the foot muscle data. Similarly, from the mantle data, we acquired 427,574 assembled transcripts with an average length of 555 bp and identified 255,705 unigenes. The GC percentage of the unigenes was 38.75% in the foot muscle and 39.32% in the mantle.
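GC percentage, as reported above for the unigenes, is the share of G and C bases among the unambiguous bases of a sequence. The Python sketch below illustrates the calculation; the function name `gc_percent` and the decision to ignore ambiguous bases such as N are assumptions made for illustration and do not describe the software used in this study.

```python
def gc_percent(sequence):
    """Return the GC percentage of a nucleotide sequence.

    Ambiguous characters (for example N) are ignored, which is an
    assumption of this sketch.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100 * gc / (gc + at)


# Hypothetical toy sequence for illustration only.
print(gc_percent("ATGCATGCNN"))  # prints 50.0
```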
Differential gene expression analysis revealed a total of 115,263 differentially expressed genes in M. siamensis foot and mantle tissues. Among these, 55,675 genes were up-regulated, while 59,588 genes were down-regulated. The top 20 most highly expressed genes in the foot muscle and mantle are listed in Supplementary Tables 7 and 8, respectively. Notably, the top expressed genes in both tissues exhibit similarities. Particularly noteworthy is the higher expression of five genes involved in biological adhesion (actin, C1q, H-lectin, C-lectin, and VWA) in the foot muscle compared to the mantle, as detailed in Supplementary Tables 9 and 10.
Antimicrobial and anticancer activity prediction
In the transcriptome of M. siamensis, a total of 7830 sequences were identified as putative AMPs. We conducted further analyses on the CAMP database for these sequences using four machine-learning algorithms, namely Support Vector Machine (SVM), Discriminant Analysis (DA), Artificial Neural Network (ANN), and Random Forest (RF). Additionally, the iACP tool was used for anticancer analysis.
The analysis yielded 44 putative active peptides. Among these, 29 sequences were predicted to possess antimicrobial properties, predominantly belonging to bacteriocin families. Eight sequences were predicted to exhibit both antimicrobial and anticancer properties, while seven sequences were predicted to display anticancer activity only (Supplementary Table 10 and Fig. 5). Furthermore, among these potential active peptides, eight were found to be highly expressed, four in foot tissue and four in mantle tissue (Supplementary Table 10).
Figure 5.
The 44 putative active peptides with predicted antimicrobial and anticancer activity.
Discussion
This study is the first comprehensive report on the de novo genome assembly of the semislug, a prominent member of the terrestrial pulmonates. Comparisons of gene proportions between M. siamensis and 13 other molluscan genomes (Supplementary Table 1 and Fig. 1B) indicate a higher prevalence of duplicated genes in terrestrial pulmonates. This elevated level of duplication may be attributed to a whole genome duplication event, similar to the one observed in the adaptation of giant African snails, Lissachatina fulica11.
Phylogenetic analysis revealed that M. siamensis emerges as a sister clade to A. vulgaris with strong support, aligning with previous molecular phylogenies based on nuclear ribosomal RNA16,17. However, the relationship between Limacoidei (which includes M. siamensis) and Arionoidea (to which A. vulgaris belongs) still exhibits lower support values. The clustering of M. siamensis, A. vulgaris, and C. unifasciata corresponds to their classification within the suborder Helicina (“nonachatinoid” clade), contrasting with L. fulica placed in the suborder Achatinina (“achatinoid” clade)16–18.
Among the significantly enriched gene families specific to M. siamensis, four gene ontologies (xenobiotic metabolic process; response to bacterium; negative regulation of defense response to bacterium, incompatible interaction; and regulation of innate immune response) emphasize the importance of the immune system and response to foreign substances in semislugs. This is further supported by the signature of positive selection observed in interleukin 1 receptor-associated kinase 1 binding protein 1 (IRAK1BP1), a major component of the Toll-like receptor signaling pathway, as seen in the putative immune signaling cascade of the land slug Incilaria fruhstorferi19. Moreover, our transcriptome analysis predicted the antimicrobial and anticancer properties of putative active peptides, consistent with findings from the mucus of other terrestrial gastropods. However, notable antimicrobial peptides such as achacin-like peptide, macin, and hemocyanin, identified in L. fulica, were not predicted in the transcriptome of M. siamensis3,20–22.
In contrast to transferrin and cystatin, which are prevalent across various animal taxa23,24 and widely documented in different molluscan groups25–27, reports of bacteriocin and defensin in mollusks are comparatively scarce. Molluscan homologs of bacteriocin have only been predicted from the whole genome of the zebra mussel, Dreissena polymorpha28, while molluscan defensins have been discovered in some species of the abalone genus, Haliotis29–31.
The remaining putative active peptides predicted in this study align with those reported from other animal groups. Examples include abaecin from hymenopterans32,33, metchnikowin from the fruit fly genus Drosophila34,35, LEAP-2 from vertebrates36, brevinin and gaegurin from ranoid frogs37–39, and bombinin and maximin from the fire-bellied toad genus Bombina40. Some peptides predicted in this study have previously been reported from only a single species, such as halocidin from the tunicate Halocynthia aurantium41, tachystatin from the horseshoe crab Tachypleus tridentatus42, and ascaphin from the coastal tailed frog Ascaphus truei43.
The discovery of these putative active peptides from M. siamensis transcriptomes is novel, as they have not been reported in other mollusks before, particularly in terrestrial gastropods. This discovery offers valuable insights into their antimicrobial and anticancer potential, presenting opportunities for further research and development in the field of peptide-based therapeutics.
The analysis of differential gene expression revealed that five selected proteins involved in biological adhesion exhibited higher expression levels in the foot tissue compared to the mantle. These adhesion-related proteins fall into two major classes: lectin-like proteins (C-lectins, C1q, and H-lectins) and matrilin-like proteins (VWA and EGF). These proteins play crucial roles in cell adhesion and movement, aligning with the mechanism of glue-like mucus in terrestrial slugs44,45. Moreover, they also function in the molluscan innate immune system46–49. Notably, among the most highly expressed genes in the mantle transcriptome, the bovine pancreatic trypsin inhibitor BPTI/Kunitz domain is involved in defense against microbial pathogens50. These findings shed light on the molecular mechanisms underlying the biological adhesive and innate immunity-relevant properties of M. siamensis mucus, offering insights into the functional differences between foot and mantle tissues.
Overall, the genome and transcriptome data of M. siamensis not only enhance our understanding of the species itself but also hold broader implications for evolutionary biology, medical research, and the development of biotechnological applications. The draft genome of M. siamensis can serve as a valuable reference for future studies, aiding in the comprehension of evolutionary adaptations on land. Additionally, the transcriptome data provide insights into protein composition and bioactive compounds. The availability of this valuable resource is expected to catalyze further investigations and pave the way for future discoveries and advancements across various fields.
Methods
Sample collection and DNA sequencing
Specimens of M. siamensis were collected from Tham Sai Thong Temple, Kalasin, Thailand. Foot and mantle tissues were dissected from the animal’s body on dry ice to prevent RNA degradation. Genomic DNA was extracted separately from the foot and mantle tissues using the DNeasy® Blood & Tissue Kit from Qiagen™. DNA quantity and quality were assessed using a NanoDrop spectrophotometer (Thermo Scientific, Wilmington, DE) and an Agilent 2100 Bioanalyzer (Agilent Technologies).
We used three different sequencing technologies to obtain the genome sequence. Long-read DNA sequencing was conducted using the SMRTbell library prepared with the SMRTbell Express Template Prep Kit 2.0 (Pacific Biosciences) and sequenced on the PacBio Sequel sequencing system. A total of 234 Gb of data was generated from 33 SMRT cells, with an average insert length of 16,534 bp and an N50 read length of 27,177 bp.
For short-read DNA sequencing, approximately 200 Gb of data was produced using the TruSeq Nano DNA library prep kit and sequenced on the Illumina HiSeqX Ten platform (Illumina, San Diego, CA, USA) with 150 bp paired-end reads. Additionally, genomic DNA underwent Dovetail Omni-C library preparation using the Dovetail Hi-C preparation kit (Dovetail Genomics, Scotts Valley, CA, USA) following the manufacturer’s protocol (manual version 1.0 for non-mammalian samples). Sequencing was performed on the Illumina HiSeqX Ten platform with 150 bp paired-end reads.
De novo genome assembly
Binary Alignment Map files containing the subreads generated by PacBio sequencing were converted to FASTA files using DEXTRACTOR software (https://github.com/thegenemyers/DEXTRACTOR). Subreads shorter than 1000 bp and those with quality scores lower than 0.80 were excluded. The resulting clean reads were used for de novo assembly using the Canu pipeline, involving three steps: correction, trimming, and assembly, with default parameters51. Subsequently, short reads were used for polishing by Pilon52, and redundant sequences in the assembly were eliminated using purge_dups53. Hi-C reads were then mapped to the de novo assembled contigs using BWA software to establish contacts among the contigs54. Finally, the HiRise scaffolding method was applied to connect the contigs, resulting in the final assembly55.
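The subread filter described above keeps reads that are at least 1000 bp long and have a quality score of at least 0.80. The following Python sketch illustrates that rule only; the `Subread` record, its field names, and the helper functions are hypothetical and do not reproduce the DEXTRACTOR interface or the pipeline actually used.

```python
from dataclasses import dataclass

MIN_LENGTH = 1000   # bp; shorter subreads are excluded (from the text)
MIN_QUALITY = 0.80  # subreads with lower quality are excluded (from the text)


@dataclass
class Subread:
    """Hypothetical minimal record for one PacBio subread."""
    name: str
    sequence: str
    quality: float


def keep_subread(read):
    """True if the subread passes both the length and quality thresholds."""
    return len(read.sequence) >= MIN_LENGTH and read.quality >= MIN_QUALITY


def filter_subreads(reads):
    """Return only the subreads that pass the filter, preserving order."""
    return [read for read in reads if keep_subread(read)]


# Hypothetical toy reads for illustration only.
toy_reads = [
    Subread("read_a", "A" * 1000, 0.80),  # kept: meets both thresholds
    Subread("read_b", "A" * 999, 0.95),   # dropped: too short
    Subread("read_c", "A" * 5000, 0.79),  # dropped: quality too low
]
print([read.name for read in filter_subreads(toy_reads)])  # prints ['read_a']
```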
Genome assembly evaluation
The quality of the genome assemblies was assessed using descriptive measures, including the number of contigs, the total number of assembled bases, and completeness. This evaluation was conducted using the QUAST analysis tool56. Additionally, BUSCO (version 5.2.2) was employed to evaluate the completeness of the genome. The metazoa_odb10 dataset, comprising a set of 954 genes, served as the reference for this analysis. BUSCO was also run with identical parameters on 13 other species to facilitate comparison.
Genome annotation
De novo repeat annotation was conducted using RepeatModeler (http://www.repeatmasker.org/RepeatModeler/). The generated sequence libraries were subsequently utilized as queries to mask repetitive elements with RepeatMasker (http://www.repeatmasker.org). To calculate the Kimura divergence values, we employed the "calcDivergenceFromAlign.pl" script within the RepeatMasker pipeline. Additionally, the repeat landscape, encompassing the representation of repeats in the genomes, was visualized. For the annotation of protein-coding genes, we used the MAKER pipeline57.
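Kimura divergence is based on the Kimura two-parameter model, which treats transitions and transversions separately: with p the proportion of transitional differences and q the proportion of transversional differences, the distance is K = -1/2 ln((1 - 2p - q) sqrt(1 - 2q)). The Python sketch below implements only this textbook formula as an illustration; it is not the RepeatMasker script, which may apply additional adjustments, and the function name and toy counts are assumptions.

```python
import math


def kimura_2p(transitions, transversions, aligned_sites):
    """Kimura two-parameter distance between two aligned sequences.

    transitions and transversions are counts of mismatching sites of
    each type; aligned_sites is the number of compared positions.
    """
    if aligned_sites <= 0:
        raise ValueError("aligned_sites must be positive")
    p = transitions / aligned_sites    # proportion of transitions
    q = transversions / aligned_sites  # proportion of transversions
    a = 1 - 2 * p - q
    b = 1 - 2 * q
    if a <= 0 or b <= 0:
        raise ValueError("sequences too divergent for the K2P formula")
    return -0.5 * math.log(a * math.sqrt(b))


# Hypothetical toy counts: 10 transitions and 5 transversions in 100 sites.
print(round(kimura_2p(10, 5, 100), 4))
```

The corrected distance is larger than the raw mismatch proportion (0.15 in the toy example) because the model accounts for multiple substitutions at the same site.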
Gene function annotation was conducted using InterProScan to search for domains or motifs in public databases. Additionally, we used the web-based platform KEGG58,59. Orthology assignments and predictions of KEGG pathways were performed through the KEGG Automatic Annotation Server using the bidirectional best hit (BBH) BLAST method (https://www.genome.jp/kegg/kaas/). Furthermore, the draft genomes underwent scanning for COGs annotations using eggNOG-mapper v2 (http://eggnog-mapper.embl.de).
Gene family analysis
The protein sequences of M. siamensis were compared with those of 13 other species, including 10 gastropod species (Achatina fulica, Aplysia californica, Arion vulgaris, Biomphalaria glabrata, Candidula unifasciata, Chrysomallon squamiferum (scaly-foot gastropod), Gigantopelta aegis, Haliotis rufescens (red abalone), Lottia gigantea, and Pomacea canaliculata) and 3 bivalve species (Crassostrea gigas (Pacific oyster), Dreissena polymorpha, and Mizuhopecten yessoensis). OrthoFinder software (version 2.4.0)60 was used to identify gene family clusters among the different species, and the results were visualized using the UpSetR package61.
Phylogenetic and divergence time analysis
A phylogenetic tree was constructed using one-to-one orthologous genes, identified through OrthoFinder analysis. The single-copy orthologous proteins were aligned using MUSCLE, and maximum likelihood trees were inferred using RAxML v8.2.8 with 1000 bootstrap replicates. Divergence times were computed using the MCMCTREE program implemented in the PAML v4.8 package, employing the correlated molecular clock62.
Four fossil calibration points were applied to estimate the divergence times: the divergence time of Gastropoda and Bivalvia (515.0–541.7 MYA), Caenogastropoda and Heterobranchia (238.6–429.9 MYA), Pomacea canaliculata and Lottia gigantea (238.6–429.9 MYA), and Octopus bimaculoides (480.0–559.4 MYA) for the root. These calibration points were obtained from the Timetree database (http://www.timetree.org/)63. The resulting phylogenetic tree was visualized using FigTree v.1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/).
Positive selection
To identify positive selection, one-to-one orthologous proteins were aligned using PRANK64, and codon alignments were generated using PAL2NAL65. The aBSREL model (adaptive Branch-Site Random Effects Likelihood) implemented in the HyPhy package (v2.5.15)66 was used to test whether positive selection had occurred on a proportion of branches (ω > 1), with a significance threshold of P < 0.05. Additionally, GO enrichment analysis was conducted using the PANTHER database.
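The decision rule stated above combines two criteria: an estimated ω (the dN/dS ratio) greater than 1 and a P-value below 0.05. The Python sketch below shows only this final filtering step applied to already-computed results; it does not implement the aBSREL likelihood test itself, and the `BranchTest` record, orthogroup names, and numbers are hypothetical.

```python
from typing import NamedTuple

ALPHA = 0.05  # significance threshold, from the text


class BranchTest(NamedTuple):
    """Hypothetical summary of one branch-level selection test."""
    orthogroup: str
    omega: float    # estimated dN/dS ratio
    p_value: float  # P-value reported by the test


def is_positively_selected(omega, p_value, alpha=ALPHA):
    """True when omega exceeds 1 and the test is significant."""
    return omega > 1 and p_value < alpha


def select_candidates(results):
    """Return orthogroup names that satisfy both criteria."""
    return [
        result.orthogroup
        for result in results
        if is_positively_selected(result.omega, result.p_value)
    ]


# Hypothetical toy results for illustration only.
toy_results = [
    BranchTest("OG_example_1", 2.5, 0.004),  # selected
    BranchTest("OG_example_2", 2.5, 0.200),  # not significant
    BranchTest("OG_example_3", 0.8, 0.001),  # omega not above 1
]
print(select_candidates(toy_results))  # prints ['OG_example_1']
```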
Transcriptome sequencing analysis
Ribonucleic acid was extracted separately from the foot and mantle tissues using the TruSeq Stranded mRNA LT Sample Prep Kit and sequenced on the Illumina NovaSeq platform, resulting in 150 bp paired-end reads. The RNA-seq data were assembled using both de novo and genome-guided approaches with Trinity67. Redundant transcripts were identified and removed from the assembly using CD-HIT-EST68 with a sequence identity threshold of 98%. We performed transcript quantification using RNA-Seq by Expectation–Maximization (RSEM)69. Normalization and differential expression analysis were conducted using edgeR70. Candidate coding regions within transcript sequences were identified using TransDecoder (https://github.com/TransDecoder/TransDecoder/wiki). Finally, functional annotation of the transcriptome was achieved using Trinotate, which involved conducting BLASTX searches against the Swiss-Prot database to generate GO terms (http://trinotate.github.io).
The signal peptide was identified using SignalP 5.071. Candidate peptides identified as antimicrobial were evaluated using four machine-learning algorithms: Support Vector Machine (SVM), DA, ANN, and Random Forest (RF), by mapping them with the Collection of Antimicrobial Peptides (CAMP)72. Additionally, the iACP online tool (https://lin.uestc.edu.cn/server/iACP) was utilized to predict the anticancer activity of the identified peptides.
Acknowledgements
The study was supported by Ratchadapiseksompotch Fund, Faculty of Medicine, Chulalongkorn University, Grant numbers: RA-MF-08/66 and RA-MF-16/66.
Author contributions
W.C. performed the analysis and drafted the manuscript. P.J., P.T., and A.P. prepared specimens and contributed content to the manuscript. C.S. and A.A. performed the DNA extraction and sequencing. V.S. contributed to conceptualization, funding acquisition, and manuscript editing. S.P. contributed to conceptualization and manuscript editing. All authors contributed to the article and approved the submitted version.
Data availability
The genome assembly generated in this study is available via NCBI under the BioProject number: PRJNA993791. All other relevant data are available upon request to the corresponding authors.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Wanna Chetruengchai and Parin Jirapatrasilp.
These authors jointly supervised this work: Piyoros Tongkerd and Vorasuk Shotelersuk.
Contributor Information
Piyoros Tongkerd, Email: piyorose@hotmail.com.
Vorasuk Shotelersuk, Email: vorasuk.s@chula.ac.th.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-024-64425-6.
References
- 1.Barker, G.M. Gastropods on land: Phylogeny, diversity and adaptive morphology. in The Biology of Terrestrial Molluscs (G.M. Barker, Editor). (CABI Publishing, 2001)
- 2.Cerullo AR, et al. Comparative mucomic analysis of three functionally distinct Cornu aspersum secretions. Nat. Commun. 2023;14(1):5361. doi: 10.1038/s41467-023-41094-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Noothuan N, et al. Snail mucus from the mantle and foot of two land snails, Lissachatina fulica and Hemiplecta distincta, exhibits different protein profile and biological activity. BMC Res. Notes. 2021;14(1):138. doi: 10.1186/s13104-021-05557-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4. Tachapuripunya V, et al. Unveiling putative functions of mucus proteins and their tryptic peptides in seven gastropod species using comparative proteomics and machine learning-based bioinformatics predictions. Molecules. 2021. doi: 10.3390/molecules26113475.
- 5. Liudmyla K, Olena C, Nadiia S. Chemical properties of Helix aspersa mucus as a component of cosmetics and pharmaceutical products. Mater. Today Proc. 2022;62:7650–7653. doi: 10.1016/j.matpr.2022.02.217.
- 6. E-kobon T, et al. Prediction of anticancer peptides against MCF-7 breast cancer cells from the peptidomes of Achatina fulica mucus fractions. Comput. Struct. Biotechnol. J. 2016;14:49–57. doi: 10.1016/j.csbj.2015.11.005.
- 7. Deng T, et al. A natural biological adhesive from snail mucus for wound repair. Nat. Commun. 2023;14(1):396. doi: 10.1038/s41467-023-35907-4.
- 8. Leśków A, et al. The effect of biologically active compounds in the mucus of slugs Limax maximus and Arion rufus on human skin cells. Sci. Rep. 2021;11(1):18660. doi: 10.1038/s41598-021-98183-6.
- 9. Nualnisachol P, Chumnanpuen P, E-kobon T. Understanding snail mucus biosynthesis and shell biomineralisation through genomic data mining of the reconstructed carbohydrate and glycan metabolic pathways of the giant African snail (Achatina fulica). Biology. 2023. doi: 10.3390/biology12060836.
- 10. Chen Z, et al. Pulmonate slug evolution is reflected in the de novo genome of Arion vulgaris Moquin-Tandon, 1855. Sci. Rep. 2022;12(1):14226. doi: 10.1038/s41598-022-18099-7.
- 11. Liu C, et al. Giant African snail genomes provide insights into molluscan whole-genome duplication and aquatic-terrestrial transition. Mol. Ecol. Resour. 2021;21(2):478–494. doi: 10.1111/1755-0998.13261.
- 12. Chueca LJ, Schell T, Pfenninger M. De novo genome assembly of the land snail Candidula unifasciata (Mollusca: Gastropoda). G3 Genes Genomes Genet. 2021;11(8):jkab180. doi: 10.1093/g3journal/jkab180.
- 13. Solem A. Some non-marine mollusks from Thailand, with notes on classification of the Helicarionidae. Spolia Zoologica Musei Hauniensis. 1966;24:1–110.
- 14. Cockerell TDA. The genus Megaustenia. Nautilus. 1929;43:51–54.
- 15. Khrueanet W, Supiwong W, Tumpeesuwan C, Tumpeesuwan S, Pinthong K, Tanomtong A. First chromosome analysis and localization of the nucleolar organizer region of land snail, Sarika resplendens (Stylommatophora, Ariophantidae) in Thailand. Cytologia. 2013;78:213–222. doi: 10.1508/cytologia.78.213.
- 16. Wade CM, Mordan PB, Naggs F. Evolutionary relationships among the Pulmonate land snails and slugs (Pulmonata, Stylommatophora). Biol. J. Linnean Soc. 2006;87(4):593–610. doi: 10.1111/j.1095-8312.2006.00596.x.
- 17. Wade CM, Mordan PB, Clarke B. A phylogeny of the land snails (Gastropoda: Pulmonata). Proc. R. Soc. Lond. Series B Biol. Sci. 2001;268:413–422. doi: 10.1098/rspb.2000.1372.
- 18. Zhao T, et al. Complete mitochondrial genomes of the slugs Deroceras laeve (Agriolimacidae) and Ambigolimax valentianus (Limacidae) provide insights into the phylogeny of Stylommatophora (Mollusca, Gastropoda). ZooKeys. 2023;1173:43–59. doi: 10.3897/zookeys.1173.102786.
- 19. Patnaik BB, et al. Transcriptome analysis of air-breathing land slug, Incilaria fruhstorferi reveals functional insights into growth, immunity, and reproduction. BMC Genom. 2019;20(1):154. doi: 10.1186/s12864-019-5526-3.
- 20. Cilia G, Fratini F. Antimicrobial properties of terrestrial snail and slug mucus. J. Complement. Integr. Med. 2018;15(3).
- 21. Zhong J, et al. A novel cysteine-rich antimicrobial peptide from the mucus of the snail of Achatina fulica. Peptides. 2013;39:1–5. doi: 10.1016/j.peptides.2012.09.001.
- 22. Suárez L, et al. Antibacterial, antibiofilm and anti-virulence activity of biactive fractions from mucus secretion of giant African snail Achatina fulica against Staphylococcus aureus strains. Antibiotics. 2021;10:1548. doi: 10.3390/antibiotics10121548.
- 23. Lambert LA, et al. Evolution of the transferrin family: Conservation of residues associated with iron and anion binding. Comp. Biochem. Physiol. Part B Biochem. Mol. Biol. 2005;142(2):129–141. doi: 10.1016/j.cbpb.2005.07.007.
- 24. Li F, et al. Identification and characterization of a Cystatin gene from Chinese mitten crab Eriocheir sinensis. Fish Shellfish Immunol. 2010;29(3):521–529. doi: 10.1016/j.fsi.2010.05.015.
- 25. Herath HMLPB, et al. Molecular insights into a molluscan transferrin homolog identified from disk abalone (Haliotis discus discus) evidencing its detectable role in host antibacterial defense. Develop. Comp. Immunol. 2015;53(1):222–233. doi: 10.1016/j.dci.2015.07.013.
- 26. Li H-W, et al. The characteristics and expression profile of transferrin in the accessory nidamental gland of the Bigfin Reef Squid during bacteria transmission. Sci. Rep. 2019;9(1):20163. doi: 10.1038/s41598-019-56584-8.
- 27. Premachandra HKA, et al. Expression profile of cystatin B ortholog from Manila clam (Ruditapes philippinarum) in host pathology with respect to its structural and functional properties. Fish Shellfish Immunol. 2013;34(6):1505–1513. doi: 10.1016/j.fsi.2013.03.349.
- 28. McCartney MA, et al. The genome of the zebra mussel, Dreissena polymorpha: A resource for comparative genomics, invasion genetics, and biocontrol. G3 Genes Genomes Genet. 2022;12(2):jkab423. doi: 10.1093/g3journal/jkab423.
- 29. De Zoysa M, et al. Defensin from disk abalone Haliotis discus discus: Molecular cloning, sequence characterization and immune response against bacterial infection. Fish Shellfish Immunol. 2010;28(2):261–266. doi: 10.1016/j.fsi.2009.11.005.
- 30. Yao T, et al. Molecular characterization and immune analysis of a defensin from small abalone, Haliotis diversicolor. Comp. Biochem. Physiol. Part B Biochem. Mol. Biol. 2019;235:1–7. doi: 10.1016/j.cbpb.2019.05.004.
- 31. Yao T, et al. Molecular characterization and expression pattern analysis of a defensin (HdDef1) from small abalone (Haliotis diversicolor). South China Fisheries Sci. 2019;15(6):1–8. doi: 10.1016/j.cbpb.2019.05.004.
- 32. Wang L, et al. Cloning and characteristics of the antibacterial peptide gene abaecin in the bumblebee Bombus lantschouensis (Hymenoptera: Apidae). J. Asia-Pacific Entomol. 2021;24(1):369–375. doi: 10.1016/j.aspen.2021.01.013.
- 33. Shen X, et al. Characterization of an abaecin-like antimicrobial peptide identified from a Pteromalus puparum cDNA clone. J. Invertebrate Pathol. 2010;105(1):24–29. doi: 10.1016/j.jip.2010.05.006.
- 34. Levashina EA, et al. Metchnikowin, a novel immune-inducible proline-rich peptide from Drosophila with antibacterial and antifungal properties. Eur. J. Biochem. 1995;233(2):694–700. doi: 10.1111/j.1432-1033.1995.694_2.x.
- 35. Drosophila 12 Genomes Consortium. Evolution of genes and genomes on the Drosophila phylogeny. Nature. 2007;450(7167):203–218. doi: 10.1038/nature06341.
- 36. Li H-Z, et al. LEAP2 is a more conserved ligand than ghrelin for fish GHSRs. Biochimie. 2023;209:10–19. doi: 10.1016/j.biochi.2023.01.010.
- 37. Won H-S, Kang S-J, Lee B-J. Action mechanism and structural requirements of the antimicrobial peptides, gaegurins. Biochimica et Biophysica Acta (BBA) Biomembranes. 2009;1788(8):1620–1629. doi: 10.1016/j.bbamem.2008.10.021.
- 38. Wang G, et al. Five novel antimicrobial peptides from the Kuhl’s wart frog skin secretions, Limnonectes kuhlii. Mol. Biol. Rep. 2013;40(2):1097–1102. doi: 10.1007/s11033-012-2152-4.
- 39. Savelyeva A, et al. An overview of Brevinin superfamily: Structure, function and clinical perspectives. In: Grimm S, editor. Anticancer Genes. Springer London; 2014. pp. 197–212.
- 40. Lee W-H, et al. Variety of antimicrobial peptides in the Bombina maxima toad and evidence of their rapid diversification. Eur. J. Immunol. 2005;35(4):1220–1229. doi: 10.1002/eji.200425615.
- 41. Jang WS, et al. Halocidin: A new antimicrobial peptide from hemocytes of the solitary tunicate, Halocynthia aurantium. FEBS Lett. 2002;521(1):81–86. doi: 10.1016/S0014-5793(02)02827-2.
- 42. Fujitani N, et al. Structure of the antimicrobial peptide tachystatin A*. J. Biol. Chem. 2002;277(26):23651–23657. doi: 10.1074/jbc.M111120200.
- 43. Conlon JM, et al. The ascaphins: A family of antimicrobial peptides from the skin secretions of the most primitive extant frog, Ascaphus truei. Biochem. Biophys. Res. Commun. 2004;320(1):170–175. doi: 10.1016/j.bbrc.2004.05.141.
- 44. Smith AM, et al. RNA-Seq reveals a central role for lectin, C1q and von Willebrand factor A domains in the defensive glue of a terrestrial slug. Biofouling. 2017;33(9):741–754. doi: 10.1080/08927014.2017.1361413.
- 45. Christoforo C, et al. Metal-binding proteins and cross-linking in the defensive glue of the slug Arion subfuscus. J. R. Soc. Interface. 2022;19(196):20220611. doi: 10.1098/rsif.2022.0611.
- 46. Loker ES. Gastropod immunobiology. In: Söderhäll K, editor. Invertebrate Immunity. Landes Bioscience and Springer Science+Business Media; 2010. pp. 17–43.
- 47. Song L, et al. Bivalve immunity. In: Söderhäll K, et al., editors. Invertebrate Immunity. Landes Bioscience and Springer Science+Business Media; 2010. pp. 44–65.
- 48. Gerdol M, et al. Immunity in molluscs: Recognition and effector mechanisms, with a focus on Bivalvia. In: Cooper EL, editor. Advances in Comparative Immunology. Springer International Publishing AG; 2018. pp. 225–341.
- 49. Loker ES, Bayne CJ. Molluscan immunobiology: Challenges in the Anthropocene epoch. In: Cooper EL, editor. Advances in Comparative Immunology. Springer International Publishing AG; 2018. pp. 343–407.
- 50. Ranasinghe S, McManus DP. Structure and function of invertebrate Kunitz serine protease inhibitors. Develop. Comp. Immunol. 2013;39(3):219–227. doi: 10.1016/j.dci.2012.10.005.
- 51. Koren S, et al. Canu: Scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27(5):722–736. doi: 10.1101/gr.215087.116.
- 52. Walker BJ, et al. Pilon: An integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014;9(11):e112963. doi: 10.1371/journal.pone.0112963.
- 53. Guan D, et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics. 2020;36(9):2896–2898. doi: 10.1093/bioinformatics/btaa025.
- 54. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–1760. doi: 10.1093/bioinformatics/btp324.
- 55. Putnam NH, et al. Chromosome-scale shotgun assembly using an in vitro method for long-range linkage. Genome Res. 2016;26(3):342–350. doi: 10.1101/gr.193474.115.
- 56. Gurevich A, et al. QUAST: Quality assessment tool for genome assemblies. Bioinformatics. 2013;29(8):1072–1075. doi: 10.1093/bioinformatics/btt086.
- 57. Cantarel BL, et al. MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 2008;18(1):188–196. doi: 10.1101/gr.6743907.
- 58. Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27–30. doi: 10.1093/nar/28.1.27.
- 59. Kanehisa M, et al. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 2016;44(D1):D457–D462. doi: 10.1093/nar/gkv1070.
- 60. Emms DM, Kelly S. OrthoFinder: Phylogenetic orthology inference for comparative genomics. Genome Biol. 2019;20(1):238. doi: 10.1186/s13059-019-1832-y.
- 61. Conway JR, Lex A, Gehlenborg N. UpSetR: An R package for the visualization of intersecting sets and their properties. Bioinformatics. 2017;33(18):2938–2940. doi: 10.1093/bioinformatics/btx364.
- 62. Yang Z. PAML: A program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 1997;13(5):555–556. doi: 10.1093/bioinformatics/13.5.555.
- 63. Kumar S, et al. TimeTree: A resource for timelines, timetrees, and divergence times. Mol. Biol. Evol. 2017;34(7):1812–1819. doi: 10.1093/molbev/msx116.
- 64. Loytynoja A. Phylogeny-aware alignment with PRANK. Methods Mol. Biol. 2014;1079:155–170. doi: 10.1007/978-1-62703-646-7_10.
- 65. Suyama M, Torrents D, Bork P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 2006;34(1):W609-12. doi: 10.1093/nar/gkl315.
- 66. Smith MD, et al. Less is more: An adaptive branch-site random effects model for efficient detection of episodic diversifying selection. Mol. Biol. Evol. 2015;32(5):1342–1353. doi: 10.1093/molbev/msv022.
- 67. Haas BJ, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 2013;8(8):1494–1512. doi: 10.1038/nprot.2013.084.
- 68. Li W, Godzik A. Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22(13):1658–1659. doi: 10.1093/bioinformatics/btl158.
- 69. Li B, Dewey CN. RSEM: Accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinform. 2011;12:323. doi: 10.1186/1471-2105-12-323.
- 70. Robinson MD, McCarthy DJ, Smyth GK. edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–140. doi: 10.1093/bioinformatics/btp616.
- 71. Almagro Armenteros JJ, et al. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat. Biotechnol. 2019;37(4):420–423. doi: 10.1038/s41587-019-0036-z.
- 72. Waghu FH, Idicula-Thomas S. Collection of antimicrobial peptides database and its derivatives: Applications and beyond. Protein Sci. 2020;29(1):36–42. doi: 10.1002/pro.3714.
Data Availability Statement
The genome assembly generated in this study is available via NCBI under the BioProject number: PRJNA993791. All other relevant data are available upon request to the corresponding authors.