Nature Communications. 2022 Feb 21;13:977. doi: 10.1038/s41467-022-28634-9

Nuclear genome of Bulinus truncatus, an intermediate host of the carcinogenic human blood fluke Schistosoma haematobium

Neil D Young 1, Andreas J Stroehlein 1, Tao Wang 1, Pasi K Korhonen 1, Margaret Mentink-Kane 2, J Russell Stothard 3, David Rollinson 4,5, Robin B Gasser 1
PMCID: PMC8861042  PMID: 35190553

Abstract

Some snails act as intermediate hosts (vectors) for parasitic flatworms (flukes) that cause neglected tropical diseases, such as schistosomiases. Schistosoma haematobium is a blood fluke that causes urogenital schistosomiasis, induces bladder cancer and increases the risk of HIV infection. Understanding the molecular biology of the snail and its relationship with the parasite could guide the development of interventions that interrupt transmission. Here, we define the genome of Bulinus truncatus, a key intermediate host of S. haematobium, and explore protein groups inferred to play an integral role in the snail's biology and its relationship with the schistosome parasite. Bu. truncatus shared many orthologous protein groups with Biomphalaria glabrata, the key snail vector of S. mansoni, which causes hepatointestinal schistosomiasis in people. Conspicuous were expansions in signalling and membrane-trafficking proteins, peptidases and their inhibitors, as well as gene families linked to immune response regulation, such as a large repertoire of lectin-like molecules. This work provides a sound basis for further studies of snail-parasite interactions in the search for targets to block schistosomiasis transmission.

Subject terms: Genome, Parasite genomics, Zoology


The snail Bulinus truncatus is an intermediate host of the carcinogenic human blood fluke Schistosoma haematobium. Here the authors report the genome of Bu. truncatus, explore protein groups inferred to play a role in its interaction with the schistosome parasite, and identify expansions in gene families linked to immune response regulation.

Introduction

The phylum Mollusca (molluscs) is represented by at least 70,000 species (55 families), which are essential invertebrates of terrestrial or marine ecosystems1. Many act as intermediate hosts (i.e., vectors) of parasites of vertebrates including humans2. Key representatives of the latter group (intermediate hosts) are aquatic snails that transmit parasitic trematodes (flukes) which cause some of the most chronic and destructive neglected tropical diseases (NTDs) of humans, including clonorchiasis, opisthorchiasis and schistosomiasis3,4. These NTDs (called trematodiases) affect ~280 million people worldwide3,4 and often cause serious complications, particularly malignant cancers, in chronically affected people5,6.

Although the control of such trematodiases presently relies heavily on the chemotherapeutic treatment of people with a single drug, called praziquantel7, the key to effective and sustainable control is to prevent or block disease transmission. One possible way of achieving this is to break the transmission cycle at the level of entry or exit of the parasite from the snail intermediate host. Thus, understanding the fundamental biology of the intermediate host and its relationship with the parasite, particularly at the molecular level, could assist significantly in finding new methods to interrupt transmission and prevent disease.

While there have been major advances in our understanding of the biology and molecular biology of the parasitic trematodes that cause clonorchiasis, opisthorchiasis and schistosomiasis8–11, this is not the case for the snail intermediate hosts that transmit these diseases. Indeed, the only studies conducted to date12,13 have reported draft genomes for Biomphalaria glabrata, a key intermediate host of the blood fluke Schistosoma mansoni (which causes hepato-intestinal schistosomiasis in humans), and these genomes have underpinned studies of snail-schistosome interactions12,14–18. However, curated genomic resources are lacking for the vast majority of trematode-transmitting snails19, which represents a major blind spot for health-related research.

As some of our work over the years has focused on Schistosoma haematobium—the causative agent of urogenital schistosomiasis, affecting 100 million people worldwide20—we have become particularly keen to create a multi-omic tool box for studying the biology of the key snail intermediate host—Bulinus truncatus—of this carcinogenic blood fluke and the molecular interplay between this snail host and the larval stages of the fluke that asexually replicate within it21.

Recently, we defined a mitochondrial genome that represents an established laboratory line (designated BRI)22 and a transcriptome of the adult stage of this line of Bu. truncatus23. In the present study, we logically extend this work, and build on recent success in the assembly of relatively large genomes (400–550 Mb) of invertebrates using third-generation (i.e. long-read or long-range) technology11,24, to define a reference genome for Bu. truncatus using this technology, combined with second-generation (short-read) sequencing and advanced bioinformatics. This work will underpin fundamental studies of the biology, ecology and population genetics of Bu. truncatus and the interactions between this snail and S. haematobium, with broader implications for comparative genomic and molecular investigations of other snail/pathogen systems.

Results

Genome assembly

From a total of 128.5 Gb (~100-fold coverage) of Illumina short-read, 11 Gb of Oxford Nanopore long-read (~10-fold coverage) and 48 Gb of Hi-C sequence data (Supplementary Table 1), a draft genome (designated Btru.v1) was assembled for Bu. truncatus. It contains 11,176 contigs in 523 scaffolds, with a total length of ~1.2 Gb (scaffold N50 = 5 Mb; L50 = 68; longest scaffold = 36.5 Mb) and a GC content of 36.3% (Table 1). Scaffolding with Hi-C data yielded ~8.3 million contacts (Supplementary Fig. 1 and Supplementary Table 2), of which ~7.3 million were within chromosomes and ~1 million between chromosomes; ~7 million of the intra-chromosomal contacts spanned <20 kb. In total, 914 of 954 (95.8%) metazoan BUSCO genes were identified as complete in this genome, indicating that the assembly represents a substantial proportion of the complete genome (Table 1).
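The scaffold N50 and L50 values quoted above follow the usual definitions. Sort sequences from longest to shortest and walk down the list until half of the total assembly length is covered. N50 is the length of the sequence at which that happens, and L50 is the number of sequences needed to get there. The minimal Python sketch below uses made-up toy lengths rather than the Btru.v1 data:

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of contig or scaffold lengths (bp)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # `length` is N50; `count` sequences were needed to reach half the assembly (L50)
            return length, count
    raise AssertionError("unreachable: the running sum always reaches half the total")


# Toy example (hypothetical lengths, not Btru.v1 scaffolds)
print(n50_l50([100, 80, 60, 40, 20]))  # (80, 2)
```

The same function would give the scaffold-level values in Table 1 when run on scaffold lengths, and the contig-level values when run on contig lengths.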

Table 1.

Features of the genome (Btru.v1) of Bulinus truncatus.

Feature | Btru.v1
Number of scaffolds | 523
Total size of scaffolds (bp) | 1,221,777,273
Longest scaffold; shortest scaffold (bp) | 36,501,513; 10,033
Number of scaffolds of >100 kb; >1 Mb; >10 Mb | 484; 271; 20
Scaffold N50 length (bp); L50 scaffold count | 4,956,851; 68
Scaffold %GC | 36.27
Scaffold %N | 0.44
Number of contigs | 11,176
Longest contig (bp) | 1,917,814
Number of contigs of >100 kb; >1 Mb | 3717; 28
Contig N50 length (bp); L50 contig count | 234,265; 1488
Contig %GC | 36.43
Genome completeness and accuracy:
    Complete BUSCOs (a) | 914 (95.8%)
    Complete single-copy BUSCOs | 822 (86.2%)
    Complete and duplicated BUSCOs | 92 (9.6%)
    Fragmented BUSCOs | 8 (0.8%)
    Missing BUSCOs | 32 (3.4%)

(a) Number of Benchmarking Universal Single-Copy Orthologs (BUSCOs) identified (genome mode) and percentage of the 954 genes within the metazoa_odb10 dataset.

Subsequently, ploidy was assessed using short-read data from an individual adult specimen of Bu. truncatus (Supplementary Fig. 2). Using 21-mers, coverage peaks were inferred under a diploid (Supplementary Fig. 2a) or a tetraploid (Supplementary Fig. 2b) model. Although neither model was unequivocally supported by k-mer frequency coverage or by read-mapping to the Btru.v1 assembly (Supplementary Fig. 2c), the "smudgeplot" analysis inferred diploidy (AB) with minor triploidy (AAB), with a high probability (91%) that the data matched a diploid model, and a genome size estimate of ~1.2 Gb (Supplementary Fig. 2d).
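The ~1.2 Gb size estimate comes from k-mer coverage. A common back-of-the-envelope version divides the number of k-mers in the non-error part of the spectrum by the coverage of the main peak. The sketch below makes three assumptions. First, the histogram is a dict mapping coverage to the number of distinct k-mers, as a k-mer counter would produce. Second, the error cutoff `min_coverage` is a hypothetical value. Third, the tallest remaining peak is the homozygous one. For a diploid genome, separating heterozygous from homozygous peaks requires model-based tools rather than this shortcut.

```python
def estimate_genome_size(histogram, min_coverage=5):
    """Rough genome size from a k-mer histogram {coverage: number of distinct k-mers}.

    Assumes k-mers below `min_coverage` are mostly sequencing errors and that the
    tallest remaining peak corresponds to single-copy (homozygous) sequence.
    """
    kept = {cov: n for cov, n in histogram.items() if cov >= min_coverage}
    if not kept:
        raise ValueError("no k-mers above the error cutoff")
    peak_coverage = max(kept, key=kept.get)
    total_kmers = sum(cov * n for cov, n in kept.items())  # k-mer instances, not distinct k-mers
    return total_kmers / peak_coverage


# Toy histogram (hypothetical): error k-mers at coverage 1, main peak at coverage 15
toy = {1: 5000, 10: 20, 15: 100, 20: 20}
print(estimate_genome_size(toy))  # 140.0
```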

Genome annotation

A large portion (51.0%) of the Btru.v1 genome of Bu. truncatus was repetitive (Supplementary Table 3), including transposable elements (~23.7% DNA transposons and 6.1% retrotransposons). Although most DNA transposons could not be classified, hAT (hobo-Activator-Tam3) transposon-like elements were highly represented (11.4% of the assembly). Retrotransposons were mainly long terminal repeat elements (LTRs; 4.8%) and long interspersed nuclear elements (LINEs; 1.4%). The remaining repeat content comprised unclassified (18.0%) or simple (2.9%) repeat elements (Supplementary Table 3). Repetitive elements were distributed relatively evenly across the Btru.v1 genome (Supplementary Table 3 and Supplementary Fig. 3). On average, each 500 kb genomic region contained 47.8 LTRs, 16.5 LINEs and 375.8 DNA transposons, with no clear association with gene model density (Supplementary Table 3 and Supplementary Fig. 3). The 500 kb regions with the most LTRs were on HiC_scaffold_34 (n = 174 elements), HiC_scaffold_309 (n = 163) and HiC_scaffold_10 (n = 153). Those with the most LINEs were on HiC_scaffold_188 (n = 157), HiC_scaffold_52 (n = 140) and HiC_scaffold_188 (n = 113). Those with the most DNA transposons were on HiC_scaffold_252 (n = 1062), HiC_scaffold_26 (n = 897) and HiC_scaffold_58 (n = 773). An analysis of repeat elements (n = 25,778) located within 5000 nt (in either direction) of a protein-coding gene (Supplementary Fig. 3 and Supplementary Table 4) showed that simple repeats (8072 of 519,680 elements) and low-complexity repeats (1509 of 65,058) were more likely (p-value < 0.01), and DNA transposons (6821 of 983,913) and LTRs (652 of 127,776) less likely (p-value < 0.01), to be associated with such genes than with other parts of the genome.
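The text reports near-gene counts for four repeat classes but does not name the test behind p < 0.01. As an illustration only, a two-proportion z-test can compare each class's near-gene fraction with that of the other three classes pooled. Both the baseline (the four classes listed, not all repeats in the genome) and the choice of test are assumptions, not the authors' method. The counts are taken from the paragraph above.

```python
import math

# (repeats within 5 kb of a gene, total repeats of that class), from the text above
REPEAT_COUNTS = {
    "simple": (8072, 519_680),
    "low_complexity": (1509, 65_058),
    "DNA_transposon": (6821, 983_913),
    "LTR": (652, 127_776),
}


def two_proportion_z(x1, n1, x2, n2):
    """Two-sided two-proportion z-test; returns (z, p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


def near_gene_enrichment(counts):
    """Compare each class's near-gene fraction with the pooled fraction of the other classes
    (an assumed baseline, not necessarily the one used in the study)."""
    near_all = sum(x for x, _ in counts.values())
    total_all = sum(n for _, n in counts.values())
    results = {}
    for name, (near, total) in counts.items():
        z, p = two_proportion_z(near, total, near_all - near, total_all - total)
        results[name] = {"fraction_near_gene": near / total, "z": z, "p": p}
    return results


for name, r in near_gene_enrichment(REPEAT_COUNTS).items():
    print(f"{name}: {r['fraction_near_gene']:.4f} near genes, z = {r['z']:+.1f}")
```

A positive z indicates enrichment near genes and a negative z indicates depletion. With these counts, the directions agree with those stated in the text.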

Transcriptomic data for adult Bu. truncatus and Bi. glabrata, and protein data in the UniProtKB/SwissProt database (accessed 14 May 2020)25, were used for the evidence-based prediction of protein-coding genes from the Btru.v1 genome. A total of 75,434 genes were predicted in the masked assembly, a large and heterogeneous set. To characterise this set, we defined gene clusters (Supplementary Fig. 4a) and selected clusters 2 (n = 12,209 genes), 4 (n = 5238) and 5 (n = 9674), whose members shared salient features (i.e. more than one exon, and shared patterns of GC content and predicted-protein complexity) (Supplementary Fig. 4b). Thus, a total of 26,292 protein-encoding genes were characterised for Bu. truncatus (Table 2) and annotated (Table 3).
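The gene selection described above rests on per-gene features: exon number, GC content and predicted-protein complexity. The source does not specify how these features were computed or which clustering method was used. The sketch below assumes GC content of the coding sequence, and Shannon entropy of the amino-acid composition as the complexity measure. The thresholds in `looks_plausible` are hypothetical, and the function is a simple filter rather than a clustering step.

```python
import math
from collections import Counter


def gc_content(seq):
    """Fraction of G + C in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def shannon_entropy(protein):
    """Compositional complexity of a protein, as Shannon entropy in bits per residue."""
    if not protein:
        return 0.0
    n = len(protein)
    return -sum((c / n) * math.log2(c / n) for c in Counter(protein).values())


def gene_features(cds, protein, exon_count):
    """Per-gene feature vector that a clustering step (e.g. k-means) could use."""
    return {"exons": exon_count, "gc": gc_content(cds), "entropy": shannon_entropy(protein)}


def looks_plausible(features, min_exons=2, min_entropy=3.0):
    """Hypothetical filter: multi-exon genes whose proteins are not low-complexity."""
    return features["exons"] >= min_exons and features["entropy"] >= min_entropy


print(gene_features("ATGGCCAAATAG", "MAK", 3))
```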

Table 2.

Comparison of the features of the draft genome of Bulinus truncatus (Btru.v1) with that of Biomphalaria glabrata GCF_000457365.1_ASM45736v1 (ref. 12).

Feature | Bulinus truncatus (BRI strain; Btru.v1) | Biomphalaria glabrata (BB02 strain; ASM45736v1)
Number of genes/mRNAs | 26,292/26,292 | 25,539/36,662
Gene length (a) | 11,860 ± 10,842 | 12,166 ± 17,061
mRNA length (a) | 1600 ± 1527 | 1925 ± 1795
Coding sequence length (a) | 1600 ± 1527 | 1296 ± 1356
Number of exons | 9 ± 12 | 7 ± 8
Exon length (a) | 175 ± 298 | 263 ± 494
Intron length (a) | 1256 ± 2314 | 1603 ± 3933
Protein length (aa) | 532 ± 509 | 431 ± 452
Genes with transcriptional support | 19,274 (73.3%) | -
Completeness:
    Complete BUSCOs (b) | 905 (95.0%) | 847 (88.8%)
    Complete single-copy BUSCOs | 737 (77.3%) | 825 (86.5%)
    Complete and duplicated BUSCOs | 168 (17.6%) | 22 (2.3%)
    Fragmented BUSCOs | 26 (2.7%) | 62 (6.5%)
    Missing BUSCOs | 23 (2.4%) | 45 (4.7%)

(a) Lengths (bp); mean ± standard deviation.

(b) Number (%) of Benchmarking Universal Single-Copy Orthologs (BUSCOs) identified (in protein mode) using the metazoa_odb10 dataset (954 genes) for comparison.

Table 3.

Annotation of the protein-encoding genes (n = 26,292) predicted for Bulinus truncatus.

Description of approach | Genes (% of total) | Unique annotations
UniProt TrEMBL database annotation | 23,248 (88.4%) | 16,573
UniProt SwissProt database annotation | 18,139 (69.0%) | 11,450
eggNOG database | 20,436 (77.7%) | 12,788
    Gene Ontology (eggNOG) | 13,009 (49.5%) | 7418
    Enzymes (eggNOG) | 5256 (20.0%) | 1129
InterProScan database annotation | 20,943 (79.7%) | 10,618
    Pfam annotation | 17,568 (66.8%) | 4933
    Gene Ontology (InterProScan) | 15,606 (59.4%) | 2755
MEROPS protease database | 1299 (4.9%) | 537
KEGG Orthology (KO) annotation
    KEGG protein families | 15,844 (60.3%) | 6403
    KEGG pathways | 9494 (36.1%) | 3464
Annotated by ≥1 method/database (a) | 21,951 (83.5%)
Orphan genes/with transcriptional support | 4341/2053
Proteins predicted to encode a signal peptide domain | 3121 (11.9%)
Proteins predicted to encode one or more transmembrane domains | 7786 (29.6%)
Proteins predicted to be extracellular/secreted | 1096 (4.2%)

(a) Without homology-based matches in the UniProt TrEMBL database (accessed 20 December 2020).

The statistics for the annotated genes in the Btru.v1 genome were similar to those of Bi. glabrata (BB02 strain)12, including mean lengths of genes (11,860 vs. 12,166 bp for Btru.v1 vs. Bi. glabrata), mRNAs (1600 vs. 1925 bp), exons (175 vs. 263 bp) and introns (1256 vs. 1603 bp) (Table 2; Supplementary Fig. 5). Coding regions (1600 vs. 1296 bp) and inferred proteins (532 vs. 431 amino acid [aa] residues) were slightly longer than those predicted for Bi. glabrata (Table 2; Supplementary Fig. 5). The transcriptome of adult Bu. truncatus provided support (transcripts per million, TPM ≥ 0.2) for 19,274 genes (73.3%), using short-read (16,938 genes; 64.4%) or long-read (17,137; 65.2%) RNA sequence data sets (Supplementary Fig. 6; Table 2; Supplementary Data 1). In all, 14,801 genes had transcriptional support in both data sets, and their short-read and long-read TPM values were directly associated (adjusted R-squared: 0.6589; p-value < 0.01; Supplementary Fig. 6). Long-read data had a mean TPM value of 35.4 and high coverage (>80%) for 16,216 genes, compared with a mean TPM value of 28.1 and high coverage (>80%) for 2507 genes for short-read data.
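Two quantities in this paragraph can be reproduced or checked with a few lines of code. TPM normalises read counts by transcript length and then rescales so that all values sum to one million. The 19,274 supported genes are also exactly the union of the short-read (16,938) and long-read (17,137) sets, given that 14,801 genes are supported by both. The minimal sketch below does not reflect the quantification tool actually used, which is not described here, and its toy counts are hypothetical.

```python
def tpm(read_counts, lengths_bp):
    """Transcripts per million from read counts and transcript lengths (bp)."""
    # reads per kilobase first, so that longer transcripts are not favoured
    rates = {g: read_counts[g] / (lengths_bp[g] / 1_000) for g in read_counts}
    scale = sum(rates.values())
    return {g: rate / scale * 1_000_000 for g, rate in rates.items()}


def supported(tpm_values, threshold=0.2):
    """Genes whose TPM meets the support threshold used in the text (TPM >= 0.2)."""
    return {g for g, value in tpm_values.items() if value >= threshold}


def union_size(n_a, n_b, n_both):
    """Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|."""
    return n_a + n_b - n_both


# The reported counts are consistent: short-read + long-read - both = total supported
print(union_size(16_938, 17_137, 14_801))  # 19274

toy = tpm({"g1": 10, "g2": 20}, {"g1": 1_000, "g2": 2_000})
print(toy)  # {'g1': 500000.0, 'g2': 500000.0}
```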

Within the predicted gene set, we identified 905 (95%) of 954 complete, conserved metazoan genes by BUSCO analysis, suggesting that this set represents most of the gene complement. This result is similar to that (89%) inferred for complete BUSCOs in Bi. glabrata (BB02 strain). However, the Btru.v1 gene set was predicted to have more duplicated BUSCO genes (17.6% vs. 2.3%) but fewer fragmented (2.7% vs. 6.5%) or missing (2.4% vs. 4.7%) genes (Table 2). For Bu. truncatus, 23,248 (88.4%) and 18,139 (69.0%) genes were inferred to encode proteins with sequence homology to proteins in the non-redundant UniProt TrEMBL and SwissProt databases, respectively (Table 3).

Subsequently, we functionally annotated 21,951 genes using information from one or more of the following databases: InterProScan (n = 20,943; 79.7%), eggNOG (n = 20,436; 77.7%) and MEROPS (n = 1299; 4.9%) (Table 3; Supplementary Data 1). Gene Ontology (GO) terms were assigned to 18,507 sequences (InterProScan domain: 15,606; eggNOG: 13,009). Most of the encoded proteins (20,433 of 26,292; 77.7%) had homology to those in the KEGG database. Of these, 15,844 were assigned a KEGG Orthology (KO) term (Table 3; Supplementary Data 1 and 2) and represented 45 protein groups or families (KEGG BRITE; Supplementary Data 1 and 2). The most represented of these were "membrane trafficking" (n = 1799), "chromosome and associated proteins" (n = 1327), "exosome" (n = 1220), "peptidases and their inhibitors" (n = 1166), "transport system" (n = 1124), "ubiquitin system" (n = 878) and "G protein-coupled receptors" (GPCRs; n = 709). More than half of the genes assigned KO terms (n = 9494) were linked to 315 distinct KEGG pathway modules (Supplementary Data 2). These included environmental information processing (signal transduction: PI3K-Akt [n = 416], calcium [n = 309] and cAMP [n = 309] signalling; signalling molecules and interaction: neuroactive ligand-receptor interaction [n = 679]) and cellular processes (cellular community: focal adhesion [n = 433]; transport and catabolism: lysosome [n = 310]). Overall, 21,951 (83.5%) proteins were annotated. The remaining 4341 (16.5%) were not, and were thus called "orphan" (unknown) proteins (Table 3; Supplementary Data 1). Of these orphans, 2053 had transcriptional support, of which 1549 had homology to proteins predicted for Bi. glabrata (BB02 strain).
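The annotated/orphan split is a set operation. A gene counts as annotated if any source reports a hit, and as an orphan otherwise, which is why 21,951 + 4341 = 26,292. The Python sketch below uses invented gene IDs and hits. The database names serve only as labels, and the hit criteria used in the study are not specified here.

```python
def annotation_summary(all_genes, annotated_by_source):
    """Split genes into annotated (hit in >= 1 source) and orphan (no hit in any source)."""
    genes = set(all_genes)
    annotated = set().union(*annotated_by_source.values()) & genes
    orphans = genes - annotated
    return annotated, orphans


# Hypothetical toy input: gene IDs and per-database hits are invented for illustration
genes = ["g1", "g2", "g3", "g4", "g5"]
hits = {
    "InterProScan": {"g1", "g2"},
    "eggNOG": {"g2", "g3"},
    "MEROPS": {"g3"},
}
annotated, orphans = annotation_summary(genes, hits)
print(sorted(annotated), sorted(orphans))  # ['g1', 'g2', 'g3'] ['g4', 'g5']
```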

Excretory/secretory (ES) proteins are reported to play central roles in snail-schistosome interactions18. Within the Bu. truncatus gene set, 1096 (4.2%) genes were predicted to encode extracellular ES proteins, based on the presence of a signal peptide domain (3121 genes; 11.9%) and the absence of any transmembrane domain (18,507 genes) (Table 3; Supplementary Data 1). Within the full complement of ES proteins (i.e. the secretome), 434 proteins had KEGG annotations that could be assigned to the following protein groups: peptidases and their inhibitors (n = 102), exosome (67), membrane trafficking (48), glycosaminoglycan-binding proteins (47) and lectins (26). Of these 434 proteins, 330 were orphans, 118 had homology to individual proteins in Bi. glabrata (BB02 strain), and 251 had transcriptional support in the adult stage of Bu. truncatus.
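The secretome rule described above combines two per-protein predictions: a signal peptide must be present, and no transmembrane domain may be predicted. The minimal sketch below uses a hypothetical `ProteinPrediction` record, because the predictors used and their output formats are not given here. In practice, some transmembrane predictors report the hydrophobic core of a signal peptide as a transmembrane helix, so the TM count may need to exclude the signal-peptide region.

```python
from dataclasses import dataclass


@dataclass
class ProteinPrediction:
    gene_id: str
    has_signal_peptide: bool  # e.g. from a signal-peptide predictor
    n_transmembrane: int      # predicted transmembrane helices


def predicted_secreted(predictions):
    """Signal peptide present and no transmembrane domain -> candidate ES protein."""
    return [p.gene_id for p in predictions if p.has_signal_peptide and p.n_transmembrane == 0]


# Hypothetical predictions for three genes
preds = [
    ProteinPrediction("g1", True, 0),   # secreted candidate
    ProteinPrediction("g2", True, 1),   # likely membrane-anchored
    ProteinPrediction("g3", False, 0),  # no signal peptide
]
print(predicted_secreted(preds))  # ['g1']
```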

Marked synteny in mollusc genomes

The annotated protein-coding gene set was first used to assess conserved gene order (synteny) between the Btru.v1 genomic scaffolds and the reference genomes of Bi. glabrata, Achatina immaculata (gastropod; family Achatinidae) and Pecten maximus (bivalve; family Pectinidae). The relationships among these molluscs are summarised by phylogenetic analyses of concatenated amino acid sequence data inferred from 2315 SCOs using maximum likelihood (ML) and Bayesian inference (BI) methods (Supplementary Fig. 7). In pairwise comparisons, many blocks of nucleotide sequence were aligned across conserved gene regions among Bu. truncatus, Bi. glabrata (S1316-R1 strain), Ac. immaculata and P. maximus (Fig. 1), with the most conservation between Btru.v1 and Bi. glabrata (Fig. 1a; Table 4). Almost all genome scaffolds contained regions that aligned in 594 syntenic blocks of ≥3 single-copy orthologs, spanning 557.8 Mb (46%) and 396.5 Mb (49%) of the Bu. truncatus (Btru.v1) and Bi. glabrata genomes, respectively. Most Btru.v1 scaffolds (n = 436) shared synteny with Ac. immaculata (Fig. 1b; Table 4). These 436 scaffolds (~1.19 Gb; 97% of the assembly length) aligned to all chromosomes of Ac. immaculata (n = 32; 1.65 Gb) in 243 syntenic blocks of ≥6 single-copy orthologs, spanning 411.7 Mb (35%) and 372.6 Mb (23%) of the Btru.v1 and Ac. immaculata genomes, respectively. Fewer Btru.v1 scaffolds (n = 398) shared syntenic regions with P. maximus (Fig. 1c; Table 4). These 398 scaffolds (~1.17 Gb; 95% of the assembly length) aligned to 19 of the 3983 P. maximus scaffolds (844.3 Mb; 92%) in 291 syntenic blocks of ≥3 single-copy orthologs, spanning 404.9 Mb (35%) and 534.2 Mb (63%) of the Btru.v1 and P. maximus genomes, respectively.
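Syntenic blocks such as the 594 blocks of ≥3 single-copy orthologs are, at their core, runs of ortholog pairs that stay on the same pair of scaffolds and keep a consistent gene order. The simplified sketch below chains such runs from a list of hypothetical anchors `(query_scaffold, query_gene_rank, target_scaffold, target_gene_rank)`, and `max_gap` is an assumed tolerance. The tool used by the authors is not named in this section. Dedicated tools (e.g. MCScanX) instead score and chain anchors with dynamic programming, which handles gaps and local rearrangements far better than this greedy pass.

```python
def syntenic_blocks(anchors, min_size=3, max_gap=5):
    """Greedily chain ortholog anchors into collinear blocks.

    anchors: iterable of (query_scaffold, query_rank, target_scaffold, target_rank),
    one per single-copy ortholog pair; *_rank is the gene's order along its scaffold.
    Returns lists of >= min_size anchors that stay on one scaffold pair, with gaps of
    at most max_gap genes, and target ranks that only increase or only decrease.
    """
    ordered = sorted(anchors, key=lambda a: (a[0], a[1]))
    blocks = []
    current = []
    direction = 0  # +1 same orientation, -1 inverted, 0 not yet known
    for anchor in ordered:
        if current:
            prev = current[-1]
            step = anchor[3] - prev[3]
            extends = (
                anchor[0] == prev[0]
                and anchor[2] == prev[2]
                and anchor[1] - prev[1] <= max_gap
                and 0 < abs(step) <= max_gap
                and (direction == 0 or (step > 0) == (direction > 0))
            )
            if extends:
                if direction == 0:
                    direction = 1 if step > 0 else -1
                current.append(anchor)
                continue
            if len(current) >= min_size:
                blocks.append(current)
        current = [anchor]
        direction = 0
    if len(current) >= min_size:
        blocks.append(current)
    return blocks


# Hypothetical anchors: a forward block on t1, an inverted block on t2, and a
# two-gene run on t3 that is too short to count with min_size=3
anchors = [
    ("q1", 1, "t1", 10), ("q1", 2, "t1", 11), ("q1", 3, "t1", 12),
    ("q1", 4, "t2", 50), ("q1", 5, "t2", 49), ("q1", 6, "t2", 48),
    ("q2", 1, "t3", 5), ("q2", 2, "t3", 6),
]
print([len(b) for b in syntenic_blocks(anchors)])  # [3, 3]
```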

Fig. 1. Synteny between genomes.


Synteny and contiguity of the genome (Btru.v1) of Bulinus truncatus with the draft genome of Biomphalaria glabrata and the chromosome-level reference genomes of Achatina immaculata and Pecten maximus. Scaffolds are arranged in circular (circos) plots, with reference scaffolds for Ac. immaculata or P. maximus linked to inferred syntenic blocks of the Bu. truncatus genome by distinctly coloured bars. Linked syntenic blocks between scaffolds of Bu. truncatus and Bi. glabrata are shown in light blue. a Synteny between the reference genomes of Bu. truncatus and Bi. glabrata, established based on the positions of 595 syntenic blocks each containing three or more single-copy orthologs (i.e. 4047 of a total of 11,051). b Synteny between the reference genomes of Bu. truncatus and Ac. immaculata, established based on the positions of mapped Bu. truncatus proteins in 243 syntenic blocks each containing six or more single-copy orthologs (i.e. 2234 of a total of 8517). c Synteny between the reference genomes of Bu. truncatus and P. maximus, established based on the positions of 291 syntenic blocks each containing three or more single-copy orthologs (i.e. 1139 of a total of 4685).

Fig. 2. Orthologous protein groups.


Orthologous groups of one or more protein(s) in gastropod taxa with available gene annotation. Pecten maximus is an outgroup (bivalve). a Pairwise comparison of the orthologous groups common to Bulinus truncatus and Biomphalaria glabrata. b Genes in orthologous groups common to Bu. truncatus and Bi. glabrata. c UpSet plot of the intersections of unique or shared orthologous groups inferred from protein data sets for Bu. truncatus, Bi. glabrata, Aplysia californica, Elysia chlorotica and P. maximus. d Phylogenetic tree inferred from single copy orthologs aligned among selected gastropod taxa, including all available annotated genes. As the topology of the maximum likelihood (ML) and Bayesian inference (BI) trees was the same, the ML tree is displayed and shows nodal support values for both BI (pp) and ML (bootstrap). e Genes (annotated by KEGG) predicted to be unique to Bu. truncatus (coloured), or shared between or among the five molluscan species included here.

Table 4.

Genome-wide synteny comparisons of the genomes of Bulinus truncatus (Btru.v1), Biomphalaria glabrata (1316-R1/ASM1452496v1), Achatina immaculata and Pecten maximus in a pairwise manner.

Comparison | Species | Genome size | Total no. of scaffolds | Length aligned (%) | No. of scaffolds aligned | Syntenic blocks/length (%) | No. of single-copy orthologs in syntenic blocks | No. of single-copy orthologs
Bu. truncatus vs. Bi. glabrata | Bu. truncatus | 1.23 Gb | 523 | 1.21 Gb (99%) | 472 | 594/557.8 Mb (46%) | 6939 | 11,051
Bu. truncatus vs. Bi. glabrata | Bi. glabrata | 852.0 Mb | 927 | 807.7 Mb (95%) | 526 | 594/396.5 Mb (49%) | 6939 | 11,051
Bu. truncatus vs. Ac. immaculata | Bu. truncatus | 1.23 Gb | 523 | 1.19 Gb (97%) | 436 | 243/411.7 Mb (35%) | 2234 | 8517
Bu. truncatus vs. Ac. immaculata | Ac. immaculata | 1.65 Gb | 31 | 1.65 Gb (100%) | 31 | 243/372.6 Mb (23%) | 2234 | 8517
Bu. truncatus vs. P. maximus | Bu. truncatus | 1.23 Gb | 523 | 1.17 Gb (95%) | 398 | 291/404.9 Mb (35%) | 1139 | 4685
Bu. truncatus vs. P. maximus | P. maximus | 918.3 Mb | 3983 | 844.3 Mb (92%) | 19 | 291/534.2 Mb (63%) | 1139 | 4685

Protein ortho-groups and expansions

Next, the annotated Bu. truncatus protein-coding gene set was compared with the genome annotations available for Bi. glabrata (BB02 strain), Aplysia californica, Elysia chlorotica and P. maximus (Table 5 and Fig. 2). In total, 17,866 ortho-groups containing one or more proteins were identified. Most ortho-groups (n = 13,302) were shared by Bu. truncatus (n = 20,349 proteins) and Bi. glabrata (n = 19,853 proteins). A further 4583 Bu. truncatus ortho-groups (n = 5943 proteins) and 4252 Bi. glabrata ortho-groups (n = 5685 proteins) were unique to each of these two species (Fig. 2). Overall, 7895 ortho-groups containing one or more proteins were shared by Bu. truncatus, Bi. glabrata, Ap. californica, E. chlorotica and P. maximus (Fig. 2). In addition, 3924 ortho-groups (n = 5079 proteins) were unique to Bu. truncatus, and 1695 ortho-groups contained proteins of both Bu. truncatus (n = 2436) and Bi. glabrata (n = 2999) to the exclusion of the other mollusc species studied (Fig. 2c). Of the ortho-groups shared by all species, 2919 were single-copy orthologs (Fig. 2d). ML and BI analyses of aligned single-copy gene sequence data placed Bu. truncatus with Bi. glabrata, and Ap. californica with E. chlorotica, with absolute nodal support (Fig. 2d).
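The "unique" and "shared" ortho-group counts above, as in the UpSet plot, come from grouping each ortho-group by the exact set of species it contains. Protein totals, such as the 2436 vs. 2999 proteins in the groups restricted to Bu. truncatus and Bi. glabrata, come from counting members per species within those groups. The sketch below assumes that OrthoFinder-style output has already been parsed into a dict mapping ortho-group ID to (species, protein) members. The species codes and IDs are invented.

```python
from collections import Counter


def intersection_counts(orthogroups):
    """Count ortho-groups by the exact set of species they contain (UpSet-style intersections).

    orthogroups: mapping of ortho-group ID -> list of (species, protein_id) members.
    """
    return Counter(
        frozenset(species for species, _ in members) for members in orthogroups.values()
    )


def proteins_in_intersection(orthogroups, species_set, species):
    """Number of `species` proteins in ortho-groups whose species set is exactly `species_set`."""
    return sum(
        sum(1 for sp, _ in members if sp == species)
        for members in orthogroups.values()
        if frozenset(sp for sp, _ in members) == species_set
    )


# Hypothetical toy ortho-groups (IDs and proteins invented for illustration)
toy_groups = {
    "OG1": [("Btru", "p1"), ("Bgla", "q1"), ("Pmax", "r1")],
    "OG2": [("Btru", "p2"), ("Btru", "p3")],  # unique to Btru
    "OG3": [("Btru", "p4"), ("Bgla", "q2")],  # Btru + Bgla only
    "OG4": [("Btru", "p5"), ("Bgla", "q3"), ("Bgla", "q4")],
}
counts = intersection_counts(toy_groups)
pair = frozenset({"Btru", "Bgla"})
print(counts[frozenset({"Btru"})], counts[pair])  # 1 2
print(proteins_in_intersection(toy_groups, pair, "Btru"),
      proteins_in_intersection(toy_groups, pair, "Bgla"))  # 2 3
```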

Table 5.

Molluscan species/strains studied herein and relevant information on their genomes.

Species (strain) | Genome code | Accession number | Gene annotation | Assembly level | Scaffold N50 | Size | Reference
Achatina immaculata | ASM976088v1 | GCA_009760885.1 | No | Chromosome | 56.4 Mb | 1.65 Gb | Unpublished
Aplysia californica | AplCal3.0 | GCF_000002075.1 | Yes | Scaffold | 917.5 kb | 927.3 Mb | Unpublished
Biomphalaria glabrata (BB02) | ASM45736v1 | GCA_000457365.1 | Yes | Scaffold | 48.1 kb | 916.4 Mb | 12
Biomphalaria glabrata (1316-R1) | ASM1452496v1 | GCA_014524965.1 | No | Scaffold | 2.60 Mb | 852.0 Mb | 13
Elysia chlorotica | ElyChl2.0 | GCA_003991915.1 | Yes | Scaffold | 442.0 kb | 557.5 Mb | 88
Pecten maximus | xPecMax1.1 | GCF_902652985.1 | Yes | Chromosome | 44.8 Mb | 918.3 Mb | 89

Subsequently, we explored protein family expansions in Bu. truncatus, or in both Bu. truncatus and Bi. glabrata (BB02 strain) (Fig. 2e), relative to the other mollusc species included. Of the 5079 proteins (3924 ortho-groups) unique to Bu. truncatus, 1898 were classified into one of 31 KEGG BRITE families encoded by at least 10 genes (Fig. 2; Supplementary Data 1). Of the 2436 Bu. truncatus proteins with an orthologue exclusively in Bi. glabrata, 1039 were classified into 20 families encoded by at least 10 genes (Fig. 2; Supplementary Data 1). Protein families unique to Bu. truncatus fell into three categories:

- metabolism: peptidases and their inhibitors (n = 185), protein kinases (n = 94), protein phosphatases (n = 68) and glycosyltransferases (n = 62);
- genetic information processing: membrane trafficking (n = 236), the ubiquitin system (n = 110), chromosome (n = 107), DNA repair and recombination proteins (n = 37) and spliceosome (n = 37);
- signalling and cellular processes: exosome (n = 209), GPCRs (n = 157), glycosaminoglycan-binding proteins (n = 111), transport system (n = 110) and cluster of differentiation (CD) molecules (n = 73).

Protein families shared by Bu. truncatus and Bi. glabrata fell into the same categories:

- metabolism: peptidases and their inhibitors (n = 101), protein kinases (n = 61), protein phosphatases (n = 42) and glycosyltransferases (n = 38);
- genetic information processing: membrane trafficking (n = 126), the ubiquitin system (n = 39), chromosome (n = 24) and transcription factors (n = 28);
- signalling and cellular processes: GPCRs (n = 134), exosome (n = 107), transport system (n = 91), glycosaminoglycan-binding proteins (n = 73) and CD molecules (n = 80).
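The family tallies above keep only KEGG BRITE families represented by at least 10 genes within the set of interest, such as the 5079 proteins unique to Bu. truncatus. The minimal sketch below implements that filter. It assumes the gene-to-family mapping has already been parsed from the KEGG annotation, and the example data are hypothetical.

```python
from collections import Counter


def families_with_min_genes(gene_to_families, genes_of_interest, min_genes=10):
    """Count genes per (KEGG BRITE-style) family and keep families with >= min_genes genes."""
    counts = Counter()
    for gene in genes_of_interest:
        # set() so a gene listed twice under the same family is counted once
        for family in set(gene_to_families.get(gene, ())):
            counts[family] += 1
    return {family: n for family, n in counts.items() if n >= min_genes}


# Hypothetical mapping: 12 genes in one family, 3 in another, plus one unannotated gene
mapping = {f"gpcr{i}": ["GPCRs"] for i in range(12)}
mapping.update({f"kin{i}": ["Protein kinases"] for i in range(3)})
genes = list(mapping) + ["orphan1"]
print(families_with_min_genes(mapping, genes))  # {'GPCRs': 12}
```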

Protein groups inferred to be involved in the snail-schistosome relationship

Given the role of Bu. truncatus as a key intermediate host for S. haematobium and related schistosome taxa21, we explored protein groups inferred to be involved in the immune/defence system of the snail and/or interactions with schistosomes (Table 6; Fig. 3; Supplementary Table 5).

Table 6.

Key protein groups in Bulinus truncatus and proposed roles and pathway associations – supported by published information.

Protein group | No. of proteins predicted for Bu. truncatus | No. of ortho-groups | Known or proposed roles | Pathway associations | References
Guadeloupe resistance complex (GRC) | 11 (15) (a) | 8 | Reduces susceptibility to schistosome infection | Cellular processes: lysosome, adherens junction, endocytosis; metabolism: glycosaminoglycan degradation | 26
Polymorphic transmembrane cluster 2 (PTC2) | 8 (11) (a) | 5 | Reduces susceptibility to schistosome infection | Glycan biosynthesis and metabolism; glycosphingolipid biosynthesis | 13
BIRs/IAPs | 117 | 49 | Drug response, apoptosis, innate immune responses | Apoptosis, ubiquitin-mediated proteolysis, NF-kappa B signalling | 12, 27
Toll-/IL-1-related proteins | 123 | 49 | Reduces susceptibility to schistosome infection, immune response | Toll-like receptor signalling, necroptosis, NF-kappa B signalling and HIF-1 signalling pathways | 101, 102
Cathepsins | 21 | 7 | Reduces susceptibility to schistosome infection, excretory/secretory product | Transport and catabolism (lysosome, phagosome, autophagy), apoptosis, antigen processing and presentation | 18
Chitinases | 103 | 18 | Reduces susceptibility to schistosome infection, excretory/secretory product | Amino sugar and nucleotide sugar metabolism | 18, 26
Calmodulins | 42 | 28 | Stress response, drug susceptibility | Immune system: C-type lectin receptor signalling pathway | 27
Lectins | 101 | 33 | Immune response | Immune system: C-type lectin receptor signalling pathway | 103
Fibrinogen-related proteins (b) | 130 | 29 | Susceptibility to schistosome infection; immune response | Signalling molecules and interaction, ECM-receptor interaction, focal adhesion | 36, 42–44

(a) Number of proteins inferred for Biomphalaria glabrata.

(b) A distinct group of lectins with a characteristic C-terminal fibrinogen domain.

Fig. 3. Gene expansions in key protein groups.


Expansion of protein ortho-groups in Bulinus truncatus predicted to relate to snail-schistosome interactions, based on published information (see Table 6). Cluster dendrograms show the ortho-groups of Bu. truncatus with the largest gene expansions, inferred with OrthoFinder from the protein data sets of Bu. truncatus, Biomphalaria glabrata, Aplysia californica, Elysia chlorotica and Pecten maximus. The locations of genes linked to the Guadeloupe resistance complex (GRC) or polymorphic transmembrane cluster 2 (PTC2) are indicated (boxed).

Homologs of proteins encoded in the Guadeloupe resistance complex (GRC)26 and the polymorphic transmembrane cluster 2 (PTC2)13 were inferred for Bu. truncatus. Both loci have been proposed to be associated with parasite recognition and/or reduced susceptibility to schistosome infection13,26. The 11 GRC homologs predicted (Fig. 3; Table 6) were linked to cellular processes (lysosome and adherens junction) and to metabolic processes required for glycosaminoglycan degradation. Five PTC2 homologs (Table 6, Fig. 3 and Supplementary Table 5) were linked to metabolic processes required for glycosphingolipid biosynthesis.

The baculovirus inhibitor of apoptosis (IAP) repeat (BIR) protein homologs (n = 117) identified in Bu. truncatus were inferred to be involved in apoptosis, ubiquitin-mediated proteolysis, necroptosis and NF-kappa B signalling pathways (Table 6). The largest number of BIR proteins of Bu. truncatus (n = 25) was in ortho-group OG0000004, which also contained 43, 1, 9 and 8 orthologues of Bi. glabrata, Ap. californica, E. chlorotica and P. maximus, respectively (Fig. 3). These findings are consistent with earlier reports of an expansion of BIR proteins in molluscs12.

An expanded set of Toll-like receptors (n = 123) with a Toll/interleukin-1 receptor homology domain (IPR000157; Table 6) was identified. These proteins are associated with Toll-like receptor signalling, necroptosis, NF-kappa B and/or hypoxia-inducible factor 1 (HIF-1) signalling pathways. The largest number of these Toll/interleukin-1 proteins of Bu. truncatus (n = 15) was in ortho-group OG0000039, together with 27 orthologues in Bi. glabrata and 8 in Ap. californica (Fig. 3).

Cathepsin homologs (n = 21) of Bu. truncatus are likely to be involved in reduced snail susceptibility to schistosome infection18. Cathepsins are commonly involved in transport and catabolism (lysosome, phagosome and autophagy), cell growth and death (apoptosis) and antigen-processing/presentation pathways (Table 6). Ten of these cathepsins were within ortho-group OG0000133, together with 9, 5, 1 and 4 orthologues of Bi. glabrata, Ap. californica, E. chlorotica and P. maximus, respectively (Fig. 3). The Bu. truncatus chitinases (n = 103) were predicted to be involved in amino sugar and nucleotide sugar metabolism (Table 6) and in reduced snail susceptibility to schistosome infection. Of these, 36 were within ortho-group OG0000085, which had no orthologue in any other mollusc species studied (Fig. 3). In addition, calmodulins (n = 42) are likely associated with pathogen interactions27 (Table 6). Six of them were within ortho-group OG0000309, together with 6, 4 and 4 orthologues of Bi. glabrata, E. chlorotica and P. maximus, respectively (Fig. 3).

Lectins (n = 101) identified in Bu. truncatus were predominantly assigned to the C-type lectin receptor signalling pathway and associated with roles in focal adhesion, interactions with the extracellular matrix (ECM) and phagosomes/lysosomes (Table 6 and Fig. 3). For instance, 18 mannose receptor C type-like proteins were identified, with 20, 9 and 6 orthologues in Bi. glabrata, Ap. californica and E. chlorotica, respectively (Fig. 3). A group of lectin-like proteins homologous to the fibrinogen-related proteins of Bi. glabrata (FReDs; InterPro identifier IPR036056) was also inferred for Bu. truncatus (n = 130). This number of FReDs in Bu. truncatus contrasted with the smaller number (n = 72) in Bi. glabrata, and the closest homologs between the two species were in distinct ortho-groups (Table 6 and Fig. 3). Most FReDs of Bu. truncatus were fibrinogen C-terminal domain proteins. Of these, 38 were in ortho-group OG0000016, together with 8, 14 and 2 orthologues of Bi. glabrata, Ap. californica and P. maximus, respectively (Fig. 3).

Classification of fibrinogen-related proteins

The phylogenetic analysis of aligned amino acid sequences of the conserved C-terminal fibrinogen domain (FBG) of FReDs of Bu. truncatus (n = 95) and Bi. glabrata (n = 56) inferred 7 groups, of which Groups 1 to 5 and 7 had strong nodal support (pp = 0.93 to 1.0; Fig. 4a). For both snail species, individual proteins in these groups were annotated/classified based on their predicted tertiary structures (Fig. 4b, Supplementary Table 6 and Supplementary Fig. 8; Supplementary Data 3). Class A proteins matched Bi. glabrata sFReD proteins with a single C-terminal fibrinogen-like domain (Fig. 4b, Supplementary Fig. 8). Unlike sFReDs, some Class A proteins contained N-terminal α-helices (Class A α) or β-sheets and α-helices (Class A βα). The presence of α-helices close to the N-terminus of Class A α proteins was suggestive of a signal peptide, as signal peptides often preferentially adopt an α-helical form28. Class B and C proteins matched Bi. glabrata FREP proteins, with a single FReD domain linked to one (Class B) or two (Class C) N-terminal immunoglobulin superfamily (IgSF) domains by α-helices (Supplementary Table 6 and Supplementary Fig. 8). Class D proteins were Bi. glabrata FREM-like proteins (Supplementary Table 6). Classes E and F were Bu. truncatus FReD-like proteins with complex N-terminal structures but no structural homology to known or predicted Bi. glabrata proteins (Fig. 4b). Some (Class A-like) proteins contained a partial FBG domain (Supplementary Table 6) and had, thus, been excluded from the original phylogenetic analysis (Fig. 4a).

Fig. 4. Comparison of select fibrinogen-related proteins (FReDs) of Bulinus truncatus and Biomphalaria glabrata.


a Phylogenetic relationship of a set of curated FReD-like proteins encoded in the genomes of Bu. truncatus and Bi. glabrata. Each branch tip is labelled with an existing FReD classification system for Bi. glabrata36 (shape) or species (colour). Posterior probability (pp) values are indicated, and groups are numbered (1 to 7). Proteins with predicted signal peptide (SP) domains are indicated with cyan boxes. A distinct colour represents each FReD class, with gene accession numbers boxed. #Bu. truncatus ortho-groups inferred from proteins of Bu. truncatus, Bi. glabrata, Aplysia californica, Elysia chlorotica and Pecten maximus (cf. Figure 3). b Tertiary structure models for FReDs of Bu. truncatus employed for enhanced classification (A–F). C-terminal domain (*), as well as IgSF-like and α-helix and/or β-sheet structures, are shown.

A detailed exploration of the fibrinogen-related proteins in these phylogenetic groups, based on their predicted structures, provided further insights. Group 1 contained 36 Bi. glabrata proteins, 21 of which had a signal peptide predicted from the primary amino acid sequence (Fig. 4a). Group 1 proteins were classified as Class B FREPs (n = 9), Class C FREPs (n = 20) or sFReDs with (Class A α, n = 4; Class A βα, n = 1) or without (Class A, n = 1) additional N-terminal tertiary structures. Group 2 included 27 Bu. truncatus proteins, 16 of which had signal peptides. Most Group 2 Bu. truncatus proteins were classified as sFReDs without (Class A, n = 14) or with (Class A α, n = 4; Class A βα, n = 7) additional N-terminal structures. Many proteins in Group 2 were assigned to ortho-group OG0000016, including one Bu. truncatus protein (Btru_033719) with a distinct N-terminal domain encoding several ordered β-sheets (Class E). The three Bi. glabrata proteins within Group 2 all encoded a signal peptide and were classified as sFReDs (n = 2) or Class A α (n = 1) (Fig. 4a). Group 3 contained only two Bi. glabrata proteins, both classified as Class A α. Group 4 included 58 Bu. truncatus and 5 Bi. glabrata proteins, 33 and 3 of which, respectively, had signal peptides. Proteins in Group 4 were the most diverse, including FReDs assigned to ortho-groups OG0000103, OG0000581, OG0010992 and OG0015404. Bi. glabrata proteins in Group 4 were all predicted to be sFReDs and formed a distinct subgroup with three Bu. truncatus proteins (Fig. 4a; pp = 0.85). Most Bu. truncatus proteins in Group 4 were sFReDs with (Class A α, n = 13; Class A βα, n = 2) or without (Class A, n = 37) additional N-terminal structures, or were predicted to encode only a partial FReD domain (n = 5). One FReD-like protein (Btru_048110) in this group encoded a novel N-terminal domain (Class F; Fig. 4b) but clustered with Bu. truncatus sFReDs (Class A). Group 5 proteins were all classified as Class A α (4 Bu. truncatus and 2 Bi. glabrata), two of which were assigned to ortho-group OG006406. Only one protein in this group (Bu. truncatus Btru_048819) had a predicted signal peptide. Group 6 proteins all had signal peptides; two were Bi. glabrata proteins (one sFReD and one Class C FREP). Interestingly, the tertiary structure of a Class C-like Bu. truncatus protein (Btru_007887) in this group encoded two IgSF-like domains (cf. Figure 4b), which had not been detected previously based on amino acid sequence homology (Supplementary Data 1); this Bu. truncatus protein had a shorter FReD domain and a longer chain of α-helices than seen for Bi. glabrata Class C FREPs (Supplementary Fig. 8). Group 7 contained three Bu. truncatus and three Bi. glabrata proteins, all classified as Class A βα and mostly within ortho-group OG0004042; most were not predicted to encode a signal peptide.

Discussion

Here, we present the first draft nuclear genome (Btru.v1) of a key representative of the snail genus Bulinus – a complex of at least 37 species presently divided into four main groups29. This draft genome of the BRI laboratory line of Bu. truncatus, originally from Egypt, was assembled using a combination of second-generation (short-read) and third-generation (long-read) sequence data. The draft assembly shared a high degree of contiguity with the available reference genomes (chromosomes) of the giant land snail, Ac. immaculata (Gastropoda: Stylommatophora), and a marine scallop, P. maximus (Bivalvia: Pectinida). Most gene models were strongly supported by short- and long-read RNA sequence data, and these high-quality models enabled an exploration of the molecular biology of Bu. truncatus (BRI strain) and comparative investigations of the genomes of selected molluscs, including Bi. glabrata, an intermediate host of S. mansoni.

Bu. truncatus is proposed to have four sets of chromosomes (tetraploid)29,30, thought to have arisen via alloploidy by ancestral hybridisation of closely related diploid species29. This proposal is supported by the karyotype and zymograms of an Egyptian isolate of Bu. truncatus31. However, alternate hypotheses are that this tetraploidy might have resulted from evolutionary saltation(s) upon nuclear fusion of genomes following hybridisation of two distinct diploid species (i.e. allo-tetraploidy) or whole-genome duplication in a diploid ancestor (i.e. auto-tetraploidy). The polymorphism seen in the Btru.v1 genome of Bu. truncatus, which seemed to reflect a diploid organism (Supplementary Fig. 2), was consistent with that observed in Ac. immaculata which, indeed, underwent whole-genome duplication in a diploid ancestor32. This evidence indicates that Bu. truncatus is an auto-tetraploid snail, with limited chromosomal divergence. Limited genetic divergence among the four sets of chromosomes and/or the use of a single restriction enzyme (DpnII) for Hi-C library construction, are probable reasons for the Bu. truncatus scaffolds being shorter than the expected chromosome lengths. High-coverage long-range and long-read genome sequencing and comparative karyotypic studies of key members of the Bulinus complex, using Btru.v1 as a reference, should establish their chromosomal evolution.

A genome-wide analysis revealed more genomic synteny between Bu. truncatus and Bi. glabrata (1316-R1 strain)13 (both snail hosts of schistosomes) than between Bu. truncatus and either Ac. immaculata (land snail) or P. maximus (marine scallop). This finding is consistent with the evolutionary relationships of planorbid/bulinid gastropods33. The syntenic regions of the Bu. truncatus and Bi. glabrata genomes are large (>46% of aligned genomes) and contain loci likely to be central to the susceptibility of these snails to their respective schistosome species14. For instance, genomic regions in Bu. truncatus homologous to loci encoding members of the GRC and PTC2 complexes, which are associated with reduced susceptibility of Bi. glabrata to S. mansoni infection, were identified13,34. Detailed comparative studies are now needed to fill the gap in our knowledge of the molecular basis of susceptibility of Bu. truncatus to S. haematobium.

More broadly, a large proportion of proteins inferred for Bu. truncatus were orthologous to those predicted for Bi. glabrata (BB02 strain) and formed ortho-groups to the exclusion of proteins of the other gastropod (Ap. californica and E. chlorotica) or bivalve (P. maximus) species studied. These distinct planorbid/bulinid protein groups may be linked to one or more evolutionary events that led to speciation within the superfamily Lymnaeoidea. Expansions of some protein-gene families/groups in Bu. truncatus were relatively consistent with those in Bi. glabrata12,14,35–37. For instance, shared expansions were seen for some genes encoding GPCRs, proteins involved in membrane trafficking, and peptidases and their inhibitors. The large repertoire of genes encoding GPCRs and transport proteins likely relates to the diversity of receptors in the olfactory organs of gastropods, which have key chemosensory roles12,38.

Key expansions of select gene families could have arisen during the evolution of snail defences against pathogens, including schistosomes. For example, the cathepsin L-like genes of Bu. truncatus have homologs in Bi. glabrata which are known to regulate snail immunity39. The expansion of Bu. truncatus chitinase-like proteins (n = 103) is consistent with those in cephalopods40 and bivalves41, in which there has been a marked gene expansion relating to the regulation of immune responses41. In Bi. glabrata, there is evidence that a chitinase-like protein is associated with one or more loci that modulate susceptibility to S. mansoni26.

Carbohydrate-binding C-type lectins and FREP/FReD-like lectins could also play key roles in the defence of snails against schistosomes36,42–46. We identified a significant expansion of genes encoding FReD-like lectins in Bu. truncatus, similar to that seen in Bi. glabrata. However, the protein members of these two species were very distinct, with few homologs clustering together in ortho-groups. This finding is consistent with a previous study of Bu. truncatus, in which no canonical Class B and/or C IgSF-FREPs were identified23. Detailed comparative analysis of FREPs/FReD-like lectins, guided by tertiary structure models, allowed the identification in Bu. truncatus of a single IgSF-FREP-like lectin (Class C) and revealed no gene expansion event in this group, in accord with studies of the pond snail, Physella acuta47, the common periwinkle, Littorina littorea48 and the sea hare, Ap. californica49. Interestingly, there was a marked expansion in the number and diversity of Class A sFReD-like proteins in Bu. truncatus, with two large groups showing a diversification similar to that of FREPs in Bi. glabrata36. The presence of additional α-helices and β-sheets in the N-terminal regions of Bu. truncatus sFReDs, and of signal peptide-like domains in some proteins (Class A α), evident in tertiary (but not primary) structure models, suggests marked functional diversity, which we propose is central to essential immunobiological processes in Bu. truncatus. Clearly, further work is needed to establish the roles of sFReDs in Bu. truncatus and to draw comparisons with information available for Bi. glabrata46, Ph. acuta47, Mytilus edulis50 and other molluscs.

The draft genome (Btru.v1) of Bu. truncatus encodes many BIRs/IAPs, calmodulins and Toll-/IL-1-related proteins which have conserved orthologues/paralogues in Bi. glabrata. An expansion of BIRs/IAPs was reported earlier for Bi. glabrata12. While the exact role(s) of these molecules is/are not yet understood, the gastropod and bivalve species studied here encode several copies of genes that might relate to a regulatory role in apoptosis and innate immune response, with an observed gene expansion in snails that act as intermediate hosts for schistosomes12. Calmodulins transduce signals in response to increases in intracellular Ca2+, represent a major component of calcium-dependent signalling pathways51 and can play a role in pathogen defence52. The diversification of calmodulins in molluscs has been reported previously53 and has been associated with defence against S. mansoni infection in Bi. glabrata27, and against bacteria and yeast54.

Most invertebrate immune systems include an array of Toll-like receptors (TLRs) that mediate TLR-directed innate immunity to a wide range of pathogen species55,56. A much larger number of TLR-like proteins was predicted here from the Btru.v1 genome than reported previously for the transcriptome23, more than twice the number predicted from the Bi. glabrata genome12. Most of the TLRs predicted for Bu. truncatus were categorised (KEGG BRITE) as TLR3/4-like molecules, which are likely to be conserved between invertebrates and mammals57. Future work is needed to accurately classify TLRs of planorbid/bulinid snails, to localise their expression in cells and tissues, and to establish which pathogen-associated molecular patterns (PAMPs) they recognise.

The relative conservation of the order of genes in the genomes of the snails studied here indicates that it should be feasible to characterise the genomes and gene orthologues of a range of lymnaeid and physid snails, which are key intermediate hosts (vectors) of socioeconomically important parasitic trematodes other than schistosomes2 – using the same technological approach as established here. Such an effort could assist in closing some of the knowledge gaps that exist in the understanding of systematics of these groups22,33, and would provide insight into the molecular evolution of molluscan groups.

The availability of the laboratory (BRI) lines for both Bu. truncatus and S. haematobium offers excellent opportunities to now study – under well-controlled experimental conditions – the molecular biology of each of these two invertebrates as well as their interactions. Bu. truncatus is an essential intermediate host of S. haematobium. In an aquatic environment, this snail becomes infected by the miracidium of S. haematobium; the ciliated ectoderm of this first larval stage sloughs off upon entry through the snail foot; and the miracidium transforms into a sporocyst, which then undergoes extensive asexual replication within the snail host. Having high-quality genomes and transcriptomes for both Bu. truncatus and S. haematobium now underpins critical molecular investigations of this asexual phase of reproduction, the cross-talk that occurs between the parasite and the snail host during replication, and the mechanisms and/or processes that govern snail susceptibility and/or immunity to the parasite. We propose that the use of a multi-omics approach58, involving the use of genome-guided transcriptomic, proteomic, lipidomic and/or glycomic analyses as well as high-resolution single-cell and spatial transcriptomics59, will strongly complement this focus. In addition, explorations of tertiary structure models for all proteins encoded in snail genomes using AlphaFold60 should allow the identification of distant orthologues and elucidate dark matter in these proteomes.

In conclusion, defining the first nuclear genome (of ≥ 1 Gb) for a well-defined laboratory line of Bu. truncatus opens the door to exploring a range of operational taxonomic units (OTUs) of Bulinus from natural populations as well as other key species of snail hosts of schistosomes, as a basis for future systematic, genetic, epidemiological and ecological investigations. Insights into these areas could significantly assist both fundamental and applied studies of schistosomes and schistosomiases, and enable the development of new interventions for this important neglected tropical disease-complex.

Methods

Procurement of the snail

Adult specimens of Bu. truncatus originated from a laboratory line (designated BRI), which is routinely maintained at the Biomedical Research Institute (BRI), Rockville, MD, USA61. This line was originally sourced from Egypt (Margaret Mentink-Kane, personal communication, 10 October 2020). Individual snails were washed extensively in phosphate-buffered saline (PBS, pH 7.0) and frozen at −80 °C.

Isolation of high molecular weight genomic DNA, library construction and sequencing

High-quality genomic DNA was isolated from two adult Bu. truncatus snails using the Circulomics Tissue Kit (Circulomics, Baltimore, MD, USA). The integrity of the DNA was assessed using Genomic DNA ScreenTape and the Agilent 4200 TapeStation (ThermoFisher, MA, USA). Low-molecular-weight DNA was removed using a 10 kb short-read eliminator (SRE) kit (Circulomics, Baltimore, MD, USA). The high-molecular-weight DNA from the two individual snails was used to construct Nanopore Rapid Sequencing (SQK-RAPD004; Oxford Nanopore Technologies) and Ligation Sequencing (SQK-LSK109; Oxford Nanopore Technologies) genomic DNA libraries, according to the manufacturer’s instructions. Each flow cell used was washed using a Flow Cell Wash Kit (EXP-WSH003; Oxford Nanopore Technologies, Oxford, UK) and re-used to sequence additional SQK-LSK109 libraries. All libraries were sequenced on the MinION sequencer (Oxford Nanopore Technologies). Following sequencing, raw FAST5 signal data were basecalled to FASTQ format using the program Guppy v.4.2.2 (Oxford Nanopore Technologies). Reads with an average quality (Q) value of <7 were removed. A short-insert (500 bp) genomic DNA library was also constructed using the DNA from one snail and paired-end sequenced (150 base reads) using TruSeq sequencing chemistry and the NovaSeq sequencing platform (Illumina, CA, USA). Finally, an in situ Hi-C library was constructed from an additional adult specimen of Bu. truncatus, according to the manufacturer’s instructions (Proximo Hi-C Animal Kit, CA, USA), and paired-end sequenced (150 bp) using the NovaSeq sequencing platform (Illumina).
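The read-quality filter (average Q < 7) can be sketched as follows. How Guppy computes its per-read average is not stated here, so this minimal sketch assumes one common convention. Per-base Phred scores are converted to error probabilities, averaged, and converted back to the Phred scale. This penalises low-quality bases more strongly than averaging Q values directly. The Phred+33 encoding and the function names are assumptions.

```python
import math
from typing import Iterable, Iterator, Tuple

def mean_read_quality(quality: str, offset: int = 33) -> float:
    """Average read quality on the Phred scale, computed from the mean per-base
    error probability (assumes Phred+33 encoded quality strings)."""
    errors = [10 ** (-(ord(ch) - offset) / 10) for ch in quality]
    return -10 * math.log10(sum(errors) / len(errors))

def filter_fastq(lines: Iterable[str], min_q: float = 7.0) -> Iterator[Tuple[str, str, str, str]]:
    """Yield 4-line FASTQ records whose average quality is at least min_q;
    reads below the threshold are dropped."""
    it = iter(lines)
    for header, seq, plus, qual in zip(it, it, it, it):
        if mean_read_quality(qual.rstrip("\n")) >= min_q:
            yield header, seq, plus, qual

records = ["@read1\n", "ACGT\n", "+\n", "5555\n",   # every base Q20
           "@read2\n", "ACGT\n", "+\n", "&&&&\n"]   # every base Q5
print([r[0].strip() for r in filter_fastq(records)])  # ['@read1']
```

For a read with one Q10 base and one Q20 base, this convention gives about Q12.6 rather than the arithmetic mean of Q15.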

Isolation of total RNA, Oxford nanopore library construction and sequencing

Total RNA that had been isolated from an adult of Bu. truncatus (Egyptian strain) in an earlier study23 was used to prepare a long-read library using the Oxford Nanopore PCR-cDNA Sequencing Kit (SQK-PCS109; Oxford Nanopore Technologies, Oxford, UK), as recommended. This library was sequenced on a MinION sequencer (Oxford Nanopore Technologies) using an EXP-FLP002 flow cell priming kit and an R9.4.1 flow cell (FLO-MIN106). Following sequencing, raw FAST5 signal data were basecalled to FASTQ format using the program Guppy v.4.2.2. Reads with an average Q value of <7 were removed.

Assembly of the genome

Long reads from the genomic DNA of two Bu. truncatus snails were assembled into contigs using FLYE v2.8-b1674 (ref. 62) with the --nano-raw option and a genome size estimate of 900 Mb. Errors in the long-read sequence data were initially corrected using medaka_consensus in the medaka package v.1.0.3 (https://github.com/nanoporetech/medaka) and the nanopore read data. Contigs were polished with the data derived from the short-insert (500 bp) genomic DNA library using pilon v.1.23 (ref. 63). Contigs were then scaffolded with the in situ Hi-C data using 3D-DNA v.180922 (ref. 64). Haplotig redundancy was removed by mapping long-read data to the genomic scaffolds and eliminating haplotigs with purge_haplotigs v.1.1.1 (ref. 65). Error-corrected long reads were then used to close gaps in scaffolds using TGS-GapCloser v.1.1.1 (https://github.com/BGI-Qingdao/TGS-GapCloser). Following the mapping of short-read data to the genomic scaffolds, haplotig redundancy was again eliminated using purge_haplotigs v.1.1.1 (ref. 65). The completeness of the genome was assessed (in genome mode) using BUSCO v.4.0.2 (ref. 66).

Assessing genome size, heterozygosity and ploidy

A short-insert (500 bp) genomic DNA library from a single Bu. truncatus snail was used to estimate genome size, heterozygosity and ploidy using the GenomeScope v.2.0 and smudgeplot v.0.2.4 packages67. The input into each program was the frequency of 21-mers in the raw short-read data, determined using KMC v.3.1.1 (ref. 68). The upper and lower frequencies used in smudgeplot were 1300 and 28, respectively. GenomeScope analyses were performed assuming a diploid or a tetraploid genome model, according to prior evidence for Bulinus from the literature29. Reads from the short-insert genomic DNA library were also mapped to the reference genome using bwa v.2 (ref. 69) to estimate minor allele frequencies and ploidy using PloidPy (https://github.com/floutt/PloidPy).
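The 21-mer step can be illustrated with a deliberately simplified sketch. It counts canonical k-mers, builds the multiplicity histogram and applies the textbook estimate: genome size ≈ (number of k-mers in the usable coverage range) / (coverage of the main peak). GenomeScope instead fits a mixture model that accounts for heterozygosity, ploidy and sequencing errors, and KMC performs the counting at scale. The `lower`/`upper` cut-offs here only loosely mirror the frequency bounds used for smudgeplot, and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])

def kmer_histogram(reads: Iterable[str], k: int = 21) -> Counter:
    """Map multiplicity -> number of distinct canonical k-mers with that multiplicity."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return Counter(counts.values())

def naive_genome_size(hist: Dict[int, int], lower: int, upper: Optional[int] = None) -> float:
    """Total k-mers within [lower, upper] divided by the multiplicity of the main peak."""
    usable = {m: n for m, n in hist.items()
              if m >= lower and (upper is None or m <= upper)}
    peak = max(usable, key=usable.get)
    return sum(m * n for m, n in usable.items()) / peak

# Hypothetical histogram: an error peak at multiplicity 1 and a main peak at 20
print(naive_genome_size({1: 1000, 10: 50, 20: 100, 40: 20}, lower=2))  # 165.0
```

In a heterozygous genome, the tallest peak may be the heterozygous (half-coverage) peak rather than the homozygous one. This is one reason model-based tools such as GenomeScope are preferred over this naive ratio.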

Predicting repeat-elements, gene models and protein function

Repeat elements in the genome were predicted using RepeatModeler v.1.0.8 (http://www.repeatmasker.org) and EDTA v.1.9.4 (ref. 70). The libraries were combined, and redundancy was removed using CD-HIT v.4.8.1 (ref. 71). The final repeat element library was used to mask the genome using RepeatMasker v.4.1.1 (ref. 72). Gene models were predicted using funannotate v.1.7.4 (https://github.com/nextgenusfs/funannotate), publicly available RNA-seq data (NCBI BioProject PRJNA680620)23 (Supplementary Table 1) and inferred protein sequence data sets for Bi. glabrata (BB02 strain)12. Evidence in EVidenceModeler (EVM) v.1.1.1 (ref. 73) was weighted as follows: hiq: 7; predicted location of aligned Bi. glabrata or Swiss-Prot proteins (accessed 20 December 2020)25: 6; PASA v.2.4.1 (ref. 73): 5; augustus v.3.3.3 (ref. 74): 4; StringTie v2.1.2 (ref. 75): 4; and GeneMark-ES v.3.32 (ref. 76): 3. The completeness of the gene set was assessed (in protein mode) using BUSCO v.4.0.2 (ref. 66). Each inferred amino acid sequence was annotated using InterPro v5.35 (ref. 77), EggNOG mapper v.5.0 (ref. 78) and/or homology (E-value ≤ 10⁻⁵) to proteins in the Swiss-Prot and TrEMBL databases within UniProtKB (accessed 20 December 2020)25, the Kyoto Encyclopedia of Genes and Genomes (KEGG)79 and/or MEROPS release 12 (ref. 80), using DIAMOND BLASTp v.0.9.21 (ref. 81). Protein groups and pathways were inferred based on homology to KEGG orthology (KO) terms linked to curated KEGG BRITE and pathway hierarchies. Signal peptide domains and/or transmembrane domains were predicted using phobius v.1.04 (ref. 82). The sub-cellular localisation of protein sequences was predicted computationally using the program MultiLoc2 v.2.2.25 (ref. 83). Evidence of gene transcription was inferred by mapping short- and long-read RNA-seq data to the genome using HISAT2 v.2.1.0 (ref. 84), and the level of transcription per gene (in transcripts per million, TPM) was inferred using StringTie v2.1.2 (ref. 75). Gene models were inferred to have transcriptional support if at least one library had a TPM value of >0.2.
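The TPM values and the >0.2 TPM support rule can be sketched using the standard count-based definition of TPM: reads per kilobase, rescaled so that each library sums to one million. StringTie derives TPM from its own coverage estimates, so this illustrates the quantity rather than reimplementing StringTie. Gene identifiers and function names are hypothetical.

```python
from typing import Dict, Iterable, Set

def tpm(read_counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Transcripts per million: length-normalised read rates scaled to sum to 1e6."""
    rates = {g: read_counts[g] / (lengths_bp[g] / 1000) for g in read_counts}
    total = sum(rates.values())
    return {g: r / total * 1e6 for g, r in rates.items()}

def supported_genes(libraries: Iterable[Dict[str, float]], threshold: float = 0.2) -> Set[str]:
    """Genes with a TPM above the threshold in at least one library."""
    return {g for lib in libraries for g, value in lib.items() if value > threshold}

lib1 = tpm({"gene_a": 100, "gene_b": 100}, {"gene_a": 1000, "gene_b": 2000})
print(round(lib1["gene_a"]), round(lib1["gene_b"]))  # 666667 333333
print(sorted(supported_genes([{"gene_a": 0.1, "gene_b": 0.3},
                              {"gene_a": 0.25, "gene_c": 0.0}])))  # ['gene_a', 'gene_b']
```

Because TPM is normalised within each library, the same threshold can be applied across libraries of different sequencing depth.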

Protein-encoding genes were retained based on the features of their gene models. For each gene, the following features were curated: (1) GeneValidator v.2.1.10 (ref. 85) score, estimated using comparisons to proteins in Swiss-Prot within UniProtKB25; (2) evidence of transcription (in TPM); (3) sequence homology (E-value ≤ 10⁻⁵) to proteins in TrEMBL within UniProtKB25; (4) proportion of proteins predicted to be a “low probability subsequence” (LPS) using the program fLPS v.1 (ref. 86); (5) GC content of the coding domain; (6) length of inferred mRNA sequences; (7) number of exons in the gene model; (8) presence of one or more Pfam conserved domains inferred using InterProScan; and (9) numbers of genes representing individual groups of orthologous proteins between Bu. truncatus (Btru.v1) and Bi. glabrata12, established using OrthoFinder v.2.3.11 (ref. 87). These features were transformed to normal distributions and subjected to principal component analysis (PCA) and K-means clustering. Clusters with protein-encoding-like genes were retained for further curation and characterisation. Subsequently, we studied the distribution of repeats in the genome and their association with the 5’ and/or 3’ untranslated regions (UTRs; 5000 nucleotides each) of protein-coding genes, using Fisher’s exact test (p-value < 0.01) to assess the statistical significance of association(s).
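The feature-based curation can be sketched in a simplified form: features are standardised to z-scores, and genes are partitioned by K-means (Lloyd's algorithm). The sketch omits the transformation to normal distributions and the PCA step described above, and it uses made-up feature values. It also does not decide which cluster is "protein-encoding-like", a judgement that required curation. All names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]

def zscore_columns(rows: Sequence[Sequence[float]]) -> List[Vector]:
    """Standardise each feature (column) to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0  # guard constant columns
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points: List[Vector], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Vector]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical per-gene features: [exon count, mRNA length (bp), GC fraction]
features = [[8, 2400, 0.42], [9, 2600, 0.41], [7, 2100, 0.43],
            [1, 300, 0.60], [1, 250, 0.62], [2, 350, 0.58]]
labels, _ = kmeans(zscore_columns(features), k=2)
print(labels)
```

Standardising first stops features on large scales, such as mRNA length, from dominating the Euclidean distances used by K-means.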

Comparative genomic analyses

Groups of proteins that were orthologous between Bu. truncatus (Btru.v1) and Bi. glabrata (BB02 strain)12, E. chlorotica88, Ap. californica (https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Aplysia_californica/101/) and/or P. maximus89 (outgroup) were inferred using OrthoFinder v.2.3.11 (ref. 87) (Table 5). Then, inferred single-copy orthologues of Bu. truncatus were mapped to the genomes of Ac. immaculata (https://www.ncbi.nlm.nih.gov/assembly/GCA_009760885.1) and Bi. glabrata (S1316-R1 strain)13 using Exonerate v.2.0 (ref. 90). For P. maximus, the locations of inferred single-copy orthologues were identified using the respective genome feature format files. Locations of paired single-copy orthologues in the same genomic region were grouped using bundlelinks in the program circos v.0.69-8 (ref. 91) with the following settings: min_bundle_size = 1e4, min_bundle_membership = three or six and max_gap = 1e6. Scaffolds were reordered and displayed using circos v.0.69-8 (ref. 91).
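The idea behind bundling orthologue links can be sketched with a simplified greedy procedure. Circos's bundlelinks utility implements its own algorithm; this sketch only mirrors its parameter names. The following are assumptions of the sketch. Links are (scaffold_a, position_a, scaffold_b, position_b) tuples in bp. Consecutive links on the same scaffold pair are merged when both positional gaps are ≤ max_gap. A bundle's size is taken as the smaller of its spans on the two genomes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Link = Tuple[str, int, str, int]  # (scaffold_a, position_a, scaffold_b, position_b), in bp

@dataclass
class Bundle:
    links: List[Link] = field(default_factory=list)

    @property
    def size(self) -> int:
        """Smaller of the spans covered on the two genomes (an assumption of this sketch)."""
        span_a = max(lnk[1] for lnk in self.links) - min(lnk[1] for lnk in self.links)
        span_b = max(lnk[3] for lnk in self.links) - min(lnk[3] for lnk in self.links)
        return min(span_a, span_b)

def _continues(prev: Link, link: Link, max_gap: int) -> bool:
    """True if `link` lies on the same scaffold pair as `prev` and within max_gap on both."""
    same_pair = prev[0] == link[0] and prev[2] == link[2]
    return same_pair and abs(link[1] - prev[1]) <= max_gap and abs(link[3] - prev[3]) <= max_gap

def bundle_links(links: List[Link], max_gap: int = 1_000_000,
                 min_membership: int = 3, min_size: int = 10_000) -> List[Bundle]:
    """Greedily merge neighbouring orthologue links into bundles, then filter them."""
    ordered = sorted(links, key=lambda lnk: (lnk[0], lnk[2], lnk[1]))
    bundles: List[Bundle] = []
    current: List[Link] = []
    for link in ordered:
        if current and _continues(current[-1], link, max_gap):
            current.append(link)
        else:
            if current:
                bundles.append(Bundle(current))
            current = [link]
    if current:
        bundles.append(Bundle(current))
    return [b for b in bundles if len(b.links) >= min_membership and b.size >= min_size]

# Hypothetical orthologue pairs between a draft scaffold and a reference chromosome
pairs = [("scaf1", 100_000, "chr1", 200_000), ("scaf1", 300_000, "chr1", 350_000),
         ("scaf1", 500_000, "chr1", 600_000), ("scaf1", 5_000_000, "chr1", 9_000_000),
         ("scaf2", 10, "chr2", 20)]
print([len(b.links) for b in bundle_links(pairs)])  # [3]
```

Raising `min_membership` from three to six, as in the alternative setting above, discards this three-link bundle. Only larger syntenic blocks are then retained.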

Subsequently, single-copy orthologues were inferred from homologous genes shared by Bu. truncatus, Bi. glabrata, E. chlorotica, Ap. californica, P. maximus (= outgroup) and/or Ac. immaculata, and their amino acid sequences were conceptually translated. The clusters of single-copy orthologues representing all five or six species were aligned using the program AQUA92, employing the programs MUSCLE v3.8.31 (ref. 93) and MAFFT v.7.271 (ref. 94) for alignment and RASCAL v1.34 (ref. 95) for alignment refinement. Individual clusters of genes with an alignment score of ≥0.8, obtained from the program NorMD96, were merged using the program PartitionFinder v2.1.1 (ref. 97) to assign each merged partition to a predicted amino acid substitution matrix. Partitions of 20 or fewer amino acids were removed. The remaining partitions were then subjected to separate phylogenetic analyses using the Bayesian inference (BI) and maximum likelihood (ML) tree-building methods. BI analysis was conducted using the program MrBayes v3.2.6 (ref. 98) with four independent Markov chains, run for 1,000,000 Metropolis-coupled MCMC iterations, with trees sampled every 1000 iterations. The resultant tree was inferred by initially discarding 25% of sampled trees as burn-in, and then using the remaining trees to infer tree topology and branch lengths and to calculate Bayesian posterior probabilities (pp). For ML, a partitioned ML tree was constructed using the program RAxML v8.2.6 (ref. 99) with the selected model for each partition and 20 iterations, and the best tree was selected for bootstrap analysis (n = 100). A representative tree was prepared using FigTree v.1.31 (http://tree.bio.ed.ac.uk/software/figtree).
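The burn-in and posterior-probability steps can be illustrated with a small sketch. Sampling every 1,000 of 1,000,000 iterations yields roughly 1,000 trees, of which the first 25% (about 250) are discarded. A clade's posterior probability is then the fraction of the remaining trees that contain it. Here, trees are represented as sets of clades (frozensets of tip labels), a simplification chosen for illustration. MrBayes summarises its own samples (for example, via its sumt command), and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]

def clade_posteriors(sampled_trees: List[Set[Clade]],
                     burnin_fraction: float = 0.25) -> Dict[Clade, float]:
    """Discard the first `burnin_fraction` of sampled trees, then return, for each clade,
    the fraction of the remaining trees that contain it."""
    n_burnin = int(len(sampled_trees) * burnin_fraction)
    kept = sampled_trees[n_burnin:]
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}

ab, ac = frozenset({"A", "B"}), frozenset({"A", "C"})
trees = [{ac}, {ac}] + [{ab}, {ab}, {ac}, {ab}, {ab}, {ac}]  # first two are burn-in
pp = clade_posteriors(trees, burnin_fraction=0.25)
print(round(pp[ab], 3), round(pp[ac], 3))  # 0.667 0.333
```

With the settings used for the FReD analysis (4,000,000 generations, every 400th tree sampled, 50% burn-in), roughly 5,000 of about 10,000 sampled trees would be retained.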

Identification and characterisation of fibrinogen-related domain containing proteins

Within the Bu. truncatus and Bi. glabrata protein data sets, proteins with sequence homology to the fibrinogen beta and gamma chains, C-terminal globular domain (Pfam: PF00147.20) were identified using hmmsearch (HMMER v.3.2.1; http://hmmer.janelia.org/) with a threshold of E-value ≤ 10⁻⁵. Proteins containing a conserved fibrinogen domain were then aligned using hmmalign with the --trim option, and sequences with more than 150 amino acid residues across the conserved fibrinogen domain were retained. Trimmed sequences were de-gapped and re-aligned using MUSCLE v3.8.31 (ref. 93). Aligned sequences were then subjected to Bayesian inference (BI) analysis using the program MrBayes v3.2.6 (ref. 98) with a WAG model with fixed rate matrices, generating 4,000,000 trees and sampling every 400th tree. The resultant tree was inferred by initially discarding 50% of sampled trees as burn-in, and then using the remaining trees to infer tree topology and branch lengths and to calculate Bayesian posterior probabilities (pp). Phylogenetic trees were rendered and annotated using ggtree v.1.10.5 (ref. 100) in R v.3.4.3 (http://www.R-project.org/). Next, mature peptides (excluding predicted signal peptide domains) were subjected to tertiary structure prediction using AlphaFold60. The best-ranked model was retained and used in subsequent analyses. FReD-like proteins were classified based on known domains described previously36, including the presence of a complete or partial C-terminal fibrinogen domain (Class A or A-like, respectively) and of one (Class B; 1-IgSF FREP) or two (Class C) additional N-terminal immunoglobulin superfamily (IgSF) domains. Proteins with a FReD domain (Class A) but with additional N-terminal α-helices and/or β-sheets, or with complex novel domains, were also classified.
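The class definitions above can be encoded as a toy decision rule. In the study, classes were assigned by inspecting AlphaFold models and comparing them with known Bi. glabrata FReD classes. This sketch assumes that the relevant structural features have already been extracted from each model: completeness of the FBG domain, number of IgSF-like domains, and extra N-terminal α-helices or β-sheets. The feature names are hypothetical. Class D (FREM-like) is not modelled, and proteins with novel N-terminal domains (such as Classes E and F) are returned as "unresolved" for manual review.

```python
from dataclasses import dataclass

@dataclass
class FredFeatures:
    """Hypothetical structural features read off a predicted tertiary structure."""
    complete_fbg: bool       # complete C-terminal fibrinogen (FBG) domain
    n_igsf: int              # number of N-terminal IgSF(-like) domains
    n_terminal_helix: bool   # extra N-terminal alpha-helices
    n_terminal_sheet: bool   # extra N-terminal beta-sheets

def classify_fred(f: FredFeatures) -> str:
    """Toy decision rule following the class definitions given in the text."""
    if f.n_igsf == 1:
        return "B"
    if f.n_igsf == 2:
        return "C"
    if f.n_igsf > 2:
        return "unresolved"   # not covered by the class definitions
    if not f.complete_fbg:
        return "A-like"       # partial FBG domain
    if f.n_terminal_sheet and f.n_terminal_helix:
        return "A βα"
    if f.n_terminal_helix:
        return "A α"
    if f.n_terminal_sheet:
        return "unresolved"   # e.g. novel beta-sheet domains; inspect the model manually
    return "A"

print(classify_fred(FredFeatures(True, 0, True, False)))  # A α
```

Encoding the rules explicitly makes the order of precedence visible. In this sketch, IgSF domains take priority over N-terminal helices or sheets, which is one reasonable reading of the definitions rather than a documented rule.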

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Supplementary information

Peer Review File (pdf)
41467_2022_28634_MOESM3_ESM.pdf

Description of Additional Supplementary Files

Supplementary Data 1 (xlsx)
Supplementary Data 2 (xlsx)
Supplementary Data 3 (zip)
Reporting Summary (pdf)

Acknowledgements

This research project was supported by grants from the Australian Research Council (ARC) to R.B.G., N.D.Y. and P.K.K. Informatics was supported via the LIEF HPC-GPGPU facility at the University of Melbourne, with assistance from LIEF grant LE170100200. The following was provided by the NIAID Schistosomiasis Resource Center for distribution through BEI Resources, NIAID, NIH: Bulinus truncatus subsp. truncatus, NR-21971.

Author contributions

N.D.Y. conceived, planned and developed the study, conducted the laboratory work, developed methods, wrote the original draft and funded the project. A.J.S. assisted with methods, laboratory work and edited the manuscript. T.W. assisted with laboratory work and edited the manuscript. P.K.K. assisted with informatics, funding and commented on the manuscript. M.M-K. reared and provided the snails used in this study. J.R.S. and D.R. both reviewed and edited the manuscript. R.B.G. was involved in planning and developing the conceptual framework, co-wrote the original draft and funded the project. All authors read, commented on and approved the submitted version of the manuscript.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Data availability

The nucleotide sequence data linked to the nuclear genome reported in this article are publicly available in the GenBank database and the Sequence Read Archive (SRA; https://www.ncbi.nlm.nih.gov/sra) under the accession numbers SAMN17050146, SAMN16898649 and SAMN16898648, with the NCBI BioProject accession number PRJNA680620. Protein sequences used for sequence homology searches are available from the Swiss-Prot (UniProtKB; https://www.uniprot.org/help/uniprotkb; accessed 20 December 2020)25, TrEMBL (UniProtKB; https://www.uniprot.org/help/uniprotkb; accessed 20 December 2020)25, Kyoto Encyclopedia of Genes and Genomes (KEGG; https://www.genome.jp/kegg/; accessed 20 December 2020)79 and MEROPS release 12 (https://www.ebi.ac.uk/merops/)80 databases. All other data used are referred to in this article and its supplementary files.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Neil D. Young, Email: nyoung@unimelb.edu.au

Robin B. Gasser, Email: robinbg@unimelb.edu.au

Supplementary information

The online version contains supplementary material available at 10.1038/s41467-022-28634-9.

References

1. Rosenberg G. A new critical estimate of named species-level diversity of the recent Mollusca. Am. Malacol. Bull. 2014;32:308–322.
2. Lu XT, et al. Snail-borne parasitic diseases: an update on global epidemiological distribution, transmission interruption and control methods. Infect. Dis. Poverty. 2018;7:28. doi: 10.1186/s40249-018-0414-7.
3. Keiser J, Utzinger J. Food-borne trematodiases. Clin. Microbiol. Rev. 2009;22:466–483. doi: 10.1128/CMR.00012-09.
4. McManus DP, et al. Schistosomiasis. Nat. Rev. Dis. Prim. 2018;4:13. doi: 10.1038/s41572-018-0013-8.
5. Palumbo E. Association between schistosomiasis and cancer: a review. Infect. Dis. Clin. Pract. 2007;15:145–148.
6. Sripa B, et al. The tumorigenic liver fluke Opisthorchis viverrini–multiple pathways to cancer. Trends Parasitol. 2012;28:395–407. doi: 10.1016/j.pt.2012.07.006.
7. Panic G, Keiser J. Acting beyond 2020: better characterization of praziquantel and promising antischistosomal leads. Curr. Opin. Pharmacol. 2018;42:27–33. doi: 10.1016/j.coph.2018.06.004.
8. Young ND, et al. Whole-genome sequence of Schistosoma haematobium. Nat. Genet. 2012;44:221–225. doi: 10.1038/ng.1065.
9. Young ND, et al. The Opisthorchis viverrini genome provides insights into life in the bile duct. Nat. Commun. 2014;5:4378. doi: 10.1038/ncomms5378.
10. Wang D, Young ND, Korhonen PK, Gasser RB. Clonorchis sinensis and clonorchiasis: the relevance of exploring genetic variation. Adv. Parasitol. 2018;100:155–208. doi: 10.1016/bs.apar.2018.03.006.
11. Stroehlein AJ, et al. High-quality Schistosoma haematobium genome achieved by single-molecule and long-range sequencing. Gigascience. 2019;8:giz108.
12. Adema CM, et al. Whole genome analysis of a schistosomiasis-transmitting freshwater snail. Nat. Commun. 2017;8:15451. doi: 10.1038/ncomms15451.
13. Tennessen JA, et al. Clusters of polymorphic transmembrane genes control resistance to schistosomes in snail vectors. eLife. 2020;9:e59395.
14. Castillo MG, et al. Biomphalaria glabrata immunity: post-genome advances. Dev. Comp. Immunol. 2020;104:103557. doi: 10.1016/j.dci.2019.103557.
15. Silva-Neto A, Fonseca COD, Silva LM, Coelho PMZ, Fortes-Dias CL. Interaction of Biomphalaria cells in primary cultures with Schistosoma mansoni sporocysts. Rev. Soc. Bras. Med. Trop. 2020;53:e20200257. doi: 10.1590/0037-8682-0257-2020.
16. Habib MR, et al. Biochemical and apoptotic changes in the nervous and ovotestis tissues of Biomphalaria alexandrina following infection with Schistosoma mansoni. Exp. Parasitol. 2020;213:107887. doi: 10.1016/j.exppara.2020.107887.
17. Mendes TMF, et al. Proteomic, metabolic and immunological changes in Biomphalaria glabrata infected with Schistosoma mansoni. Int. J. Parasitol. 2019;49:1049–1060. doi: 10.1016/j.ijpara.2019.08.001.
18. Fogarty CE, et al. Comparative study of excretory-secretory proteins released by Schistosoma mansoni-resistant, susceptible and naive Biomphalaria glabrata. Parasit. Vectors. 2019;12:452. doi: 10.1186/s13071-019-3708-0.
19. Davison A, Neiman M. Mobilizing molluscan models and genomes in biology. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2021;376:20200163. doi: 10.1098/rstb.2020.0163.
20. Rollinson D. A wake up call for urinary schistosomiasis: reconciling research effort with public health importance. Parasitology. 2009;136:1593–1610. doi: 10.1017/S0031182009990552.
21. Despommier DD, Gwadz RW, Hotez PJ, Knirsch CA. Parasitic Diseases. 5th ed. Apple Trees Productions, LLC; 2005.
22. Young ND, et al. Mitochondrial genome of Bulinus truncatus (Gastropoda: Lymnaeoidea): implications for snail systematics and schistosome epidemiology. CRPVBD. 2021;1:100017. doi: 10.1016/j.crpvbd.2021.100017.
23. Stroehlein AJ, et al. Bulinus truncatus transcriptome – a resource to enable molecular studies of snail and schistosome biology. CRPVBD. 2021;1:100015. doi: 10.1016/j.crpvbd.2021.100015.
24. Young ND, et al. High-quality reference genome for Clonorchis sinensis. Genomics. 2021;113:1605–1615.
25. Magrane M, UniProt Consortium. UniProt Knowledgebase: a hub of integrated protein data. Database. 2011;2011:bar009. doi: 10.1093/database/bar009.
26. Tennessen JA, et al. Hyperdiverse gene cluster in snail host conveys resistance to human schistosome parasites. PLoS Genet. 2015;11:e1005067. doi: 10.1371/journal.pgen.1005067.
27. Buddenborg SK, et al. Transcriptional responses of Biomphalaria pfeifferi and Schistosoma mansoni following exposure to niclosamide, with evidence for a synergistic effect on snails following exposure to both stressors. PLoS Negl. Trop. Dis. 2019;13:e0006927. doi: 10.1371/journal.pntd.0006927.
28. Akopian D, Shen K, Zhang X, Shan SO. Signal recognition particle: an essential protein-targeting machine. Annu. Rev. Biochem. 2013;82:693–721. doi: 10.1146/annurev-biochem-072711-164732.
29. Brown DS. Freshwater snails of Africa and their medical importance. CRC Press; 1994.
30. Yaseen AE. Cytogenetics and Biology of the Intermediate Host of Human Bilharziasis, Bulinus truncatus Common in Upper Egypt. Cytologia (Tokyo). 1993;58:53–60.
31. Goldman MA, LoVerde PT, Chrisman CL. Hybrid origin of polyploidy in freshwater snails of the genus Bulinus (Mollusca: Planorbidae). Evolution. 1983;37:592–600. doi: 10.1111/j.1558-5646.1983.tb05576.x.
32. Liu C, et al. Giant African snail genomes provide insights into molluscan whole-genome duplication and aquatic-terrestrial transition. Mol. Ecol. Resour. 2021;21:478–494. doi: 10.1111/1755-0998.13261.
33. Albrecht C, Kuhn K, Streit B. A molecular phylogeny of Planorboidea (Gastropoda, Pulmonata): insights from enhanced taxon sampling. Zool. Scr. 2007;36:27–39.
34. Tennessen JA, et al. Genome-wide scan and test of candidate genes in the snail Biomphalaria glabrata reveal new locus influencing resistance to Schistosoma mansoni. PLoS Negl. Trop. Dis. 2015;9:e0004077. doi: 10.1371/journal.pntd.0004077.
35. Geyer KK, et al. The Biomphalaria glabrata DNA methylation machinery displays spatial tissue expression, is differentially active in distinct snail populations and is modulated by interactions with Schistosoma mansoni. PLoS Negl. Trop. Dis. 2017;11:e0005246.
36. Lu L, Loker ES, Adema CM, Zhang SM, Bu L. Genomic and transcriptional analysis of genes containing fibrinogen and IgSF domains in the schistosome vector Biomphalaria glabrata, with emphasis on the differential responses of snails susceptible or resistant to Schistosoma mansoni. PLoS Negl. Trop. Dis. 2020;14:e0008780. doi: 10.1371/journal.pntd.0008780.
37. Pinaud S, et al. New insights into biomphalysin gene family diversification in the vector snail Biomphalaria glabrata. Front. Immunol. 2021;12:635131. doi: 10.3389/fimmu.2021.635131.
38. Cummins SF, et al. Candidate chemoreceptor subfamilies differentially expressed in the chemosensory organs of the mollusc Aplysia. BMC Biol. 2009;7:28. doi: 10.1186/1741-7007-7-28.
39. Lockyer AE, et al. Biomphalaria glabrata transcriptome: cDNA microarray profiling identifies resistant- and susceptible-specific gene expression in haemocytes from snail strains exposed to Schistosoma mansoni. BMC Genomics. 2008;9:634. doi: 10.1186/1471-2164-9-634.
40. Albertin CB, et al. The octopus genome and the evolution of cephalopod neural and morphological novelties. Nature. 2015;524:220–224. doi: 10.1038/nature14668.
41. Badariotti F, Lelong C, Dubos MP, Favrel P. Characterization of chitinase-like proteins (Cg-Clp1 and Cg-Clp2) involved in immune defence of the mollusc Crassostrea gigas. FEBS J. 2007;274:3646–3654. doi: 10.1111/j.1742-4658.2007.05898.x.
42. Adema CM, Hertel LA, Miller RD, Loker ES. A family of fibrinogen-related proteins that precipitates parasite-derived molecules is produced by an invertebrate after infection. Proc. Natl Acad. Sci. USA. 1997;94:8691–8696. doi: 10.1073/pnas.94.16.8691.
43. Lockyer AE, et al. Early differential gene expression in haemocytes from resistant and susceptible Biomphalaria glabrata strains in response to Schistosoma mansoni. PLoS ONE. 2012;7:e51102. doi: 10.1371/journal.pone.0051102.
44. Gordy MA, Pila EA, Hanington PC. The role of fibrinogen-related proteins in the gastropod immune response. Fish. Shellfish Immunol. 2015;46:39–49. doi: 10.1016/j.fsi.2015.03.005.
45. Li H, et al. Coordination of humoral immune factors dictates compatibility between Schistosoma mansoni and Biomphalaria glabrata. eLife. 2020;9:e51708.
46. Wu XJ, et al. Proteomic analysis of Biomphalaria glabrata plasma proteins with binding affinity to those expressed by early developing larval Schistosoma mansoni. PLoS Pathog. 2017;13:e1006081. doi: 10.1371/journal.ppat.1006081.
47. Schultz JH, Bu L, Adema CM. Comparative immunological study of the snail Physella acuta (Hygrophila, Pulmonata) reveals shared and unique aspects of gastropod immunobiology. Mol. Immunol. 2018;101:108–119. doi: 10.1016/j.molimm.2018.05.029.
48. Gorbushin AM, Borisova EA. Lectin-like molecules in transcriptome of Littorina littorea hemocytes. Dev. Comp. Immunol. 2015;48:210–220. doi: 10.1016/j.dci.2014.10.007.
49. Gorbushin AM, Panchin YV, Iakovleva NV. In search of the origin of FREPs: characterization of Aplysia californica fibrinogen-related proteins. Dev. Comp. Immunol. 2010;34:465–473. doi: 10.1016/j.dci.2009.12.007.
50. Gorbushin AM, Iakovleva NV. A new gene family of single fibrinogen domain lectins in Mytilus. Fish. Shellfish Immunol. 2011;30:434–438. doi: 10.1016/j.fsi.2010.10.002.
51. Racioppi L, Means AR. Calcium/calmodulin-dependent kinase IV in immune and inflammatory responses: novel routes for an ancient traveller. Trends Immunol. 2008;29:600–607. doi: 10.1016/j.it.2008.08.005.
52. McCormack E, Braam J. Calmodulins and related potential calcium sensors of Arabidopsis. New Phytol. 2003;159:585–598. doi: 10.1046/j.1469-8137.2003.00845.x.
53. Simpson RJ, Wilding CS, Grahame J. Intron analyses reveal multiple calmodulin copies in Littorina. J. Mol. Evol. 2005;60:505–512. doi: 10.1007/s00239-004-0232-3.
54. Wright B, Lacchini AH, Davies AJ, Walker AJ. Regulation of nitric oxide production in snail (Lymnaea stagnalis) defence cells: a role for PKC and ERK signalling pathways. Biol. Cell. 2006;98:265–278. doi: 10.1042/BC20050066.
55. Satake H, Sekiguchi T. Toll-like receptors of deuterostome invertebrates. Front. Immunol. 2012;3:34. doi: 10.3389/fimmu.2012.00034.
56. Buckley KM, Rast JP. Diversity of animal immune receptors and the origins of recognition complexity in the deuterostomes. Dev. Comp. Immunol. 2015;49:179–189. doi: 10.1016/j.dci.2014.10.013.
57. Nie L, Cai SY, Shao JZ, Chen J. Toll-like receptors, associated biological roles, and signaling networks in non-mammals. Front. Immunol. 2018;9:1523. doi: 10.3389/fimmu.2018.01523.
58. Ma G, et al. Elucidating the molecular and developmental biology of parasitic nematodes: Moving to a multiomics paradigm. Adv. Parasitol. 2020;108:175–229. doi: 10.1016/bs.apar.2019.12.005.
59. Asp M, Bergenstrahle J, Lundeberg J. Spatially resolved transcriptomes-Next generation tools for tissue exploration. Bioessays. 2020;42:e1900221. doi: 10.1002/bies.201900221.
60. Jumper J, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–589. doi: 10.1038/s41586-021-03819-2.
61. Lewis FA, Liang YS, Raghavan N, Knight M. The NIH-NIAID schistosomiasis resource center. PLoS Negl. Trop. Dis. 2008;2:e267. doi: 10.1371/journal.pntd.0000267.
62. Kolmogorov M, Yuan J, Lin Y, Pevzner PA. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 2019;37:540–546. doi: 10.1038/s41587-019-0072-8.
63. Walker BJ, et al. PILON: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE. 2014;9:e112963. doi: 10.1371/journal.pone.0112963.
64. Dudchenko O, et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 2017;356:92–95. doi: 10.1126/science.aal3327.
65. Roach MJ, Schmidt SA, Borneman AR. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics. 2018;19:460. doi: 10.1186/s12859-018-2485-7.
66. Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. doi: 10.1093/bioinformatics/btv351.
67. Ranallo-Benavidez TR, Jaron KS, Schatz MC. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun. 2020;11:1432. doi: 10.1038/s41467-020-14998-3.
68. Kokot M, Dlugosz M, Deorowicz S. KMC 3: counting and manipulating k-mer statistics. Bioinformatics. 2017;33:2759–2761. doi: 10.1093/bioinformatics/btx304.
69. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
70. Ou S, et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 2019;20:275. doi: 10.1186/s13059-019-1905-y.
71. Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012;28:3150–3152. doi: 10.1093/bioinformatics/bts565.
72. Tarailo-Graovac M, Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics. 2009;Chapter 4:Unit 4.10.
73. Haas BJ, et al. Automated eukaryotic gene structure annotation using EvidenceModeler and the program to assemble spliced alignments. Genome Biol. 2008;9:R7. doi: 10.1186/gb-2008-9-1-r7.
74. Hoff KJ, Stanke M. Predicting genes in single genomes with AUGUSTUS. Curr. Protoc. Bioinformatics. 2019;65:e57. doi: 10.1002/cpbi.57.
75. Pertea M, et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 2015;33:290–295. doi: 10.1038/nbt.3122.
76. Lomsadze A, Ter-Hovhannisyan V, Chernoff YO, Borodovsky M. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res. 2005;33:6494–6506. doi: 10.1093/nar/gki937.
77. Zdobnov EM, Apweiler R. InterProScan–an integration platform for the signature-recognition methods in InterPro. Bioinformatics. 2001;17:847–848. doi: 10.1093/bioinformatics/17.9.847.
78. Huerta-Cepas J, et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 2019;47:D309–D314. doi: 10.1093/nar/gky1085.
79. Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 2012;40:D109–D114. doi: 10.1093/nar/gkr988.
80. Rawlings ND, Tolle DP, Barrett AJ. MEROPS: the peptidase database. Nucleic Acids Res. 2004;32:D160–D164. doi: 10.1093/nar/gkh071.
81. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat. Methods. 2015;12:59–60. doi: 10.1038/nmeth.3176.
82. Kall L, Krogh A, Sonnhammer EL. Advantages of combined transmembrane topology and signal peptide prediction–the Phobius web server. Nucleic Acids Res. 2007;35:W429–W432. doi: 10.1093/nar/gkm256.
83. Blum T, Briesemeister S, Kohlbacher O. MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction. BMC Bioinformatics. 2009;10:274. doi: 10.1186/1471-2105-10-274.
84. Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 2019;37:907–915. doi: 10.1038/s41587-019-0201-4.
85. Dragan MA, Moghul I, Priyam A, Bustos C, Wurm Y. GeneValidator: identify problems with protein-coding gene predictions. Bioinformatics. 2016;32:1559–1561. doi: 10.1093/bioinformatics/btw015.
86. Harrison PM. fLPS: Fast discovery of compositional biases for the protein universe. BMC Bioinformatics. 2017;18:476. doi: 10.1186/s12859-017-1906-3.
87. Emms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 2019;20:238. doi: 10.1186/s13059-019-1832-y.
88. Cai H, et al. A draft genome assembly of the solar-powered sea slug Elysia chlorotica. Sci. Data. 2019;6:190022. doi: 10.1038/sdata.2019.22.
89. Kenny NJ, et al. The gene-rich genome of the scallop Pecten maximus. Gigascience. 2020;9:giaa037.
90. Slater GS, Birney E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics. 2005;6:31. doi: 10.1186/1471-2105-6-31.
91. Krzywinski M, et al. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19:1639–1645. doi: 10.1101/gr.092759.109.
92. Muller J, Creevey CJ, Thompson JD, Arendt D, Bork P. AQUA: automated quality improvement for multiple sequence alignments. Bioinformatics. 2010;26:263–265. doi: 10.1093/bioinformatics/btp651.
93. Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004;5:113. doi: 10.1186/1471-2105-5-113.
94. Katoh K, Kuma K, Toh H, Miyata T. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 2005;33:511–518. doi: 10.1093/nar/gki198.
95. Thompson JD, Thierry JC, Poch O. RASCAL: rapid scanning and correction of multiple sequence alignments. Bioinformatics. 2003;19:1155–1161. doi: 10.1093/bioinformatics/btg133.
96. Thompson JD, Plewniak F, Ripp R, Thierry JC, Poch O. Towards a reliable objective function for multiple sequence alignments. J. Mol. Biol. 2001;314:937–951. doi: 10.1006/jmbi.2001.5187.
97. Lanfear R, Frandsen PB, Wright AM, Senfeld T, Calcott B. PartitionFinder 2: New Methods for Selecting Partitioned Models of Evolution for Molecular and Morphological Phylogenetic Analyses. Mol. Biol. Evol. 2017;34:772–773. doi: 10.1093/molbev/msw260.
98. Ronquist F, Huelsenbeck JP. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003;19:1572–1574. doi: 10.1093/bioinformatics/btg180.
99. Stamatakis A, Ludwig T, Meier H. RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics. 2005;21:456–463. doi: 10.1093/bioinformatics/bti191.
100. Yu GC, Smith DK, Zhu HC, Guan Y, Lam TTY. GGTREE: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 2017;8:28–36.
101. Pila EA, Tarrabain M, Kabore AL, Hanington PC. A novel Toll-like receptor (TLR) influences compatibility between the gastropod Biomphalaria glabrata, and the Digenean Trematode Schistosoma mansoni. PLoS Pathog. 2016;12:e1005513. doi: 10.1371/journal.ppat.1005513.
102. Kenny NJ, Truchado-Garcia M, Grande C. Deep, multi-stage transcriptome of the schistosomiasis vector Biomphalaria glabrata provides platform for understanding molluscan disease-related pathways. BMC Infect. Dis. 2016;16:618. doi: 10.1186/s12879-016-1944-x.
103. Buddenborg SK, et al. Transcriptomic responses of Biomphalaria pfeifferi to Schistosoma mansoni: Investigation of a neglected African snail that supports more S. mansoni transmission than any other snail species. PLoS Negl. Trop. Dis. 2017;11:e0005984. doi: 10.1371/journal.pntd.0005984.

Supplementary Materials

Peer Review File (603.9KB, pdf)
41467_2022_28634_MOESM3_ESM.pdf (559.7KB, pdf)

Description of Additional Supplementary Files

Supplementary Data 1 (8MB, xlsx)
Supplementary Data 2 (24.6KB, xlsx)
Supplementary Data 3 (17MB, zip)
Reporting Summary (357.7KB, pdf)


Articles from Nature Communications are provided here courtesy of Nature Publishing Group
