Chromosome-Scale Genome Assembly of the Freshwater Snail Semisulcospira habei from the Lake Biwa Drainage System

Osamu Miura; Atsushi Toyoda; Tetsuya Sakurai

Genome Biol Evol. 2023 Nov 28;15(11):evad208. doi:10.1093/gbe/evad208

Editor: John Wang

PMCID: PMC10683039 PMID: 38014863

Abstract

Semisulcospira habei is a freshwater snail species endemic to the Lake Biwa drainage and belongs to a species group that radiated within the lake system. We report the chromosome-scale genome assembly of S. habei, including eight megascaffolds larger than 150 Mb. The genome assembly size is about 2.0 Gb with an N50 of 237 Mb. There are 41,547 protein-coding genes modeled by ab initio gene prediction based on the transcriptome data set, and the BUSCO completeness of the annotated genes is 92.2%. Repeat elements comprise approximately 76% of the genome assembly. The Hi-C contact map showed seven well-resolved scaffolds that correspond to the basic haploid chromosome number of S. habei inferred from a preceding karyotypic study, and one scaffold with a complicated mosaic pattern that is likely to represent a complex of multiple supernumerary chromosomes. The genome assembly reported here is a high-quality genomic resource for disentangling the genomic background of the adaptive radiation of Semisulcospira and will also facilitate evolutionary studies in the superfamily Cerithioidea.

Keywords: Semisulcospira, Cerithioidea, chromosome-scale assembly, B chromosome

Significance

There are fascinating examples of adaptive radiation in the ancient lakes. The Japanese ancient lake, Lake Biwa, harbors many endemic species, including a species flock of the gastropod genus Semisulcospira. The Semisulcospira snails in Lake Biwa can be a useful model system to study the mechanisms of species diversification, yet no high-quality genomic reference was available in this group. We report a chromosome-scale genome assembly of Semisulcospira habei, which will serve as an essential genomic resource for future research on the adaptive radiation of Semisulcospira and, more broadly, for evolutionary studies of the superfamily Cerithioidea.

Introduction

Lake Biwa is an ancient lake in central Japan with approximately 4 million years of history and harbors more than 60 endemic species and subspecies (Nishino 2012). The freshwater snails of the genus Semisulcospira in the Lake Biwa drainage system comprise 18 species (Sawada and Fuke 2022, 2023), exhibiting the highest within-genus diversity among all Lake Biwa endemic taxa. Two genetically and morphologically distinct Semisulcospira groups exist in Lake Biwa: the Semisulcospira niponica group and the Semisulcospira nakasekoae group (Sawada and Fuke 2022). Each group comprises nine species. Recent genomic studies based on a reduced representation sequencing technique demonstrated that these endemic Semisulcospira groups diversified concurrently in response to the enlargement of the lake about 0.4 million years ago (Miura et al. 2019). Their rapid species diversification within a limited geographical scale provides an ideal study system to evaluate the genetic background of species diversification, as has been shown for cichlid fishes in ancient African lakes (Malinsky et al. 2018; Nakamura et al. 2021) and finches in the Galapagos Islands (Lamichhaney et al. 2015, 2018).

Semisulcospira habei (fig. 1A) belongs to the S. niponica group and is the only living species that also appears in the fossil record from the Paleo-Lake Biwa deposits (Matsuoka 1987; Nishino and Watanabe 2000). Other living Semisulcospira species in Lake Biwa are absent from the fossil record, perhaps because of their recent diversification (Miura et al. 2019). This fossil evidence suggests that S. habei is a candidate stem species that seeded the radiation of the S. niponica group. A high-quality draft genome assembly of S. habei should therefore be an essential resource for disentangling the ecological and evolutionary mechanisms of species radiation in the Semisulcospira snails, including the identification of genes that facilitated their rapid diversification in Lake Biwa.

Results and Discussion

The superfamily Cerithioidea includes more than 200 genera and approximately 1,100 species distributed worldwide across tropical, subtropical, and warm temperate regions and inhabits a variety of marine, brackish water, and freshwater habitats (Strong et al. 2011). Despite their ubiquity in aquatic habitats, only three genome assemblies were available in the INSD database, and two of them are of low genome coverage with less than 25% BUSCO completeness (supplementary table S1, Supplementary Material online). Our genome assembly is the first chromosome-scale assembly in this large molluscan superfamily.

We used about 250 Gb of PacBio long-read sequences, 240 Gb of Illumina paired-end short-read sequences, and 160 Gb of Hi-C library sequences to assemble the genome of S. habei (supplementary table S2, Supplementary Material online). We estimated the genome size of S. habei as 1.95 Gb based on the k-mer distribution and 1.89 Gb based on the back-mapping technique. The size of the final assembly was about 1.98 Gb, slightly larger than the estimates based on the short-read sequences. Repetitive elements often affect short-read sequence-based estimations, resulting in smaller genome size estimates (Heckenhauer et al. 2022; Pfenninger et al. 2022). This perhaps explains the discrepancy between the assembly size and the genome size estimates.

The final assembly is composed of 7,743 contigs and 578 scaffolds (table 1), and approximately 98.9% of the sequence is contained in the eight megascaffolds. The mapping rate of the Illumina short reads against the final assembly was 99.7%, and 99.5% of the reads were properly paired. We estimated that the total repeat content of the S. habei genome is about 76% (supplementary table S3, Supplementary Material online), which is similar to that of other freshwater and terrestrial snails with large genomes (∼3 Gb genome size, repeat content: 71–77%; Schell et al. 2017; Guo et al. 2019; Saenko et al. 2021) but much higher than that of other freshwater snails with smaller genomes (genome size <1 Gb, repeat content: 11–48%; Gomes-dos-Santos et al. 2019). Long interspersed nuclear elements (LINEs) accounted for about 20% of the genome (supplementary table S3, Supplementary Material online). Long terminal repeat (LTR) retrotransposons were the second largest group, occupying about 10% of the genome. Nearly 30% of the repeat elements were not classified into known categories.

Table 1.

The Statistics for the Genome Assembly and Annotation for S. habei

Item	Category	Value
Assembly statistics	Assembled genome size (bp)	1,984,187,800
	Number of scaffolds	578
	Number of contigs	7,743
	N50 (bp)	236,511,961
	N90 (bp)	178,179,616
	GC content (%)	45.0
	BUSCO completeness (%)	94.4
	Single copy (%)	93.1
	Duplicated (%)	1.3
	Fragmented (%)	3.1
	Missing (%)	2.5
Protein-coding genes	Number of genes	41,547
	Number of transcripts	45,269
	Number of annotated transcripts	32,409
	Uniprot_Swiss-prot	14,367
	Uniprot_TrEMBL	28,904
	RefSeq (invertebrates)	28,789
	InterPro	30,114
	EggNOG	29,214
	With GO terms	24,676
	With KEGG pathways	8,983
	BUSCO completeness (%)	92.2
	Single copy (%)	89.0
	Duplicated (%)	3.2
	Fragmented (%)	4.0
	Missing (%)	3.8

The BUSCO completeness was estimated using the metazoan core gene database.
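
For readers unfamiliar with the N50 and N90 statistics reported in table 1, the following is a minimal sketch of how such values are conventionally computed from a list of sequence lengths. The function name `nx` and the toy lengths are illustrative assumptions and are not taken from the authors' pipeline; the table values are assumed here to refer to scaffold lengths.

```python
def nx(lengths, fraction):
    """Return the Nx statistic for a list of sequence lengths.

    Nx is the length L such that sequences of length >= L together
    contain at least `fraction` of the total assembled bases
    (fraction=0.5 gives N50, fraction=0.9 gives N90).
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    target = sum(lengths) * fraction
    running = 0
    # Walk from the longest sequence downward until the target is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length


# Toy example (hypothetical lengths, total = 300).
toy_lengths = [100, 80, 60, 40, 20]
toy_n50 = nx(toy_lengths, 0.5)
toy_n90 = nx(toy_lengths, 0.9)
```

In an assembly dominated by a few chromosome-scale scaffolds, both values fall within the megascaffold size range, which is consistent with the N50 and N90 in table 1 both exceeding 150 Mb.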

We annotated the genome by ab initio gene prediction based on 11 Gb of S. habei transcriptome sequences (supplementary table S2, Supplementary Material online). The final assembly contains 41,547 protein-coding genes (45,269 transcripts; table 1). The number of genes modeled in our assembly is larger than that of other freshwater snail genomes (14,000–24,000 genes) but comparable to that of some marine and terrestrial snails (Masonbrink et al. 2019; Saenko et al. 2021; Patra et al. 2023). The mean coding sequence length is 1,400 bp, and about 82% of genes have multiple exons (6.9 exons on average). Of the protein-coding genes modeled in our assembly, 32,409 genes were successfully annotated in one or more of the following databases: SwissProt, TrEMBL, EggNOG, InterPro, and RefSeq Invertebrate (table 1), and 24,676 genes possess gene ontology (GO) terms in the EggNOG database. The BUSCO analyses detected 94.4% of the metazoan core genes in the genome assembly and 92.2% in the annotated genes (table 1), suggesting that the assembly covers the majority of genes present in the S. habei genome. The BlobTools analysis indicated that bacterial contamination was negligible (supplementary fig. S1, Supplementary Material online).

Burch (1968) inferred that the basic haploid chromosome number of S. habei is seven (2N = 14) based on his karyotypic observations. He further reported three to six supernumerary chromosomes (B chromosomes) with variable chromosome lengths in S. habei. The sizes of these supernumerary chromosomes were comparable to those of the other chromosomes in S. habei. The Hi-C contact map shows seven well-resolved scaffolds and one scaffold with a complicated mosaic pattern (chr4 in fig. 1B). We consider that the seven well-resolved scaffolds correspond to the seven basic chromosomes. Because supernumerary chromosomes often experience large structural rearrangements (Endo et al. 2008), we postulate that the scaffold with a mosaic pattern represents a complex of multiple supernumerary chromosomes. We confirmed that the observed mosaicism was not an artifact of the long-read–based scaffolding, because we obtained a similar mosaic pattern without the long-read–based scaffolding process. Supernumerary chromosomes are often not functional and contain no essential genes (Camacho et al. 2000). However, in our assembly, the amount and distribution pattern of coding genes and repeat elements on chr4 are comparable to those of the other chromosomes (fig. 1C), suggesting that chr4 is functional and perhaps essential for the survival of S. habei. Future genomic studies in combination with cytological techniques are required to evaluate the supernumerary status and the detailed genomic function of chr4.

Despite the large and repetitive nature of the S. habei genome, the assembly reported here is of high quality and will be an essential genomic resource for evolutionary studies of cerithioidean snails. In particular, it will substantially contribute to understanding the genomic background of the adaptive radiation of Semisulcospira in the ancient Lake Biwa.

Materials and Methods

Study Sample, DNA Extraction, and Quality Evaluation

We collected a male S. habei from the Uji River, the main outlet of Lake Biwa. Semisulcospira habei is characterized by grid-like nodes on the shell surface and an elongated conical shell outline with fewer basal cords on the body whorl (Davis 1969). High-molecular-weight genomic DNA (HMW-gDNA) was extracted from fresh snail tissue using cetyltrimethylammonium bromide (CTAB) and NucleoBond HMW DNA (Macherey-Nagel, Germany). In brief, about 110 mg of foot tissue was cut into small pieces (<0.5 mm²) and digested in 3 ml of 2× CTAB solution with 200 µl of Proteinase K solution at 50 °C for about 2 h. The HMW-gDNA was washed once with 3 ml of chloroform, and the water phase containing the HMW-gDNA was transferred to the NucleoBond HMW DNA column and cleaned up following the manufacturer's protocol. In the final step of DNA extraction, we added 3.5 ml of isopropanol to 5 ml of Buffer H5 to precipitate the DNA. The HMW-gDNA was briefly washed with 70% ethanol, air-dried for approximately 10 min, and dissolved in Tris-HCl buffer. The purity and concentration of the extracted HMW-gDNA were evaluated using a BioSpec-nano spectrometer (Shimadzu, Japan) and a Qubit fluorometer (Thermo Fisher Scientific, United States). The approximate length of the extracted DNA fragments was evaluated using pulsed-field agarose gel electrophoresis.

Whole Genome Sequencing

We sequenced the genome of S. habei using two runs of the PacBio Sequel II sequencer. The PacBio library preparation for the continuous long reads (CLRs) and the sequencing were performed with the SMRTbell Express Template Prep Kit 2.0 and the Sequel II Binding Kit 2.0/Sequencing Kit 2.0 at the National Institute of Genetics, Japan (NIG). We also obtained short-read sequences using the Illumina NovaSeq 6000 sequencer. The 150 bp paired-end sequencing was performed using the TruSeq DNA PCR-Free Library Prep Kit and the NovaSeq 6000 SP Reagent Kit v1.5 at NIG. We quality-filtered the short-read sequences using fastp (Chen et al. 2018), eliminating low-quality bases with a quality score below Q30.

Estimation of Genome Size

We used two short-read sequence-based approaches to estimate the genome size of S. habei. First, we estimated the genome size using the k-mer distribution. We used KMC (Kokot et al. 2017) to count canonical 21-mers from the short-read sequences and produced a k-mer count histogram with a maximum coverage threshold of 1 million reads. The resulting histogram was analyzed by GenomeScope (Vurture et al. 2017) to estimate the genome size and average heterozygosity. Second, we used the back-mapping method, which estimates the genome size by dividing the total number of bases in the short-read sequences by the peak coverage obtained by mapping the reads back to the final assembly (Schell et al. 2017). We used a Perl wrapper script (backmap.pl) of ModEst (Pfenninger et al. 2022) to execute the back-mapping process.
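
The arithmetic behind the back-mapping estimate can be illustrated with a minimal sketch. This is not the ModEst or backmap.pl implementation; the function names, the tie-breaking rule, and the toy depth values below are assumptions made only to show the division of total sequenced bases by the peak (modal) mapping depth.

```python
from collections import Counter


def peak_coverage(depths, min_depth=1):
    """Return the modal per-base depth, ignoring positions below min_depth.

    Ties are resolved toward the lower depth (an arbitrary choice made
    for this sketch).
    """
    counts = Counter(d for d in depths if d >= min_depth)
    if not counts:
        raise ValueError("no positions with depth >= min_depth")
    return min(counts, key=lambda d: (-counts[d], d))


def backmap_genome_size(total_bases, peak):
    """Estimate genome size as total sequenced bases / peak coverage."""
    if peak <= 0:
        raise ValueError("peak coverage must be positive")
    return total_bases / peak


# Toy per-base depths (hypothetical): the mode is 10.
toy_depths = [0, 9, 10, 10, 10, 11, 11, 30]
toy_peak = peak_coverage(toy_depths)
toy_estimate = backmap_genome_size(2000, toy_peak)
```

Using the mode rather than the mean keeps collapsed repeats (very high depth) and gaps (zero depth) from shifting the coverage value, which is one reason a peak-based estimate is used for repeat-rich genomes.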

De Novo Genome Assembly

The genome assembly of S. habei was constructed based on the PacBio long-read data set using Flye v. 2.9 (Kolmogorov et al. 2019) with genome size set to 2 Gb and assembly coverage of 100. Duplicated contigs were removed from the assembly using the long-read sequences by Purge_haplotigs v. 1.1.2 (Roach et al. 2018) and Redundans v. 0.14a (Pryszcz and Gabaldón 2016). The resultant assembly was used as input for further scaffolding using the long-read sequences by Longstitch v. 1.0.4 (Coombe et al. 2021). The assembly was then polished using the short-read sequences by Pilon v. 1.23. Finally, the paired-end RNA-seq data set was used for scaffolding the assembly using P_RNA_scaffolder (Zhu et al. 2018).

Hi-C Scaffolding

We obtained additional sequences using Hi-C technology to achieve a chromosome-scale assembly. To construct the Hi-C library, we used about 20 mg of columellar muscle tissue from the same individual that was used for the genome sequencing. The Hi-C library was made using an Arima-HiC+ kit (Arima Genomics, United States) and sequenced using the NovaSeq 6000. The library construction and sequencing were performed at NIG. The Hi-C data set was processed using Juicer (Durand et al. 2016b) followed by 3D-DNA v. 180114 (Dudchenko et al. 2017) to scaffold the genome assembly at the chromosomal scale. Juicebox v. 1.11.08 (Durand et al. 2016a) was used to manually review assembly errors. We eliminated the scaffolds with more than 50% of ambiguous bases (or assembly gaps) and without any protein-coding genes, and the resulting 578 scaffolds were defined as the final assembly.
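
The final scaffold filter described above can be sketched as follows. The text does not state whether the two criteria were applied jointly or separately; this sketch assumes a scaffold is removed only when it both exceeds 50% ambiguous bases and carries no protein-coding genes. The function names and the toy scaffolds are hypothetical and are not from the authors' scripts.

```python
def n_fraction(seq):
    """Fraction of ambiguous bases (N or n) in a sequence."""
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "Nn") / len(seq)


def keep_scaffold(seq, gene_count, max_n_fraction=0.5):
    """Return False only for scaffolds that are mostly gaps AND gene-free.

    Assumption: both conditions must hold for a scaffold to be dropped.
    """
    mostly_gaps = n_fraction(seq) > max_n_fraction
    return not (mostly_gaps and gene_count == 0)


# Toy scaffolds (hypothetical): name -> (sequence, number of genes).
toy_scaffolds = {
    "scaf_a": ("ACGTNNACGT", 3),  # 20% N, has genes   -> kept
    "scaf_b": ("NNNNNNNACG", 0),  # 70% N, no genes    -> dropped
    "scaf_c": ("ACGTACGTAC", 0),  # 0% N, no genes     -> kept
    "scaf_d": ("NNNNNNACGT", 2),  # 60% N, has genes   -> kept
}
kept = [name for name, (seq, genes) in toy_scaffolds.items()
        if keep_scaffold(seq, genes)]
```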

Genome Annotation

We used RepeatModeler v. 2.0.3 (Flynn et al. 2020) to construct a species-specific library of transposable elements and repeats for S. habei. The resulting species-specific library was combined with the known repeat library of Mollusca from the RepeatMasker database. This combined library was used to detect and soft-mask the repeat elements using RepeatMasker v. 4.1.4 (http://www.repeatmasker.org).
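
Soft masking marks repeat regions by writing their bases in lowercase while leaving the sequence itself intact, so the masked fraction of a sequence can be recovered by counting lowercase characters. The sketch below illustrates that idea only; the function name and toy sequence are assumptions, and it is not how RepeatMasker produces its own summary tables.

```python
def softmasked_fraction(seq):
    """Fraction of bases written in lowercase (soft-masked) in a sequence.

    The denominator is the full sequence length, including any N bases;
    excluding gaps would be an equally reasonable convention.
    """
    if not seq:
        return 0.0
    masked = sum(1 for base in seq if base.islower())
    return masked / len(seq)


# Toy sequence (hypothetical): 6 of 12 bases are soft-masked.
toy_seq = "ACGTacgtacGT"
toy_fraction = softmasked_fraction(toy_seq)
```

Because the bases are retained, downstream tools such as gene predictors can still read through soft-masked regions while treating them as repeat annotated.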

We performed transcriptome sequencing using tissues from the same individual to build the transcriptome database for S. habei (table 1). The head and middle part, including foot and mantle tissues, and the remaining part, including gonadal and hepatopancreas tissues, were fixed separately using RNAlater (Qiagen, United States). These two parts were used for transcriptome library construction and sequencing. The transcriptome sequencing was performed at Macrogen Japan. The RNA-seq data set was aligned to the final assembly using Hisat2 v. 2.2.1 (Kim et al. 2019). We used Braker2 v. 2.1.6 (Brůna et al. 2021) to perform ab initio gene prediction with the aid of Augustus (Stanke et al. 2008) and GeneMark (Brůna et al. 2020). Untranslated regions were also detected using Gushr (Hoff and Stanke 2019). Protein sequences of the predicted genes were constructed using Gffread v. 0.12.7 (Pertea and Pertea 2020). We then performed functional annotation of the predicted genes using the EnTAP v. 0.10.8 pipeline (Hart et al. 2020) based on the SwissProt, TrEMBL, EggNOG, InterPro, and RefSeq Invertebrate databases. We evaluated the gene content completeness of the final assembly and annotation using BUSCO v.5 by counting the presence of the metazoan core genes (Manni et al. 2021). Finally, BlobTools v.1.1.1 (Laetsch and Blaxter 2017) was used to assess potential bacterial contamination based on the NCBI nonredundant nucleotide database and the UniProt reference proteome database.

Acknowledgments

We thank Prof. Y. Sakakibara and V. Jayakumar for their valuable advice in constructing the genome assembly. We also thank T. Saito and K. Morita for their assistance in the sampling of S. habei. This work was supported by JSPS KAKENHI Grant Numbers 16H06279 (PAGS), 20K06788, and 23H02540.

Contributor Information

Osamu Miura, Faculty of Agriculture and Marine Science, Kochi University, Nankoku, Kochi, Japan.

Atsushi Toyoda, Department of Genomics and Evolutionary Biology, National Institute of Genetics, Mishima, Shizuoka, Japan.

Tetsuya Sakurai, Faculty of Agriculture and Marine Science, Kochi University, Nankoku, Kochi, Japan.

Supplementary Material

Supplementary data are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).

Data Availability

The whole genome sequences of S. habei were deposited at DDBJ under BioProject accession number PRJDB16287. The accession number of the PacBio long-read sequences is DRR494364, and that of the Illumina short-read sequences is DRR494361. The sequences of the Hi-C library were deposited at DDBJ under accession number DRR494363. The transcriptome short-read sequences were deposited under accession number DRR494362. The final chromosome-scale assembly was deposited at DDBJ under accession numbers BTPG01000001-BTPG01000578. The genome and genome annotation results have also been deposited in Figshare under the DOI: 10.6084/m9.figshare.24180666.

Literature Cited

Brůna T, Hoff KJ, Lomsadze A, Stanke M, Borodovsky M. 2021. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom Bioinform. 3:lqaa108.
Brůna T, Lomsadze A, Borodovsky M. 2020. GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins. NAR Genom Bioinform. 2:lqaa026.
Burch J. 1968. Cytotaxonomy of some Japanese Semisulcospira (Streptoneura: Pleuroceridae). J Conchyliol. 107:3–51.
Camacho JPM, Sharbel TF, Beukeboom LW. 2000. B-chromosome evolution. Philos Trans R Soc Lond Ser B Biol Sci. 355:163–178.
Chen S, Zhou Y, Chen Y, Gu J. 2018. Fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34:i884–i890.
Coombe L, et al. 2021. Longstitch: high-quality genome assembly correction and scaffolding using long reads. BMC Bioinformatics 22:1–13.
Davis G. 1969. A taxonomic study of some species of Semisulcospira in Japan (Mesogastropoda: Pleuroceridae). Malacologia 7:211–294.
Dudchenko O, et al. 2017. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356:92–95.
Durand NC, et al. 2016a. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 3:99–101.
Durand NC, et al. 2016b. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3:95–98.
Endo TR, et al. 2008. Dissection of rye B chromosomes, and nondisjunction properties of the dissected segments in a common wheat background. Genes Genet Syst. 83:23–30.
Flynn JM, et al. 2020. Repeatmodeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci USA. 117:9451–9457.
Gomes-dos-Santos A, Lopes-Lima M, Castro LFC, Froufe E. 2019. Molluscan genomics: the road so far and the way forward. Hydrobiologia 847:1705–1726.
Guo Y, et al. 2019. A chromosomal-level genome assembly for the giant African snail Achatina fulica. Gigascience 8:giz124.
Hart AJ, et al. 2020. EnTAP: bringing faster and smarter functional annotation to non-model eukaryotic transcriptomes. Mol Ecol Resour. 20:591–604.
Heckenhauer J, et al. 2022. Genome size evolution in the diverse insect order Trichoptera. GigaScience 11:giac011.
Hoff KJ, Stanke M. 2019. Predicting genes in single genomes with AUGUSTUS. Curr Protoc Bioinform. 65:e57.
Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. 2019. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 37:907–915.
Kokot M, Długosz M, Deorowicz S. 2017. KMC 3: counting and manipulating k-mer statistics. Bioinformatics 33:2759–2761.
Kolmogorov M, Yuan J, Lin Y, Pevzner PA. 2019. Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol. 37:540–546.
Laetsch DR, Blaxter ML. 2017. Blobtools: interrogation of genome assemblies. F1000Res. 6:1287.
Lamichhaney S, et al. 2015. Evolution of Darwin's finches and their beaks revealed by genome sequencing. Nature 518:371–375.
Lamichhaney S, et al. 2018. Rapid hybrid speciation in Darwin's finches. Science 359:224–228.
Malinsky M, et al. 2018. Whole-genome sequences of Malawi cichlids reveal multiple radiations interconnected by gene flow. Nat Ecol Evol. 2:1940–1955.
Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. 2021. BUSCO Update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 38:4647–4654.
Masonbrink RE, et al. 2019. An annotated genome for Haliotis rufescens (red abalone) and resequenced green, pink, pinto, black, and white abalone species. Genome Biol Evol. 11:431–438.
Matsuoka K. 1987. Malacofaunal succession in Pliocene to Pleistocene non-marine sediments in the Omi and Ueno basins, central Japan. J Earth Sci Nagoya Univ. 35:23–115.
Miura O, Urabe M, Nishimura T, Nakai K, Chiba S. 2019. Recent lake expansion triggered the adaptive radiation of freshwater snails in the ancient Lake Biwa. Evol Lett. 3:43–54.
Nakamura H, et al. 2021. Genomic signatures for species-specific adaptation in Lake Victoria cichlids derived from large-scale standing genetic variation. Mol Biol Evol. 38:3111–3125.
Nishino M. 2012. Biodiversity of Lake Biwa. In: Kawanabe H, Nishino M, Maehata M, editors. Lake Biwa: interactions between nature and people. Dordrecht, Heidelberg, New York and London: Springer. p. 31–35.
Nishino M, Watanabe N. 2000. Evolution and endemism in Lake Biwa, with special reference to its gastropod mollusc fauna. Adv Ecol Res. 31:151–180.
Patra AK, et al. 2023. Genome assembly of the Korean intertidal mud-creeper Batillaria attramentaria. Sci Data. 10:498.
Pertea G, Pertea M. 2020. GFF utilities: GffRead and GffCompare. F1000Res. 9:304.
Pfenninger M, Schönnenbeck P, Schell T. 2022. Modest: accurate estimation of genome size from next generation sequencing data. Mol Ecol Resour. 22:1454–1464.
Pryszcz LP, Gabaldón T. 2016. Redundans: an assembly pipeline for highly heterozygous genomes. Nucleic Acids Res. 44:e113.
Roach MJ, Schmidt SA, Borneman AR. 2018. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics 19:1–10.
Saenko SV, Groenenberg DS, Davison A, Schilthuizen M. 2021. The draft genome sequence of the grove snail Cepaea nemoralis. G3 (Bethesda) 11:jkaa071.
Sawada N, Fuke Y. 2022. Systematic revision of the Japanese freshwater snail Semisulcospira decipiens (Mollusca: Semisulcospiridae): implications for diversification in the ancient Lake Biwa. Invertebr Syst. 36:1139–1177.
Sawada N, Fuke Y. 2023. Diversification in ancient Lake Biwa: integrative taxonomy reveals overlooked species diversity of the Japanese freshwater snail genus Semisulcospira (Mollusca: Semisulcospiridae). Contrib Zool. 92:1–37.
Schell T, et al. 2017. An annotated draft genome for Radix auricularia (Gastropoda, Mollusca). Genome Biol Evol. 9:585–592.
Stanke M, Diekhans M, Baertsch R, Haussler D. 2008. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24:637–644.
Strong EE, et al. 2011. Phylogeny of the gastropod superfamily Cerithioidea using morphology and molecules. Zool J Linn Soc. 162:43–89.
Vurture GW, et al. 2017. Genomescope: fast reference-free genome profiling from short reads. Bioinformatics 33:2202–2204.
Zhu B-H, et al. 2018. P_RNA_scaffolder: a fast and accurate genome scaffolder using paired-end RNA-sequencing reads. BMC Genomics 19:1–13.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

evad208_Supplementary_Data

Click here for additional data file.^{(714.5KB, zip)}

Data Availability Statement

[evad208-B1] Brůna T, Hoff KJ, Lomsadze A, Stanke M, Borodovsky M. 2021. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom Bioinform. 3:lqaa108. [DOI] [PMC free article] [PubMed] [Google Scholar]

[evad208-B2] Brůna T, Lomsadze A, Borodovsky M. 2020. GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins. NAR Genom Bioinform. 2:lqaa026. [DOI] [PMC free article] [PubMed] [Google Scholar]

[evad208-B3] Burch J. 1968. Cytotaxonomy of some Japanese Semisulcospira (Streptoneura: Pleuroceridae). J Conchyliol. 107:3–51. [Google Scholar]

[evad208-B4] Camacho JPM, Sharbel TF, Beukeboom LW. 2000. B-chromosome evolution. Philos Trans R Soc Lond Ser B Biol Sci. 355:163–178. [DOI] [PMC free article] [PubMed] [Google Scholar]

[evad208-B5] Chen S, Zhou Y, Chen Y, Gu J. 2018. Fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34:i884–i890. [DOI] [PMC free article] [PubMed] [Google Scholar]

[evad208-B6] Coombe L, et al. 2021. Longstitch: high-quality genome assembly correction and scaffolding using long reads. BMC Bioinformatics 22:1–13. [DOI] [PMC free article] [PubMed] [Google Scholar]

[evad208-B7] Davis G. 1969. A taxonomic study of some species of Semisulcospira in Japan (Mesogastropoda: Pleuroceridae). Malacologia 7:211–294. [Google Scholar]

[evad208-B8] Dudchenko O, et al. 2017. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356:92–95. [DOI] [PMC free article] [PubMed] [Google Scholar]

[evad208-B9] Durand NC, et al. 2016a. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 3:99–101. [DOI] [PMC free article] [PubMed] [Google Scholar]

[evad208-B10] Durand NC, et al. 2016b. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3:95–98. [DOI] [PMC free article] [PubMed] [Google Scholar]

[evad208-B11] Endo TR, et al. 2008. Dissection of rye B chromosomes, and nondisjunction properties of the dissected segments in a common wheat background. Genes Genet Syst. 83:23–30. [DOI] [PubMed] [Google Scholar]

[evad208-B12] Flynn JM, et al. 2020. Repeatmodeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci USA. 117:9451–9457. [DOI] [PMC free article] [PubMed] [Google Scholar]

[evad208-B13] Gomes-dos-Santos A, Lopes-Lima M, Castro LFC, Froufe E. 2019. Molluscan genomics: the road so far and the way forward. Hydrobiologia 847:1705–1726. [Google Scholar]

[evad208-B14] Guo Y, et al. 2019. A chromosomal-level genome assembly for the giant African snail Achatina fulica. Gigascience 8:giz124. [DOI] [PMC free article] [PubMed] [Google Scholar]

[evad208-B15] Hart AJ, et al. 2020. EnTAP: bringing faster and smarter functional annotation to non-model eukaryotic transcriptomes. Mol Ecol Resour. 20:591–604. [DOI] [PubMed] [Google Scholar]

[evad208-B16] Heckenhauer J, et al. 2022. Genome size evolution in the diverse insect order Trichoptera. GigaScience 11:giac011. [DOI] [PMC free article] [PubMed] [Google Scholar]

[evad208-B17] Hoff KJ, Stanke M. 2019. Predicting genes in single genomes with AUGUSTUS. Curr Protoc Bioinform. 65:e57. [DOI] [PubMed] [Google Scholar]

[evad208-B18] Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. 2019. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 37:907–915. [DOI] [PMC free article] [PubMed] [Google Scholar]

[evad208-B19] Kokot M, Długosz M, Deorowicz S. 2017. KMC 3: counting and manipulating k-mer statistics. Bioinformatics 33:2759–2761. [DOI] [PubMed] [Google Scholar]

[evad208-B20] Kolmogorov M, Yuan J, Lin Y, Pevzner PA. 2019. Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol. 37:540–546. [DOI] [PubMed] [Google Scholar]

[evad208-B21] Laetsch DR, Blaxter ML. 2017. Blobtools: interrogation of genome assemblies. F1000Res. 6:1287. [Google Scholar]

[evad208-B22] Lamichhaney S, et al. 2015. Evolution of Darwin's finches and their beaks revealed by genome sequencing. Nature 518:371–375.

[evad208-B23] Lamichhaney S, et al. 2018. Rapid hybrid speciation in Darwin’s finches. Science 359:224–228.

[evad208-B24] Malinsky M, et al. 2018. Whole-genome sequences of Malawi cichlids reveal multiple radiations interconnected by gene flow. Nat Ecol Evol. 2:1940–1955.

[evad208-B25] Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. 2021. BUSCO Update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 38:4647–4654.

[evad208-B26] Masonbrink RE, et al. 2019. An annotated genome for Haliotis rufescens (red abalone) and resequenced green, pink, pinto, black, and white abalone species. Genome Biol Evol. 11:431–438.

[evad208-B27] Matsuoka K. 1987. Malacofaunal succession in Pliocene to Pleistocene non-marine sediments in the Omi and Ueno basins, central Japan. J Earth Sci Nagoya Univ. 35:23–115.

[evad208-B28] Miura O, Urabe M, Nishimura T, Nakai K, Chiba S. 2019. Recent lake expansion triggered the adaptive radiation of freshwater snails in the ancient Lake Biwa. Evol Lett. 3:43–54.

[evad208-B29] Nakamura H, et al. 2021. Genomic signatures for species-specific adaptation in Lake Victoria cichlids derived from large-scale standing genetic variation. Mol Biol Evol. 38:3111–3125.

[evad208-B30] Nishino M. 2012. Biodiversity of Lake Biwa. In: Kawanabe H, Nishino M, Maehata M, editors. Lake Biwa: interactions between nature and people. Dordrecht, Heidelberg, New York and London: Springer. p. 31–35.

[evad208-B31] Nishino M, Watanabe N. 2000. Evolution and endemism in Lake Biwa, with special reference to its gastropod mollusc fauna. Adv Ecol Res. 31:151–180.

[evad208-B32] Patra AK, et al. 2023. Genome assembly of the Korean intertidal mud-creeper Batillaria attramentaria. Sci Data. 10:498.

[evad208-B33] Pertea G, Pertea M. 2020. GFF utilities: GffRead and GffCompare. F1000Res. 9:304.

[evad208-B34] Pfenninger M, Schönnenbeck P, Schell T. 2022. ModEst: accurate estimation of genome size from next generation sequencing data. Mol Ecol Resour. 22:1454–1464.

[evad208-B35] Pryszcz LP, Gabaldón T. 2016. Redundans: an assembly pipeline for highly heterozygous genomes. Nucleic Acids Res. 44:e113.

[evad208-B36] Roach MJ, Schmidt SA, Borneman AR. 2018. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics 19:1–10.

[evad208-B37] Saenko SV, Groenenberg DS, Davison A, Schilthuizen M. 2021. The draft genome sequence of the grove snail Cepaea nemoralis. G3 (Bethesda) 11:jkaa071.

[evad208-B38] Sawada N, Fuke Y. 2022. Systematic revision of the Japanese freshwater snail Semisulcospira decipiens (Mollusca: Semisulcospiridae): implications for diversification in the ancient Lake Biwa. Invertebr Syst. 36:1139–1177.

[evad208-B39] Sawada N, Fuke Y. 2023. Diversification in ancient Lake Biwa: integrative taxonomy reveals overlooked species diversity of the Japanese freshwater snail genus Semisulcospira (Mollusca: Semisulcospiridae). Contrib Zool. 92:1–37.

[evad208-B40] Schell T, et al. 2017. An annotated draft genome for Radix auricularia (Gastropoda, Mollusca). Genome Biol Evol. 9:585–592.

[evad208-B41] Stanke M, Diekhans M, Baertsch R, Haussler D. 2008. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24:637–644.

[evad208-B42] Strong EE, et al. 2011. Phylogeny of the gastropod superfamily Cerithioidea using morphology and molecules. Zool J Linn Soc. 162:43–89.

[evad208-B43] Vurture GW, et al. 2017. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33:2202–2204.

[evad208-B44] Zhu B-H, et al. 2018. P_RNA_scaffolder: a fast and accurate genome scaffolder using paired-end RNA-sequencing reads. BMC Genomics 19:1–13.
