Abstract
Since historical times, the inherent human fascination with pearls has turned the freshwater pearl mussel Margaritifera margaritifera (Linnaeus, 1758) into a highly valuable cultural and economic resource. Although pearl harvesting in M. margaritifera is nowadays residual, other human threats have aggravated the species’ conservation status, especially in Europe. This mussel presents a myriad of rare biological features, e.g. high longevity coupled with low senescence and Doubly Uniparental Inheritance of mitochondrial DNA, for which the underlying molecular mechanisms are poorly known. Here, the first draft genome assembly of M. margaritifera was produced using a combination of Illumina Paired-end and Mate-pair approaches. The genome assembly was 2.4 Gb long, possessing 105,185 scaffolds and a scaffold N50 length of 288,726 bp. The ab initio gene prediction allowed the identification of 35,119 protein-coding genes. This genome represents an essential resource for studying this species’ unique biological and evolutionary features and ultimately will help to develop new tools to promote its conservation.
Keywords: Margaritifera margaritifera, freshwater mussel, pearls, Unionida genome, whole genome
1. Introduction
Pearls are fascinating organic gemstones that have populated the human imagination of beauty for millennia. Legend says that Cleopatra, to display her wealth to her lover Mark Antony, dissolved a pearl in a glass of vinegar and drank it. The human use of pearls or their shell precursor material, nacre, is ancient. The earliest known use of decorative nacre dates to 4200 BC in Egypt, with pearls themselves only becoming popular around 600 BC. Before the arrival of marine pearls in Europe, most pearls were harvested from a common and widespread freshwater bivalve, the freshwater pearl mussel Margaritifera margaritifera L. 1758 (Fig. 1), in which generally only one pearl is found per 3,000 mussels, so that harvesting led to massive mortality.1 During the Roman Empire period, pearls were such a desirable luxury that one of the reasons that persuaded Julius Caesar to invade Britain is believed to have been access to its vast freshwater pearl resources.2 M. margaritifera freshwater pearls were extremely valuable, being included in many royal family jewels, such as the British, Scottish, Swedish, Austrian, and German crown jewels and even in a Russian city’s coat of arms.2–5 Although over-harvesting represented a serious threat to the species for centuries, interest in and demand for freshwater pearls decreased in the 20th century.4 However, the global industrialization process introduced stronger threats to the survival of the species.6–8 In fact, M. margaritifera belongs to one of the most threatened taxonomic groups on earth, the Margaritiferidae.6 The species was once abundant in cool oligotrophic waters throughout most of northwest Europe and northeast North America.6–8 However, habitat degradation, fragmentation, and pollution have resulted in massive population declines.8 Consequently, the Red List of Threatened Species from the International Union for Conservation of Nature has classified M. margaritifera as Endangered globally and Critically Endangered in Europe.7,9
Besides being able to produce pearls, M. margaritifera presents many other remarkable biological characteristics: it is among the longest-living invertebrates, reaching up to 280 years;6,10 displays very weak signs of senescence, referred to as ‘negligible senescence’;11 has an obligatory parasitic larval stage on salmonid fishes used for nurturing and dispersion;8,12 and, like many other bivalves (see Gusman et al.13 for a recent enumeration), shows an unusual mitochondrial DNA inheritance system, called Doubly Uniparental Inheritance or DUI.14,15 Although these biological features are well described, the molecular mechanisms underlying their regulation and functioning are poorly studied and practically unknown. Thus, a complete genome assembly for M. margaritifera is critical for developing the molecular resources required to improve our knowledge of such mechanisms.
Several Mollusca genomes are currently available and new assemblies are released every year at an increasing rate (reviewed in Refs16–18). Despite this, to date, only three Unionida mussel genomes have been published: Venustaconcha ellipsiformis (Conrad, 1836),19 Megalonaias nervosa (Rafinesque, 1820),20 and Potamilus streckersoni (Smith, Johnson, Inoue, Doyle & Randklev, 2019).21 Therefore, considering the importance of increasing the availability of genomic resources for Unionida, this study presents the first draft genome assembly of the freshwater pearl mussel M. margaritifera. The assembled genome has a total length of 2.4 Gb and a scaffold N50 length of 288,726 bp, and 35,119 protein-coding genes were predicted. A Bivalvia phylogeny using whole-genome single-copy orthologs was also constructed, and the Hox and ParaHox gene complement within the order Unionida was characterized here for the first time.
2. Materials and methods
2.1. Sample collection, DNA extraction, and sequencing
One female M. margaritifera (Linnaeus, 1758) specimen was collected from the River Tua, Douro basin in the North of Portugal (permit 284/2020/CAPT and fishing permit 26/20 issued by ICNF—Instituto de Conservação da Natureza e das Florestas). The whole individual is stored in 96% ethanol at the Unionoid DNA and Tissue Databank, CIIMAR, University of Porto. Genomic DNA (gDNA) was extracted from the foot tissue using DNeasy Blood and Tissue Kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions.
Two distinct NGS libraries and sequencing approaches were implemented, i.e. Illumina Paired-end reads (PE) and Illumina long insert size Mate-pair reads (MP). The Illumina PE library was prepared with standard Illumina adaptors from 100 ng of gDNA sheared to a length of 300–400 bp and was sequenced on an Illumina NovaSeq6000 system located at Deakin Genomics Centre using a run configuration of 2 × 150 bp. Illumina MP library preparation and sequencing were performed by Macrogen Inc., Korea, where a 10 kb insert size Nextera Mate Pair Library was constructed and subsequently sequenced on a NovaSeq6000 S4 using a run configuration of 2 × 150 bp.
2.2. Genome size and heterozygosity estimation
The overall characteristics of the genome were assessed using PE reads. Read quality was evaluated using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and raw reads were quality trimmed with Trim Galore v.0.4.0 (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/), allowing the trimming of adapter sequences and removal of low-quality reads using Cutadapt.22 Clean reads were used for genome size estimation with Jellyfish v.2.2.10 and GenomeScope223,24 using k-mer lengths of 25 and 31.
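The principle behind k-mer-based genome size estimation can be illustrated with a minimal sketch. This is not the GenomeScope2 model, which fits a statistical model to the full k-mer spectrum; it only shows the simpler peak-based approximation in which the genome size is roughly the total number of k-mer occurrences divided by the depth of the main peak. The function names, the `min_depth` cut-off, and the toy reads are hypothetical and used for illustration only.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Count k-mers across reads; return {multiplicity: number of distinct k-mers}."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Peak-based estimate: total k-mer occurrences / depth of the main peak.

    k-mers seen fewer than min_depth times are treated as likely sequencing
    errors and ignored (an assumption of this sketch).
    """
    solid = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    peak_depth = max(solid, key=solid.get)
    total_kmers = sum(depth * n for depth, n in solid.items())
    return total_kmers / peak_depth


# Toy example: a 12-bp "genome" sequenced three times without errors.
genome = "ACGTTGCAAGCT"
reads = [genome] * 3
hist = kmer_histogram(reads, k=5)
size = estimate_genome_size(hist)
```

In this toy case every one of the 8 distinct 5-mers is seen three times, so the estimate equals the number of k-mer positions in the sequence (12 − 5 + 1 = 8). Real data additionally require handling heterozygous and repeat peaks, which is what dedicated tools model.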
2.3. Genome assembly and quality assessment
Long-range Illumina MP reads were quality processed as described above, and both PE and MP cleaned reads were used for whole-genome assembly. The assembly was produced by running Meraculous v.2.2.6 with several distinct k-mer sizes.25 This allowed determining the optimal k-mer size of 101. Genome assembly metrics were estimated using QUAST v5.0.2.26 Assembly completeness, heterozygosity, and collapsing of repetitive regions were evaluated through analysis of k-mer distribution using PE reads, with K-mer Analysis Toolkit.27 Furthermore, PE reads were aligned to the genome assembly using BBMap.28 BUSCO v. 3.0.229 was used to provide a quantitative measure of the assembly completeness, with curated lists of eukaryotic (n = 303) and metazoan (n = 978) near-universal single-copy orthologs. Finally, in order to inspect the genome for possible contamination, we used BlobTools30 (Additional File 1).
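For readers unfamiliar with the contiguity metrics reported by tools such as QUAST, the following sketch shows how N50 and L50 are defined. It is an illustrative re-implementation under the standard definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly), not the code used in this study; the function name and toy lengths are hypothetical.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first sequence at which the cumulative length reaches half of
        # the assembly defines N50 (its length) and L50 (its rank).
        if running >= half:
            return length, count
    raise ValueError("lengths must not be empty")


# Toy assembly of five scaffolds (total length 300).
toy_scaffolds = [100, 80, 60, 40, 20]
n50, l50 = n50_l50(toy_scaffolds)
```

For the toy assembly, the two longest scaffolds (100 + 80) already cover half of the total length, giving an N50 of 80 and an L50 of 2.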
The whole mitochondrial genome was assembled from the PE reads with MitoBim v.1.9.031, annotated using the MITOS232 web server, and manually validated against other Margaritiferidae mitogenomes.
2.4. Repeat sequences, gene models predictions, and transcriptome alignment
Given the generally high proportion of repetitive elements in Mollusca genomes (e.g. Ref.16), these elements should be identified and masked before proceeding to genome annotation. A de novo library of repetitive elements was created for the M. margaritifera genome assembly, using RepeatModeler v.2.0.133 (excluding sequences <2.5 kb). Soft masking of the genome was performed with RepeatMasker v.4.0.734 combining the de novo library with the ‘Bivalvia’ libraries from Dfam_consensus-20170127 and RepBase-20181026.
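Soft masking means that repeat regions are kept in the sequence but written in lower case, so that downstream gene predictors can down-weight them without losing the underlying bases. The sketch below illustrates the concept only; it does not reproduce RepeatMasker, and the function names, interval convention (0-based, half-open), and toy sequence are assumptions made for illustration.

```python
def soft_mask(sequence, repeat_intervals):
    """Lower-case the bases inside each 0-based, half-open (start, end) interval."""
    masked = list(sequence.upper())
    for start, end in repeat_intervals:
        for i in range(max(start, 0), min(end, len(masked))):
            masked[i] = masked[i].lower()
    return "".join(masked)


def masked_fraction(sequence):
    """Fraction of bases that are soft-masked (lower case)."""
    if not sequence:
        return 0.0
    return sum(base.islower() for base in sequence) / len(sequence)


# Toy scaffold with two annotated repeat intervals.
toy = soft_mask("ACGTACGTAC", [(2, 5), (8, 10)])
fraction = masked_fraction(toy)
```

Here five of the ten bases fall inside repeat intervals, so half of the toy sequence is masked; the same ratio computed over a whole assembly corresponds to the "percentage of sequence" masked.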
BRAKER2 pipeline v2.1.535,36 was used for gene prediction in the genome. First, all RNA-seq data of M. margaritifera37,38 available on GenBank were downloaded, assessed with FastQC v.0.11.8, quality trimmed with Trimmomatic v.0.3839 (parameters: LEADING:5 TRAILING:5 SLIDINGWINDOW:4:20 MINLEN:36) and error corrected with Rcorrector v.1.0.3.40 Afterwards, the RNA-seq data were aligned to the masked genome assembly, using Hisat2 v.2.2.0 with the default settings.41 The complete proteomes of 13 mollusc species, one Chordata (Ciona intestinalis) and one Echinodermata (Strongylocentrotus purpuratus) were downloaded from distinct public databases (Supplementary Table S1) and used as additional evidence for gene prediction. The BRAKER2 pipeline was applied with the parameters (--etpmode; --softmasking; --UTR=off; --crf; --cores=30) and following the authors’ instructions.35,36 The resulting gene predictions (i.e. gff3 file) were renamed, cleaned, and filtered using AGAT v.0.4.0,42 correcting coordinates of overlapping gene predictions, removing predicted coding sequence regions (CDS) of <100 amino acids (in order to avoid a high rate of false-positive predictions) and removing incomplete gene predictions (i.e. without start and/or stop codons). Functional annotation was conducted by searching for protein domain information using InterProScan v.5.44.8043 and by protein BLAST searches using DIAMOND v. 0.9.3244 against SwissProt (downloaded on 2/07/2020), TrEMBL (downloaded on 2/07/2020), and RefSeq-NCBI (downloaded on 3/07/2020).45,46 BUSCO v. 3.0.229 scores for the predicted proteins were assessed using the eukaryotic (n = 255) and metazoan (n = 954) curated lists of near-universal single-copy orthologs.
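The two filtering criteria applied to the gene predictions (a minimum protein length and the presence of both start and stop codons) can be sketched as follows. This is a simplified illustration and not the AGAT implementation: it assumes the standard genetic code, operates on plain CDS nucleotide strings rather than on a gff3 file, and counts the protein length without the terminal stop codon. All names are hypothetical.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def is_complete(cds):
    """True if the CDS starts with ATG, ends with a stop codon, and is in frame."""
    cds = cds.upper()
    return (
        len(cds) % 3 == 0
        and cds.startswith(START_CODON)
        and cds[-3:] in STOP_CODONS
    )


def protein_length(cds):
    """Number of encoded amino acids, excluding the terminal stop codon."""
    return len(cds) // 3 - 1


def keep_prediction(cds, min_aa=100):
    """Keep only complete predictions encoding at least min_aa amino acids."""
    return is_complete(cds) and protein_length(cds) >= min_aa


# Toy predictions: complete and long enough, complete but short, and lacking a start codon.
long_complete = "ATG" + "GCT" * 99 + "TAA"
short_complete = "ATG" + "GCT" * 50 + "TAA"
no_start = "GCT" * 120 + "TAA"
```

Only the first toy prediction passes both criteria; the second is removed for encoding fewer than 100 amino acids and the third for lacking a start codon.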
Finally, the M. margaritifera transcriptome assembly from Bertucci et al.37 downloaded from NCBI (BioProject: PRJNA369722) was aligned to the masked genome with pblat_v2.5,47 specifying the option ‘-fine -q=rna’ while maintaining the remaining parameters as default. Alignment stats were calculated with isoblat_v0.3148 using default parameters.
2.5. Phylogenetic analyses
For the phylogenetic assessment, the proteomes of 12 molluscan species were downloaded from distinct public databases (Supplementary Table S2), which included 11 Autobranchia bivalves and 2 outgroup species, i.e. the Cephalopoda Octopus bimaculoides and Gastropoda Biomphalaria glabrata (Fig. 3). Single-copy orthologs between these 12 species and M. margaritifera were retrieved using OrthoFinder v2.4.0,49 specifying multiple sequence alignment as the method of gene tree inference (-M). The resulting 118 single-copy ortholog sequences were individually aligned using MUSCLE v3.8.31,50 with default parameters and subsequently trimmed with TrimAl v.1.251 specifying a gap threshold of 0.5 (-gt). Trimmed alignments were then concatenated using FASconCAT-G (https://github.com/PatrickKueck/FASconCAT-G). The best molecular evolutionary model was estimated using ProtTest v.3.4.1.52 Phylogenetic inferences were conducted in IQ-TREE v.1.6.1253 for Maximum-Likelihood analyses (with initial tree searches followed by 10 independent runs and 10,000 ultrafast bootstrap replicates) and MrBayes v.3.2.654 for Bayesian Inference (2 independent runs, 1,000,000 generations, sampling frequency of 1 tree per 1,000 generations). All phylogenetic analyses were applied using the substitution model LG+I+G.
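The trimming and concatenation steps can be illustrated with a small sketch. It is a simplified stand-in for what TrimAl and FASconCAT-G do, not their actual implementations: columns are kept when the fraction of sequences without a gap is at least the threshold (0.5 here), and the per-gene alignments are then joined taxon by taxon into a supermatrix. The sketch assumes every taxon is present in every alignment, as is the case for single-copy orthologs; function names and toy alignments are hypothetical.

```python
def trim_columns(alignment, gap_threshold=0.5):
    """Keep columns whose fraction of non-gap residues is >= gap_threshold."""
    names = list(alignment)
    n_seqs = len(names)
    length = len(alignment[names[0]])
    keep = [
        col
        for col in range(length)
        if sum(alignment[name][col] != "-" for name in names) / n_seqs >= gap_threshold
    ]
    return {name: "".join(alignment[name][col] for col in keep) for name in names}


def concatenate(alignments):
    """Join per-gene alignments into one supermatrix, taxon by taxon."""
    taxa = list(alignments[0])
    return {taxon: "".join(aln[taxon] for aln in alignments) for taxon in taxa}


# Two toy protein alignments for three taxa.
gene1 = {"sp1": "MK-L", "sp2": "MK-L", "sp3": "M--L"}
gene2 = {"sp1": "AV", "sp2": "A-", "sp3": "AV"}
supermatrix = concatenate([trim_columns(gene1), trim_columns(gene2)])
```

In the first toy alignment the all-gap third column is removed, while the second column (two of three sequences ungapped) is retained; the resulting supermatrix has five columns per taxon.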
2.6. Hox and ParaHox gene identification and phylogeny
To identify the repertoire of Hox and ParaHox genes in M. margaritifera, a BLASTn55 similarity search of the CDS of the M. margaritifera genome was conducted using the annotated homeobox gene set of Crassostrea gigas.56,57 Candidate CDSs were further validated for the presence of the homeodomain by CD-Search.58 The identity of each putative CDS was then verified by BLASTx and BLASTp55 searches in the NCBI nr database and by phylogenetic analyses. Since the search was conducted in the annotated genome (i.e. scaffolds over 2.5 kb), when genes were not found, a new search was conducted in the remaining scaffolds. Finally, any genes still undetected were searched for in the transcriptome assembly of the species (BioProject: PRJNA369722).37 Due to the phylogenetic proximity and for comparative purposes, Hox and ParaHox genes were also searched for in the genome assembly of M. nervosa.20
For phylogenetic assessment of Hox and ParaHox genes, the homeodomain amino acid sequences of the genes from M. margaritifera and M. nervosa were aligned with other Mollusca orthologs.59,60 Molecular evolutionary models and Maximum-Likelihood phylogenetic analyses were obtained using IQ-TREE v.1.6.12.53,61
3. Results and discussion
3.1. Sequencing results
A total of 494 Gb (∼209×) of raw PE and 76 Gb (∼32×) of raw MP data were generated, which after trimming and quality filtering were reduced by 0.3% and 10%, respectively (Supplementary Table S3). GenomeScope2 model fitting of the k-mer distribution analysis estimated a genome size of 2.31–2.36 Gb and a very low heterozygosity of 0.105–0.127% (Fig. 2A). Although larger than the genome of V. ellipsiformis (i.e. 1.80 Gb), the size estimation of the M. margaritifera genome is in line with that of the recently assembled Unionida mussel M. nervosa20 (i.e. 2.38 Gb). The estimated heterozygosity is the lowest observed within Unionida genomes19,20 and one of the lowest in Mollusca,16 which is remarkable considering it refers to a wild individual. This low value is likely a consequence of population bottlenecks during glaciation events, which have been shown to shape the evolutionary history of many freshwater mussels (e.g. Refs19,62,63) and may also be enhanced by recent human-mediated threats.
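The fold-coverage values quoted above follow from dividing the amount of sequence generated by the genome size. As a quick check, the short sketch below reproduces them under the assumption that the upper GenomeScope2 estimate (2.36 Gb) was used as the denominator; that choice is inferred from the reported numbers and is not stated in the text.

```python
def fold_coverage(sequenced_gb, genome_size_gb):
    """Sequencing depth as total bases sequenced divided by genome size."""
    return sequenced_gb / genome_size_gb


GENOME_SIZE_GB = 2.36  # assumption: upper bound of the GenomeScope2 estimate

pe_coverage = fold_coverage(494, GENOME_SIZE_GB)  # raw Paired-end data
mp_coverage = fold_coverage(76, GENOME_SIZE_GB)   # raw Mate-pair data
```

Rounded to the nearest integer, these give approximately 209× for the PE data and 32× for the MP data, matching the values reported.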
3.2. Margaritifera margaritifera de novo genome assembly
The Meraculous assembly and scaffolding yielded a final genome size of 2.47 Gb with a contig N50 of 16,899 bp and a scaffold N50 of 288,726 bp (Table 1). Both N50 values are considerably higher than those of the V. ellipsiformis genome assembly, i.e. 3,117 and 6,523 bp, respectively.19 Presently, this M. margaritifera genome assembly has one of the highest scaffold N50 values of the Unionida genomes currently available.19–21 On the other hand, the contig N50 of the M. nervosa genome assembly, i.e. 51,552 bp, is higher than that of M. margaritifera, which is expected given the use of Oxford Nanopore ultra-long read libraries in the assembly produced by Rogers et al.20 BUSCO scores of the final assembly indicate a fairly complete genome assembly (Table 1) and, although the contiguity is lower than that of other recent bivalve genome assemblies, the low percentage of fragmented genes (i.e. 5.9% for Eukaryota and 4.9% for Metazoa) gives further support to the quality of the genome assembly. Similarly, the slight difference observed between the assembled genome size and the initial size estimation is unlikely to be a consequence of erroneous assembly duplication, as duplicated BUSCO scores are also low (i.e. 1% for Eukaryota and 1.1% for Metazoa). The quality of the genome assembly is further supported by the high percentage of PE reads mapping back to the genome (i.e. 97.75%, Table 1), as well as the KAT k-mer distribution spectrum (Fig. 2B), which demonstrates that almost no read information was excluded from the final assembly. Additionally, around 99% of the transcripts of the M. margaritifera transcriptome assembly37 aligned to the genome assembly (Supplementary Table S4). Overall, these statistics indicate that the M. margaritifera draft genome assembly presented here is a fairly complete, non-redundant, and useful resource for various applications.
Table 1.
| Contig(a) | Scaffold(a) |
---|---|---|
Total number of sequences (≥1,000 bp) | 265,718 | 105,185 |
Total number of sequences (≥10,000 bp) | 66,019 | 15,384 |
Total number of sequences (≥25,000 bp) | 18,725 | 11,583 |
Total number of sequences (≥50,000 bp) | 4,284 | 9,265 |
Total length (≥1,000 bp) | 2,230,001,992 | 2,472,078,101 |
Total length (≥10,000 bp) | 1,523,143,239 | 2,293,496,118 |
Total length (≥25,000 bp) | 789,559,702 | 2,236,013,546 |
Total length (≥50,000 bp) | 299,796,296 | 2,152,307,394 |
N50 length (bp) | 16,899 | 288,726 |
L50 | 34,910 | 2,393 |
Maximum length (bp) | 209,744 | 2,510,869 |
GC content, % | 35.42 | 35.42 |
Clean paired-end (PE) reads alignment stats | ||
Percentage of mapped PE (%) | — | 97.754 |
Percentage of proper pairs PE (%) | — | 90.653 |
Average PE sequence coverage | — | 181.968 |
Percentage of scaffolds with any coverage (%) | — | 100.00 |
Total BUSCOs for the genome assembly (%) | ||
#Euk database | — | C: 86.8% (S: 85.8%, D: 1.0%), F: 5.9% |
#Met database | — | C: 84.9% (S: 83.8%, D: 1.1%), F: 4.9% |
Gene prediction and annotation stats(b) | ||
Protein-coding genes (CDS) | — | 35,119 |
Transcripts (mRNA) | — | 40,544 |
Protein-coding genes functional annotated | — | 26,836 |
Transcripts functional annotated | — | 31,584 |
Total gene length (bp) | — | 902,994,752 |
Total mRNA length (bp) | — | 1,101,526,909 |
Total CDS length (bp) | — | 52,211,391 |
Total exon length (bp) | — | 52,211,391 |
Total intron length (bp) | — | 1,024,450,311 |
Total BUSCOs for the predicted proteins (%) | ||
+Euk database | — | C: 90.6% (S: 81.2%, D: 9.4%), F: 3.9% |
+Met database | — | C: 92.6% (S: 82.3%, D: 10.3%), F: 3.2% |
C: complete; S: single; D: duplicated; F: fragmented.
#Euk: from a total of 303 genes of Eukaryota library profile.
#Met: from a total of 978 genes of Metazoa library profile.
+Euk: from a total of 255 genes of Eukaryota library profile.
+Met: from a total of 954 genes of Metazoa library profile.
(a) All statistics are based on contigs/scaffolds of size ≥1,000 bp.
(b) All statistics are based on contigs/scaffolds of size ≥2,500 bp.
The whole mitochondrial genome obtained with MitoBim is 16,124 bp long and its gene content is as expected for Margaritiferidae female-type mitogenomes,64 with 13 protein-coding genes, 22 transfer RNAs, and 2 ribosomal RNAs.
3.3. Repeat identification and masking and gene models prediction
The use of the custom repetitive library combined with the RepBase65 ‘Bivalvia’ library resulted in masking repetitive elements in more than half of the genome assembly, i.e. 59.07% (Table 2). Most of the annotated repetitive elements were unclassified (31.86%), followed by DNA elements (16.00%), long interspersed nuclear elements (6.13%), long terminal repeats (3.72%), and short interspersed nuclear elements (0.79%). After masking, gene prediction resulted in the identification of 35,119 protein-coding genes, with an average gene length of 25,712 bp and average CDS length of 1,287 bp (Supplementary Table S5). Furthermore, 26,836 genes were functionally annotated by similarity to at least one of the three databases used in the annotation (Table 1). The number of predicted genes is in line with those observed in other bivalve (and Mollusca) genome assemblies, which, although highly variable, have on average around 34,949 predicted genes (calculated from Table 2 of Gomes-dos-Santos et al.16). Although the number of genes predicted within the three Unionida genomes is highly variable, i.e. 123,457 in V. ellipsiformis, 49,149 in M. nervosa, and 35,119 in M. margaritifera, a direct comparison should be made with caution, given the considerable differences in genome quality and the different gene prediction strategies applied in the three assemblies.
Table 2.
| Number of elements | Length occupied (bp) | Percentage of sequence (%) |
---|---|---|---|
| Marmar + Bivalvia | Marmar + Bivalvia | Marmar + Bivalvia |
SINEs: | 108,986 | 17,810,092 | 0.79 |
ALUs | 0 | 0 | 0 |
MIRs | 51,807 | 7,321,859 | 0.33 |
LINEs: | 395,376 | 137,422,770 | 6.13 |
LINE1 | 7,854 | 2,661,360 | 0.12 |
LINE2 | 108,179 | 29,801,298 | 1.33 |
L3/CR1 | 13,806 | 3,697,570 | 0.17 |
LTR elements: | 174,445 | 83,417,191 | 3.72 |
ERVL | 0 | 0 | 0 |
ERVL-MaLRs | 0 | 0 | 0 |
ERV_classI | 2,849 | 481,472 | 0.02 |
ERV_classII | 1,072 | 286,047 | 0.01 |
DNA elements: | 1,208,077 | 358,545,022 | 16.00 |
hAT-Charlie | 22,178 | 3,778,430 | 0.17 |
TcMar-Tigger | 54,446 | 15,068,283 | 0.67 |
Unclassified: | 3,057,728 | 713,890,849 | 31.86 |
Total interspersed repeats: | | 1,311,085,924 | 58.51 |
Small RNA: | 51,767 | 7,672,478 | 0.34 |
Satellites: | 24,005 | 4,250,110 | 0.19 |
Simple repeats: | 64,021 | 8,534,185 | 0.38 |
Low complexity: | 970 | 115,583 | 0.01 |
Total masked | | 1,323,560,844 | 59.07 |
Values were produced by RepeatMasker using a custom-built RepeatModeler M. margaritifera repeat library (abbreviated as ‘Marmar’) combined with the RepBase Bivalvia repeat library (RepeatMasker option -lib).
3.4. Single copy orthologous phylogeny
Both Maximum-Likelihood and Bayesian Inference phylogenetic trees revealed the same topology with high support for all nodes (Fig. 3). The phylogeny recovered the reciprocally monophyletic groups Pteriomorphia (represented by Orders Ostreida, Mytilida, Pectinida, and Arcida) and Heteroconchia (represented by Orders Unionida and Venerida). These results are in accordance with recent comprehensive bivalve phylogenetic studies.38,66–68 The only difference is observed within Pteriomorphia, where two sister clades are present, one composed of Arcida and Pectinida and the other of Mytilida and Ostreida (Fig. 3), while according to the most recent phylogenomic studies, Arcida appears basal to all other Pteriomorphia.38,67,68 It is noteworthy that the Arcida and Pectinida clade is the least supported in the phylogeny, which, together with the fact that many Pteriomorphia clades are missing in this study, may explain these discrepant results. Heteroconchia is divided into monophyletic Palaeoheterodonta and Heterodonta (here only represented by two Euheterodonta bivalves). As expected, the two Unionida species, i.e. M. nervosa and the newly obtained M. margaritifera, are placed within Palaeoheterodonta.
3.5. Hox and ParaHox gene repertoire and phylogeny
Homeobox genes refer to a family of homeodomain-containing transcription factors with important roles in Metazoan development by specifying anterior–posterior axis and segment identity (e.g. Refs69,70). Many of these genes are generally found in tight, evolutionarily conserved physical clusters (e.g. Refs71,72). Hox genes in particular are typically arranged into such clusters, showing temporal and spatial collinearity.73 Consequently, Hox genes provide useful information for understanding the emergence of morphological novelties and the evolutionary history of species, inferring ancestral genomic states of genes/clusters, and even studying genome rearrangements, such as whole-genome duplications (e.g. Refs69,70,74). Given the disparate body plans in molluscan classes, the study of Hox cluster composition, organization and gene expression has practically become a standard in Mollusca genome assembly studies.60,75–88 Homeobox genes are divided into four classes, of which the Antennapedia (ANTP)-class (Hox, ParaHox, NK, Mega-homeobox, SuperHox) is the best studied, particularly the Hox and ParaHox clusters.60,74,84 The number of genes from these two clusters is relatively well conserved across Lophotrochozoa, with the Hox cluster being composed of 11 genes (3 anterior, 6 central, and 2 posterior) and the ParaHox cluster composed of 3 genes. Although several structural and compositional differences have been observed within Mollusca ANTP-class (e.g. Bivalvia,80 Cephalopoda,81 Gastropoda,83 and Polyplacophora77), most Bivalvia seem to retain the gene composition expected for lophotrochozoans: Hox1, Hox2, Hox3, Hox4, Hox5, Lox5, Antp, Lox4, Lox2, Post2, and Post1 for the Hox cluster and Gsx, Xlox, and Cdx for the ParaHox cluster.78 Consequently, the identification of these genes in a bivalve genome assembly represents further validation of the genome completeness and overall correctness.
Furthermore, to the best of our knowledge, this is the first study to identify the Hox and ParaHox genes in Unionida. A single copy of each of the 3 ParaHox and 10 Hox genes was found in the M. margaritifera genome assembly (Supplementary Table S6). Despite an intensive search, no evidence of the presence of Hox4 was detected in the assembly. However, the gene was identified in the M. margaritifera transcriptome, thus confirming its presence in the species. All genes, apart from Antp and Lox5, were scattered in different scaffolds, with Hox5, Post1, and Gsx being present in scaffolds smaller than 2.5 kb (Supplementary Table S6). Both the close proximity between Antp and Lox5 and the fact that both genes are expressed in the same direction are in accordance with the results observed in other bivalves, including the phylogenetically closest species for which the Hox cluster has been characterized, i.e. the Venerida clam Cyclina sinensis (Gmelin, 1791).60 The fact that the remaining genes were scattered in different scaffolds is likely a consequence of the low contiguity of the genome assembly, since the distances between Bivalvia Hox genes within a cluster can be as high as 9.9 Mb.60 Conversely, three Hox and one ParaHox genes were found in the M. margaritifera transcriptome assembly and nine Hox and one ParaHox gene were found in the M. nervosa genome assembly (Supplementary Table S6). Finally, to further validate the identity of the identified Hox and ParaHox genes, a phylogenetic analysis using the homeodomains (encoded 60–63 amino acid domain) of several Mollusca species was conducted (Fig. 4). All Hox and ParaHox genes of M. margaritifera (as well as M. nervosa) were well positioned within their respective orthologous genes from other Mollusca species (Fig. 4), thus confirming their identity.
3.6. Conclusion and future perspectives
Unionida freshwater mussels are a diverse group of organisms distributed worldwide, with 6 recognized families and around 800 described species.89,90 These organisms play fundamental roles in ecosystems, such as water filtration, nutrient cycling, and sediment bioturbation and oxygenation,91,92 helping to maintain and support freshwater communities.12 However, as a consequence of several anthropogenic threats, freshwater mussels are experiencing a global-scale decline.12,93 M. margaritifera belongs to the most threatened of the 6 Unionida families, i.e. Margaritiferidae. Despite all this, our understanding of the genetics of this species is to date still restricted to a few phylogenetic and phylogeographical studies based on mtDNA markers6,94–96 as well as on neutral genetic markers,95,97,98 making the availability of the present genome a timely resource with application in multiple fields. The characterization of genetic features and identification of genomic novelties (such as single genes or gene families, genomic pathways, single-nucleotide polymorphisms, among others) may provide guidance for understanding the molecular and cellular mechanisms of biomineralization in freshwater mussel shells, which may facilitate the use of shell material as environmental and metabolic archives99 and even help clarify the formation of new mineralized tissue following extracorporeal shock wave therapy in humans.100 Being the first representative genome of the family Margaritiferidae, it will help launch both basic and applied genomic-level research on the unique biological and evolutionary features characteristic of this emblematic group.
Accession numbers
SRR13091478, SRR13091479, SRR13091477, JADWMO000000000, PRJNA678877, and SAMN16815977
Supplementary data
Supplementary data are available at DNARES online.
Funding
A.G.-d.-S. was funded by the Portuguese Foundation for Science and Technology (FCT) under grant SFRH/BD/137935/2018, EF under grant CEECIND/00627/2017, and MLL under grant 2020.03608.CEECIND. This research was developed under ConBiomics: the missing approach for the Conservation of freshwater Bivalves Project No. NORTE-01-0145-FEDER-030286, co-financed by COMPETE 2020, Portugal 2020 and the European Union through the ERDF, and by FCT through national funds. Additional strategic funding was provided by FCT UIDB/04423/2020 and UIDP/04423/2020. Authors’ interaction and writing of the article were promoted and facilitated by the COST Action CA18239: CONFREMU (Conservation of freshwater mussels: a pan-European approach).
Data availability
All the raw sequencing data are available from GenBank via the accession numbers SRR13091478, SRR13091479, and SRR13091477. The assembled genome is available under the accession number JADWMO000000000, BioProject PRJNA678877 and BioSample SAMN16815977 (Supplementary Table S7). The whole mitogenome is available in GenBank under the accession number MW556443. The FASTA alignment of homeodomain amino acid sequences from Hox and ParaHox genes used in gene tree construction is available in Additional File 2. The scaffolds in which homeodomains were detected (as described in Supplementary Table S6) are available as Additional File 3. The repeat-masked genome assembly, BRAKER2 prediction statistics and prediction gff files, as well as all predicted gene, transcript and amino acid sequence files are available at Figshare: 10.6084/m9.figshare.13333841.
Conflict of interest
None declared.
Additional File 1 BlobTools contamination screening methods’ description and results.
Additional File 2 Fasta alignment of homeodomain amino acid sequences from Hox and ParaHox genes used in gene tree construction. Sequences used include the Hox and ParaHox homeodomains obtained in this study as well as other Mollusca homeodomain sequences retrieved from Refs.59,60
Additional File 3 Scaffolds fasta sequences in which homeodomains were detected (as described in Supplementary Table S6).
References
- 1. von Hessling T. 1859, Die Perlnmuscheln und thre Perlen (Naturwissen-schaftlich und geschichtlich mit, Beruecksichtigung der Perlgewaesser Baerns), Forgotten Books: Leipzig. [Google Scholar]
- 2. Strack E. 2015, European freshwater pearls: part 1-Russia, J. Gemmol., 34, 580–92. [Google Scholar]
- 3. Bespalaya Y.V., Bolotov I.N., Makhrov A.A., Vikhrev I.V.. 2012, Historical geography of pearl fishing in rivers of the Southern White Sea Region (Arkhangelsk Oblast), Reg. Res. Russ., 2, 172–81. [Google Scholar]
- 4. Makhrov A., Bespalaya J., Bolotov I., et al. 2014, Historical geography of pearl harvesting and current status of populations of freshwater pearl mussel Margaritifera margaritifera (L.) in the western part of Northern European Russia, Hydrobiologia, 735, 149–59. [Google Scholar]
- 5. Schlüter J., Retch C.. 1999, Perlen und Perl mutt. Eller und Richter: Hamburg. [Google Scholar]
- 6. Lopes-Lima M., Bolotov I.N., Do V.T., et al. 2018, Expansion and systematics redefinition of the most threatened freshwater mussel family, the Margaritiferidae, Mol. Phylogenetic. Evil., 127, 98–118. [DOI] [PubMed] [Google Scholar]
- 7. Moorkens E., Corduroy J., Sedona M., von Proschwitz T., Woolnough D.. 2018, Margaritifera margaritifera (errata version published in 2018). IUCN Red List Threat. Species 2018, e.T12799A128686456.
- 8. Geist J. 2010, Strategies for the conservation of endangered freshwater pearl mussels (Margaritifera margaritifera L.): a synthesis of conservation genetics and ecology, Hydrobiologia, 644, 69–88.
- 9. Moorkens E.A. 2018, Short-term breeding: releasing post-parasitic juvenile Margaritifera into ideal small-scale receptor sites: a new technique for the augmentation of declining populations, Hydrobiologia, 810, 145–55.
- 10. Bauer G. 1992, Variation in the life span and size of the freshwater pearl mussel, J. Anim. Ecol., 61, 425.
- 11. Hassall C., Amaro R., Ondina P., Outeiro A., Cordero-Rivera A., San Miguel E. 2017, Population-level variation in senescence suggests an important role for temperature in an endangered mollusc, J. Zool., 301, 32–40.
- 12. Lopes-Lima M., Sousa R., Geist J., et al. 2017, Conservation status of freshwater mussels in Europe: state of the art and future challenges, Biol. Rev. Camb. Philos. Soc., 92, 572–607.
- 13. Gusman A., Lecomte S., Stewart D.T., Passamonti M., Breton S. 2016, Pursuing the quest for better understanding the taxonomic distribution of the system of doubly uniparental inheritance of mtDNA, PeerJ, 2016, e2760.
- 14. Breton S., Stewart D.T., Shepardson S., et al. 2011, Novel protein genes in animal mtDNA: a new sex determination system in freshwater mussels (Bivalvia: Unionoida)? Mol. Biol. Evol., 28, 1645–59.
- 15. Breton S., Beaupré H.D., Stewart D.T., Hoeh W.R., Blier P.U. 2007, The unusual system of doubly uniparental inheritance of mtDNA: isn’t one enough? Trends Genet., 23, 465–74.
- 16. Gomes-dos-Santos A., Lopes-Lima M., Castro L.F.C., Froufe E. 2020, Molluscan genomics: the road so far and the way forward, Hydrobiologia, 847, 1705–26.
- 17. Hollenbeck C.M., Johnston I.A. 2018, Genomic tools and selective breeding in molluscs, Front. Genet., 9, 253.
- 18. Takeuchi T. 2017, Molluscan genomics: implications for biology and aquaculture, Curr. Mol. Biol. Rep., 3, 297–305.
- 19. Renaut S., Guerra D., Hoeh W.R., et al. 2018, Genome survey of the freshwater mussel Venustaconcha ellipsiformis (Bivalvia: Unionida) using a hybrid de novo assembly approach, Genome Biol. Evol., 10, 1637–46.
- 20. Rogers R.L., Grizzard S.L., Titus-McQuillan J.E., et al. 2021, Gene family amplification facilitates adaptation in freshwater unionid bivalve Megalonaias nervosa, Mol. Ecol., 30, 1155–73.
- 21. Smith C.H. 2021, A high-quality reference genome for a parasitic bivalve with doubly uniparental inheritance (Bivalvia: Unionida), Genome Biol. Evol., 13, evab029.
- 22. Martin M. 2011, Cutadapt removes adapter sequences from high-throughput sequencing reads, EMBnet J., 17, 10.
- 23. Ranallo-Benavidez T.R., Jaron K.S., Schatz M.C. 2020, GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes, Nat. Commun., 11, 1–10.
- 24. Vurture G.W., Sedlazeck F.J., Nattestad M., et al. 2017, GenomeScope: fast reference-free genome profiling from short reads, Bioinformatics, 33, 2202–4.
- 25. Chapman J.A., Ho I., Sunkara S., Luo S., Schroth G.P., Rokhsar D.S. 2011, Meraculous: de novo genome assembly with short paired-end reads, PLoS One, 6, e23501.
- 26. Gurevich A., Saveliev V., Vyahhi N., Tesler G. 2013, QUAST: quality assessment tool for genome assemblies, Bioinformatics, 29, 1072–5.
- 27. Mapleson D., Accinelli G.G., Kettleborough G., Wright J., Clavijo B.J. 2017, KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies, Bioinformatics, 33, 574–6.
- 28. Bushnell B., Rood J. 2018, BBTools. BBMap, Joint Genome Institute.
- 29. Simão F.A., Waterhouse R.M., Ioannidis P., Kriventseva E.V., Zdobnov E.M. 2015, BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs, Bioinformatics, 31, 3210–2.
- 30. Laetsch D.R., Blaxter M.L. 2017, BlobTools: interrogation of genome assemblies, F1000Res, 6, 1287.
- 31. Hahn C., Bachmann L., Chevreux B. 2013, Reconstructing mitochondrial genomes directly from genomic next-generation sequencing reads - A baiting and iterative mapping approach, Nucleic Acids Res., 41, e129.
- 32. Bernt M., Donath A., Jühling F., et al. 2013, MITOS: improved de novo metazoan mitochondrial genome annotation, Mol. Phylogenet. Evol., 69, 313–9.
- 33. Smit A., Hubley R. 2015, RepeatModeler, Institute for Systems Biology, Seattle, WA, USA.
- 34. Smit A., Hubley R. 2015, RepeatMasker, Institute for Systems Biology, Seattle, WA, USA.
- 35. Hoff K.J., Lange S., Lomsadze A., Borodovsky M., Stanke M. 2016, BRAKER1: unsupervised RNA-Seq-based genome annotation with GeneMark-ET and AUGUSTUS, Bioinformatics, 32, 767–9.
- 36. Hoff K.J., Lomsadze A., Borodovsky M., Stanke M. 2019, Whole-genome annotation with BRAKER, Methods Mol. Biol., 1962, 65–95.
- 37. Bertucci A., Pierron F., Thébault J., et al. 2017, Transcriptomic responses of the endangered freshwater mussel Margaritifera margaritifera to trace metal contamination in the Dronne River, Environ. Sci. Pollut. Res. Int., 24, 27145–59.
- 38. González V.L., Andrade S.C.S., Bieler R., et al. 2015, A phylogenetic backbone for Bivalvia: an RNA-seq approach, Proc. R Soc. B., 282, 20142332.
- 39. Bolger A.M., Lohse M., Usadel B. 2014, Trimmomatic: a flexible trimmer for Illumina sequence data, Bioinformatics, 30, 2114–20.
- 40. Song L., Florea L. 2015, Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads, Gigascience, 4, 48.
- 41. Kim D., Langmead B., Salzberg S.L. 2015, HISAT: a fast spliced aligner with low memory requirements, Nat. Methods, 12, 357–60.
- 42. Dainat J., Hereñú D., Pucholt P. AGAT: Another Gff Analysis Toolkit to handle annotations in any GTF/GFF format. Zenodo.
- 43. Quevillon E., Silventoinen V., Pillai S., et al. 2005, InterProScan: protein domains identifier, Nucleic Acids Res., 33, W116–20.
- 44. Buchfink B., Xie C., Huson D.H. 2015, Fast and sensitive protein alignment using DIAMOND, Nat. Methods, 12, 59–60.
- 45. Boeckmann B., Bairoch A., Apweiler R., et al. 2003, The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003, Nucleic Acids Res., 31, 365–70.
- 46. Pruitt K.D., Tatusova T., Maglott D.R. 2007, NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins, Nucleic Acids Res., 35, D61–5.
- 47. Wang M., Kong L. 2019, pblat: a multithread blat algorithm speeding up aligning sequences to genomes, BMC Bioinformatics, 20, 28.
- 48. Ryan J.F. 2013, Baa.pl: a tool to evaluate de novo genome assemblies with RNA transcripts. ArXiv:1309.2087.
- 49. Emms D.M., Kelly S. 2019, OrthoFinder: phylogenetic orthology inference for comparative genomics, Genome Biol., 20, 238.
- 50. Edgar R.C. 2004, MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Res., 32, 1792–7.
- 51. Capella-Gutiérrez S., Silla-Martínez J.M., Gabaldón T. 2009, trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses, Bioinformatics, 25, 1972–3.
- 52. Abascal F., Zardoya R., Posada D. 2005, ProtTest: selection of best-fit models of protein evolution, Bioinformatics, 21, 2104–5.
- 53. Nguyen L.T., Schmidt H.A., Von Haeseler A., Minh B.Q. 2015, IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies, Mol. Biol. Evol., 32, 268–74.
- 54. Ronquist F., Teslenko M., van der Mark P., et al. 2012, MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space, Syst. Biol., 61, 539–42.
- 55. Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. 1990, Basic local alignment search tool, J. Mol. Biol., 215, 403–10.
- 56. Barton-Owen T.B., Szabó R., Somorjai I.M.L., Ferrier D.E.K. 2018, A revised spiralian homeobox gene classification incorporating new polychaete transcriptomes reveals a diverse TALE class and a divergent Hox gene, Genome Biol. Evol., 10, 2151–67.
- 57. Paps J., Xu F., Zhang G., Holland P.W.H. 2015, Reinforcing the egg-timer: recruitment of novel Lophotrochozoa homeobox genes to early and late development in the Pacific oyster, Genome Biol. Evol., 7, 677–88.
- 58. Lu S., Wang J., Chitsaz F., et al. 2020, CDD/SPARCLE: the conserved domain database in 2020, Nucleic Acids Res., 48, D265–8.
- 59. Huan P., Wang Q., Tan S., Liu B. 2020, Dorsoventral decoupling of Hox gene expression underpins the diversification of molluscs, Proc. Natl. Acad. Sci. USA, 117, 503–12.
- 60. Li Y., Nong W., Baril T., et al. 2020, Reconstruction of ancient homeobox gene linkages inferred from a new high-quality assembly of the Hong Kong oyster (Magallana hongkongensis) genome, BMC Genomics, 21, 713.
- 61. Kalyaanamoorthy S., Minh B.Q., Wong T.K.F., Von Haeseler A., Jermiin L.S. 2017, ModelFinder: fast model selection for accurate phylogenetic estimates, Nat. Methods, 14, 587–9.
- 62. Froufe E., Gonçalves D.V., Teixeira A., et al. 2016, Who lives where? Molecular and morphometric analyses clarify which Unio species (Unionida, Mollusca) inhabit the southwestern Palearctic, Org. Divers. Evol., 16, 597–611.
- 63. Froufe E., Prié V., Faria J., et al. 2016, Phylogeny, phylogeography, and evolution in the Mediterranean region: news from a freshwater mussel (Potomida, Unionida), Mol. Phylogenet. Evol., 100, 322–32.
- 64. Gomes-dos-Santos A., Froufe E., Amaro R., et al. 2019, The male and female complete mitochondrial genomes of the threatened freshwater pearl mussel Margaritifera margaritifera (Linnaeus, 1758) (Bivalvia: Margaritiferidae), Mitochondrial DNA B Resour., 4, 1417–20.
- 65. Bao W., Kojima K.K., Kohany O. 2015, Repbase update, a database of repetitive elements in eukaryotic genomes, Mob. DNA, 6, 11.
- 66. Bieler R., Mikkelsen P.M., Collins T.M., et al. 2014, Investigating the Bivalve Tree of Life – an exemplar-based approach combining molecular and novel morphological characters, Invertebr. Syst., 28, 32.
- 67. Lemer S., Bieler R., Giribet G. 2019, Resolving the relationships of clams and cockles: dense transcriptome sampling drastically improves the bivalve tree of life, Proc. Biol. Sci., 286, 20182684.
- 68. Lemer S., González V.L., Bieler R., Giribet G. 2016, Cementing mussels to oysters in the pteriomorphian tree: a phylogenomic approach, Proc. R Soc. B., 283, 20160857.
- 69. Ferrier D.E.K., Holland P.W.H. 2001, Ancient origin of the Hox gene cluster, Nat. Rev. Genet., 2, 33–8.
- 70. Holland P.W.H. 2013, Evolution of homeobox genes, Wiley Interdiscip. Rev. Dev. Biol., 2, 31–45.
- 71. Pollard S.L., Holland P.W.H. 2000, Evidence for 14 homeobox gene clusters in human genome ancestry, Curr. Biol., 10, 1059–62.
- 72. Castro L.F.C., Holland P.W.H. 2003, Chromosomal mapping of ANTP class homeobox genes in amphioxus: piecing together ancestral genomes, Evol. Dev., 5, 459–65.
- 73. Ferrier D.E.K., Holland P.W.H. 2002, Ciona intestinalis ParaHox genes: evolution of Hox/ParaHox cluster integrity, developmental mode, and temporal colinearity, Mol. Phylogenet. Evol., 24, 412–7.
- 74. Brooke N.M., Garcia-Fernàndez J., Holland P.W.H. 1998, The ParaHox gene cluster is an evolutionary sister of the Hox gene cluster, Nature, 392, 920–2.
- 75. Albertin C.B., Simakov O., Mitros T., et al. 2015, The octopus genome and the evolution of cephalopod neural and morphological novelties, Nature, 524, 220–4.
- 76. Takeuchi T., Koyanagi R., Gyoja F., et al. 2016, Bivalve-specific gene expansion in the pearl oyster genome: implications of adaptation to a sessile lifestyle, Zool. Lett., 2, 3.
- 77. Varney R.M., Speiser D.I., McDougall C., Degnan B.M., Kocot K.M. 2021, The iron-responsive genome of the chiton Acanthopleura granulata, Genome Biol. Evol., 13, 2020.
- 78. Wang S., Zhang J., Jiao W., et al. 2017, Scallop genome provides insights into evolution of bilaterian karyotype and development, Nat. Ecol. Evol., 1, 120.
- 79. Yan X., Nie H., Huo Z., et al. 2019, Clam genome sequence clarifies the molecular basis of its benthic adaptation and extraordinary shell color diversity, iScience, 19, 1225–37.
- 80. Zhang G., Fang X., Guo X., et al. 2012, The oyster genome reveals stress adaptation and complexity of shell formation, Nature, 490, 49–54.
- 81. Da Fonseca R.R., Couto A., Machado A.M., et al. 2020, A draft genome sequence of the elusive giant squid, Architeuthis dux, Gigascience, 9, giz152.
- 82. Li Y., Sun X., Hu X., et al. 2017, Scallop genome reveals molecular adaptations to semi-sessile life and neurotoxins, Nat. Commun., 8, 1721.
- 83. Liu C., Ren Y., Li Z., et al. 2021, Giant African snail genomes provide insights into molluscan whole-genome duplication and aquatic–terrestrial transition, Mol. Ecol. Resour., 21, 478–94.
- 84. Pérez-Parallé M.L., Pazos A.J., Mesías-Gansbiller C., Sánchez J.L. 2016, Hox, Parahox, Ehgbox, and NK genes in bivalve molluscs: evolutionary implications, J. Shellfish Res., 35, 179–90.
- 85. Simakov O., Marletaz F., Cho S.-J., et al. 2013, Insights into bilaterian evolution from three spiralian genomes, Nature, 493, 526–31.
- 86. Sun J., Chen C., Miyamoto N., et al. 2020, The scaly-foot snail genome and implications for the origins of biomineralised armour, Nat. Commun., 11, 1–12.
- 87. Sun J., Mu H., Ip J.C.H., et al. 2019, Signatures of divergence, invasiveness, and terrestrialization revealed by four apple snail genomes, Mol. Biol. Evol., 36, 1507–20.
- 88. Sun J., Zhang Y., Xu T., et al. 2017, Adaptation to deep-sea chemosynthetic environments as revealed by mussel genomes, Nat. Ecol. Evol., 1, 121.
- 89. Bogan A.E. 2008, Global diversity of freshwater mussels (Mollusca, Bivalvia) in freshwater, Hydrobiologia, 595, 139–47.
- 90. Graf D.L., Cummings K.S. 2007, Review of the systematics and global diversity of freshwater mussel species (Bivalvia: Unionoida), J. Molluscan Stud., 73, 291–314.
- 91. Howard J.K., Cuffey K.M. 2006, The functional role of native freshwater mussels in the fluvial benthic environment, Freshwater Biol., 51, 460–74.
- 92. Vaughn C.C. 2018, Ecosystem services provided by freshwater mussels, Hydrobiologia, 810, 15–27.
- 93. Böhm M., Dewhurst-Richman N.I., Seddon M., et al. 2020, The conservation status of the world’s freshwater molluscs, Hydrobiologia, 640, 1–24.
- 94. Bolotov I.N., Vikhrev I.V., Bespalaya Y.V., et al. 2016, Multi-locus fossil-calibrated phylogeny, biogeography and a subgeneric revision of the Margaritiferidae (Mollusca: Bivalvia: Unionoida), Mol. Phylogenet. Evol., 103, 104–21.
- 95. Zanatta D.T., Stoeckle B.C., Inoue K., et al. 2018, High genetic diversity and low differentiation in North American Margaritifera margaritifera (Bivalvia: Unionida: Margaritiferidae), Biol. J. Linn. Soc., 123, 850–63.
- 96. Araujo R., Schneider S., Roe K.J., Erpenbeck D., Machordom A. 2017, The origin and phylogeny of Margaritiferidae (Bivalvia, Unionoida): a synthesis of molecular and fossil data, Zool. Scr., 46, 289–307.
- 97. Bouza C., Castro J., Martínez P., et al. 2007, Threatened freshwater pearl mussel Margaritifera margaritifera L. in NW Spain: low and very structured genetic variation in southern peripheral populations assessed using microsatellite markers, Conserv. Genet., 8, 937–48.
- 98. Geist J., Kuehn R. 2005, Genetic diversity and differentiation of central European freshwater pearl mussel (Margaritifera margaritifera L.) populations: implications for conservation and management, Mol. Ecol., 14, 425–39.
- 99. Geist J., Auerswald K., Boom A. 2005, Stable carbon isotopes in freshwater mussel shells: environmental record or marker for metabolic activity? Geochim. Cosmochim. Acta, 69, 3545–54.
- 100. Sternecker K., Geist J., Beggel S., et al. 2018, Exposure of zebra mussels to extracorporeal shock waves demonstrates formation of new mineralized tissue inside and outside the focus zone, Biol. Open, 7, bio033258.
Data Availability Statement
All the raw sequencing data are available from GenBank via the accession numbers SRR13091478, SRR13091479, and SRR13091477. The assembled genome is available under the accession number JADWMO000000000, under BioProject PRJNA678877 and BioSample SAMN16815977 (Supplementary Table S7). The whole mitogenome is available in GenBank under the accession number MW556443. The Fasta alignment of homeodomain amino acid sequences from Hox and ParaHox genes used in gene tree construction is available in Additional File 2. The scaffolds in which homeodomains were detected (as described in Supplementary Table S6) are available as Additional File 3. The repeat-masked genome assembly, the BRAKER2 prediction statistics and prediction gff files, as well as all predicted gene, transcript, and amino acid sequence files, are available at Figshare: 10.6084/m9.figshare.13333841.