Skip to main content
Frontiers in Genetics logoLink to Frontiers in Genetics
. 2020 Jul 7;11:664. doi: 10.3389/fgene.2020.00664

The Chromosome-Level Genome Assembly and Comprehensive Transcriptomes of the Razor Clam (Sinonovacula constricta)

Yinghui Dong 1,, Qifan Zeng 2,, Jianfeng Ren 3,, Hanhan Yao 1, Liyuan Lv 1, Lin He 1, Wenbin Ruan 1, Qinggang Xue 1, Zhenmin Bao 2,4, Shi Wang 2,5,6,*, Zhihua Lin 1,*
PMCID: PMC7358530  PMID: 32733535

Introduction

The Chinese razor clam Sinonovacula constricta (Lamarck, 1818) is a member of the Solenidae family of bivalve molluscs, recognizing for its straight razor-like shape and fragile shells (Morton, 1984). It is widely distributed in the intertidal zone along the west Pacific Ocean and engages in a pelago-benthic life cycle (Wang and Xu, 1997). As adaptation to a deep-burrowing lifestyle, the razor clam is characterized by smooth shells, muscular foot, and elongated siphons (Morton, 1984). Benefit from its relatively short production cycle and high productive efficiency, the razor clam has become one of most important maricultured bivalve species in Asia with over 800,000 metric tons of production in 2017 (FAO, 2019).

As living in the estuarine and intertidal region, the razor clam faces tremendous exposure to extreme environmental stresses such as drastic salinity fluctuation, highly variable temperature, high concentration of ammonia nitrogen and hydrogen sulfide (Morton, 1984). Unlike oysters, mussels and most clams with thick and sealed shells for protecting their soft bodies, the razor clam with two thin and unclosed shells has chosen a survival strategy of deep-burrowing lifestyle with high tolerance of a broad range of salinities (5–45 psu) (Morton, 1984; Peng et al., 2019). Therefore, it is an ideal model with which to investigate the adaptive mechanisms of a deep-burrowing lifestyle. Despite that increased genomic sequences (Dong et al., 2019; Ran et al., 2019) and transcriptomic data (Niu et al., 2013) have been generated, full-spectrum spatial-temporal transcriptomes are still insufficient for exploring its unique biology and adaptive evolution.

Here, we generated the high-quality chromosomal-level genome assembly and comprehensive transcriptomes of S. constricta and investigated the transcriptomic changes under salinity and ammonia stresses. These genomic resources will lay a prime foundation for future studies of deep-burrowing lifestyle-related adaptive evolution and genetic improvement in commercial breeding programs of razor clam.

Data

The experimental design was illustrated in Figure 1. In total, 386.2 Gb clean data were used for assembling the genome of S. constricta, including 129.73 Gb Illumina reads, 101.79 Gb Pacbio reads, and 154.68 Gb Hi-C reads (Table 1). The genome size was estimated to be 1,244.27 Mb, with a heterozygous ratio of 1.53% and a repeat rate of 53.12%, which was consistent with the results of a previous study (Ran et al., 2019). By using the WTDBG pipeline, the resulting 1,331.97 Mb assembly was obtained with a contig N50 of 678,857 bp (Table 1). The contigs were clustered into 19 linkage groups using Lachesis (Figure 2A), which was in concordance with the karyotype analysis (Wang et al., 1998). The final assembly was 1,331.28 Mb in length and the scaffold N50 reached over 57.99 Mb. The S. constricta genome exhibited a remarkable level of macrosynteny to the ancestral bilaterian linkage groups with a conservation index (CI) of 0.71, indicating the high accuracy of the Hi-C clustering (Figure 2B; Simakov et al., 2013; Wang et al., 2017). The integrity of the genomic assembly was evaluated by alignment of the Illumina reads. In summary, 88.90% of the genomic sequences were covered by 93.93% of the total reads (Table S1). The Core Eukaryotic Genes Mapping Approach (CEGMA) analysis (Parra et al., 2007) and Benchmarking Universal Single-Copy Orthologs (BUSCO version 3) analysis also revealed a high-level of completeness by identifying 227 of the 248 core eukaryotic genes (91.53%) and 868 of the 978 near-universal single-copy metazoan orthologs (88.7%), respectively (Tables S2, S3).

Figure 1.

Figure 1

Overview of the experimental design and the data processing pipeline. Illustrations of the anatomical structure (A) and life cycle (B) of S. constricta. Experimental design for the salt stress (C) and ammonia nitrogen challenge (D). Pipelines for the genome sequencing, assembly and annotation (E), full-length transcriptome sequencing and analysis (F), and Illumina transcriptome sequencing and analysis (G).

Table 1.

Summary statistics of genome assembly of Sinonovacula constricta.

Section Results
A. SEQUENCES
Clean data (Gb) Depth (X)
Illumina 129.73 104.62
PacBio 101.79 82.01
Hi-C 154.68 124.74
Total 386.20 311.37
B. ASSEMBLY
Number of scaffolds 7,932
Total bases (bp) 1,332,277,427
Contig N50 (bp) 678,857
Scaffold N50 (bp) 57,991,182
C. GENES
Number of genes 26,270
Complete BUSCOs 88.75%
Number of functional annotated genes 26,140
D. REPETITIVE SEQUENCES
Repeat size (bp) % of genome
DNA 191,499,094 14.38
LINE 71,938,692 5.40
LTR 144,451,530 10.84
SINE 5,528,172 0.42
Tandem repeat 204,889,587 15.38
Other 157,232 0.01
Unknown 56,940,582 4.27

Figure 2.

Figure 2

(A) Phylogenetic tree and number of shared orthologs among S. constricta and other animal species. Numbers of gene families undergoing expansion and contraction for each lineage are indicated as red and green, respectively. (B) The comparison of the copy numbers of dynein heavy chain (DYH) and alpha tubulin (TUA) genes in 8 molluscan species. (C–D) The tissue-wide expression patterns of DYH genes and TUA genes. G, gill; L, liver; A. adductor muscle; M, mantle; F, foot; S, siphon. (E) Hi-C interaction heat map of of S. constricta. (F) Chromosome-based macro-synteny between S. constricta and the 17 presumed bilaterian ALGs retrieved from Simakov et al. (2013).

Genomic annotation identified 50.71% of the assembled genome as repetitive sequences. Repetitive sequences were dominated by tandem repeats (15.39%) and followed by DNA transposons (14.38%) and LTR retrotransposons (10.84%) (Table S4). Using a combination of de novo prediction, homologous prediction, and alignment of transcriptomic data, 26,270 protein-coding genes were annotated (Table S5). A total of 26,140 (99.5%) genes could be functionally annotated by at least one database (Table S6).

The gene set of S. constricta were compared to fifteen eumetazoan species, including Homo sapiens, Branchiostoma floridae, Drosophila melanogaster, Apis mellifera, Helobdella robusta, Capitella teleta, Octopus bimaculoides, Lottia gigantea, Aplysia californica, Crassostrea gigas, Crassostrea virginica, Saccostrea glomerata, Patinopecten yessoensis, Chlamys farreri, and Nematostella vectensis. The proteins from all 16 species were assigned to 39,058 families with 337 strict single-copy orthologs. Maximum likelihood phylogenetic inference of these single-copy orthologs revealed that S. constricta diverged from the lineage that led to oysters and scallops ~494 million years ago (Mya; Figure 2C). Among the 12,945 gene families identified in the S. constricta genome, 193 and 31 gene families were significantly expanded and contracted, respectively (Figure 2C). Notably, the cytoskeletal protein alpha tubulin (TUA) and the motor protein dynein heavy chain (DYH) families were rapidly expanded in the S. constricta genome (Figure 2D, Figures S1, S2). These families play vital roles in the microtubule architecture and bending movement of cilia (Mohri et al., 2012). The razor clam has a highly developed ciliary system for gill filtering, food-particle retention and water pumping (Morton, 1984). Adjoining cilia generate effective beats through coordinated wavelike movements. The pumping rate of the ciliary system in the gill and mantle cavity can be adjusted to generate powerful currents that facilitate the principal sorting and retaining of suspended particles in the labial palps. Effluxes can also be ejected from the pedal gape to flush away sources of irritation detected by sensory tentacles (Morton, 1984). The transcriptomic data revealed that high levels of TUA and DYH genes are expressed in the gills (Figures 2E,F), implicating that the expansion of these genes might be an adaptation to the deep-burrowing lifestyle.

Molecular adaptations of S. constricta to salt fluctuations were analyzed by comparing the whole transcriptomic alterations under the salinity of 3, 25, and 38 psu (Figure S3). A total of 462 upregulated and 655 downregulated genes were identified under elevated salinity. Functional analysis showed that these differentially expressed genes (DEGs) were significantly enriched in chitin metabolism, immune response, scavenger receptor activity, and ATPase activity coupled to transmembrane substance movement. When facing the decreased salinity, S. constricta enhanced the expression of 898 genes and suppressed 826 genes. Enrichment analysis revealed that these DEGs were mainly related to transition metal ion binding and oxidation-reduction process. Tolerance of razor clams to the ammonia nitrogen stress were investigated by transcriptome analysis (Figure S4). In the gill, 1,029 and 386 genes were upregulated and downregulated after ammonia challenge, respectively. Functional analysis indicated that DEGs were significantly enriched in the regulation of nitrogen metabolism and nitrogen transport. A smaller number of DEGs were identified in the liver, including 248 upregulated and 58 downregulated genes, respectively. GO and KEGG analysis showed that the DEGS were significant enrichments in response to stress, exopeptidase activity, and copper ion binding.

Materials and Methods

Genomic DNA Preparation, PacBio, and Illumina Sequencing

An individual S. constricta was collected from the brood stock of the genetic breeding center of Zhejiang Wanli University. Genomic DNA was extracted from the adductor muscle with phenol-chloroform as described (Green and Sambrook, 2012). High molecular-weight genomic DNA was sheared into fragments of ~30 kb using a Covaris ultrasonicator (Covaris Inc., Woburn, MA, USA). The fragments were enzymatically repaired and converted into SMRTbell™ template library following the manufacturer's instructions. Size-selection was performed to enrich the DNA fragments longer than 10 kb for sequencing on a Pacific Biosciences (PacBio) Sequel Single Molecule Real Time (SMRT) platform. A paired-end Illumina library with an insert size of 350 bp was prepared using Illumina Genomic DNA sample preparation kits and sequenced on an Illumina Xten system. Adductor muscle tissue of a razor clam from the same population was collected to construct a Hi-C library. The specimen was fixed with 1% formaldehyde and the genomic DNA was cross-linked, digested with the restriction enzyme MboI, labeled with a biotinylated residue and end repaired. The library was sequenced on an Illumina NovaSeq platform.

Estimation of Genome Size and Sequencing Coverage

The Illumina reads were first trimmed to remove adaptors and reads with >10% ambiguous or >20% low-quality bases using the Trimmomatic package (Bolger et al., 2014). The distribution of the 17-mer frequency was estimated using the clean reads. Genome size was estimated according to the formula: genome size = k-mer number/peak depth (Varshney et al., 2011).

De novo Genome Assembly and Quality Assessment

PacBio long reads were assembled using the WTDBG package (Ruan and Li, 2019). The consensus calling of preceding assembly was conducted with wtdbg-cns to reduce sequencing errors and subsequently polishing by Arrow (SMRT Link v 5.1.0). To ensure high accuracy of the genome assembly, Illumina paired-end clean reads were aligned to the assembly using BWA. Post-processing error was corrected and conflicts of the assembly were resolved via Pilon (Walker et al., 2014).

The HiC reads were truncated at the junctions and aligned to the polished genome using BWA (version 0.7.17) with default parameters. Only unique aligned reads with a mapping quality >20 were further processed. After filtering invalid interaction pairs using HiC-Pro (v.2.8.0) (Servant et al., 2015), the valid pairs were used to evaluate interaction strength among whole genome contigs. Lachesis (version 2e27abb) was used to cluster and anchor the contigs to the chromosomes using agglomerative hierarchical clustering (Burton et al., 2013).

To assess the integrity of the genome assembly, Illumina reads were mapped to the contigs using BWA (version 0.6.2). Genome completeness was also evaluated using the Core Eukaryotic Genes Mapping Approach (CEGMA) (Parra et al., 2007) and Benchmarking Universal Single-Copy Orthologs (BUSCO version 3) (Waterhouse et al., 2017).

Repetitive Sequence Annotation

Repetitive sequences in the genome assembly were identified through ab initio prediction and homology-based searches. Repeat families in the S. constricta genome were identified de novo using RepeatScout (version 1.0.5) and Repeat Modeler (version 1.0.11). Full length long terminal repeat (LTR) retrotransposons were also identified using LTR-finder (version 1.0.2) (Xu and Wang, 2007) with the parameters “-E –C.” Tandem repeats were screened using Tandem Repeats Finder (TRF version 4.09) (Benson, 1999) with the parameters “match = 2, mismatching penalty = 7, indel penalty = 7, match probability = 80, indel probability = 10, minimum alignment score = 50, maximum period size = 500.” The predicted repetitive sequences along with the RepBase database (Bao et al., 2015) were used for homology-based searches using Repeatmasker (version 4.5.0) with the parameters “-a -nolow -no_is -norna -parallel 32 -small -xsmall -poly -e ncbi -pvalue 0.0001” (Tarailo-Graovac and Chen, 2009).

Protein-Coding Gene Prediction and Annotation

Protein-coding genes were annotated based on de novo prediction, homology-based searches, and transcriptome-assisted methods. The protein sequences of 11 species (Table S5) were aligned to the genome assembly using TBLASTN with the parameters “-evalue 1e-5.” The gene structures were predicted using GeneWise (version 2.4.1) (Doerks et al., 2002). The Illumina RNA-seq reads of eight tissues and eight developmental stages were aligned to the genome assembly using Tophat (version 2.1.1) (Trapnell et al., 2009) and gene models were generated using Cufflinks (version 2.1.0) (Trapnell et al., 2010) with the parameter “-multi-read-correct.” The resulting GTF file and the PacBio Iso-seq transcripts were used to model gene structures with the PASA pipeline (version 2.0.2) (Haas et al., 2008). De novo gene prediction packages, including Augustus (version 2.5.5) (Stanke et al., 2006), glimmerHMM (version 3.01) (Majoros et al., 2004), SNAP (version 2006-07-28) (Leskovec and Sosic, 2016), Geneid (version 1.4) (Parra et al., 2000), and Genscan (version 3.1) (Burge and Karlin, 1997) were used to predict genes with repeat-masked genome sequences. All gene model evidences were integrated using EVidenceModeler (version 1.1.1) (Haas et al., 2008).

Functional annotation was performed by aligning the predicted protein sequences to public databases, including KEGG, SwissProt and NCBI-NR, using BLASTP with an E-value threshold of 1e-5. InterProScan (v.4.8) (Jones et al., 2014) was also used to identify motifs and domains obtained from searching the Pfam, InterPro, and Gene Ontology (GO) databases.

Noncoding RNA Prediction and Annotation

The noncoding RNA genes, including miRNA, rRNA, snRNA, and tRNA, were annotated in the S. constricta genome. Transfer RNA were predicated by tRNAscan-SE 2.0 (Lowe and Chan, 2016) with eukaryote parameters. The miRNA and snRNA were screened using INFERNAL 1.1.2 against the Rfam database (version 14.1) (Kalvari et al., 2018) with default parameters (Table S7).

Gene Family Identification and Phylogenetic Analysis

Fifteen Eumetazoan species were selected for gene family analysis. All data were downloaded from either NCBI or Ensembl. The longest protein sequence was selected to represent a gene with multiple alternative splicing isoforms. Gene family clusters from all 16 species were first assigned using OrthoMCL (version 2.0.9) (Li et al., 2003) with an inflation value of 1.5. Gene family expansion and contraction under a maximum likelihood framework were analyzed using CAFE (version 3) (De Bie et al., 2006).

Phylogenetic inferences of the 16 species were conducted using 337 single-copy orthologs. Multiple sequences were aligned for the protein sequences of each ortholog gene using MUSCLE (version 3.8.31) (Edgar, 2004). The alignments of all the orthologs were then concatenated into a super alignment matrix. RAxML (version 8.2.12) (Stamatakis, 2014) was used to infer the alignment matrix via a maximum likelihood method with the substitution model PROTGAMMAAUTO. Node support was provided by bootstrapping with 100 replicates. The time when species diverged was estimated using MCMCTree in the PAML package (version 4.7a) (Yang, 1997) with the parameters of “burn-in = 1,000, sample-number = 1,000,000, sample-frequency = 2.”

Transcriptomic Sequencing and Analysis

Embryos (eggs, four cells, blastulae, gastrulae), larvae (trochophore, D-shaped, umbo larvae and juvenile) and adults of S. constricta were collected for RNA sequencing. Eight tissues/organs (gill, liver, foot, mantle, adductor muscle, siphon, ovary, and testis) were dissected from three individuals. All the samples were flash frozen in liquid nitrogen and stored at −80°C until use. Full-length RNA was sequenced using mixed RNA from samples of the eight development stages and eight adult tissues. Three libraries with different insert lengths (e.g., 1–2k, 2–3k, and 3–6k) were constructed and sequenced on a PacBio Sequel platform. The full-length RNA transcriptome was analyzed using the SMRT Link software (v4.0.0). After redundant sequence clustering using the ICE algorithm, consensus sequence building using the DAGCon algorithm and polishing with Quiver. The transcripts were then polished and corrected using Illumina reads with LoRDEC (Salmela and Rivals, 2014). Finally, the corrected transcripts were further clustered with CD-HIT (version 4.6) (Li and Godzik, 2006).

To identify genes and pathways involved in the tolerance to salt fluctuations and ammonia. S. constricta were subjected to salt stress for 16 h under 22°C at low-salinity (3 psu), high-salinity (38 psu), and normal-salinity (25 psu). Three replicate tanks were set for each group with 10 individuals for each replicate. Gills were dissected from three individuals of each replicate and stored at −80°C. For the ammonia challenge, two groups of S. constricta were subjected to total ammonia nitrogen (TAN) at the concentration of 180 and 0.31 mg/L for 72 h (15°C, 23%0 sea water and pH 8.17). Three replicate tanks were set for each group with 80 individuals per tank. Gill and liver samples were dissected from three individuals of each replicate and stored at −80°C. Total mRNA were extracted using TRIzol (Omega Bio-Tek Inc., Norcross, GA, USA). RNA libraries were constructed and sequenced on an Illumina X Ten system. Raw reads were then filtered using Trimmomatic (Bolger et al., 2014) (Table S8). The clean reads were aligned to the indexed reference genome using Hisat2 version 2.0.5 (Kim et al., 2015). Read numbers mapped to each gene were counted using featureCounts version 1.5.0 (Liao et al., 2014). Differentially expressed genes (DEG) between different experiment groups were identified using the DESeq2 R package version 1.16.1 with an adjusted P < 0.05 (Love et al., 2014). Gene Ontology (GO) enrichment of DEG was analyzed using the cluster Profiler 3.4.4 R package with corrected P < 0.05 being taken as significantly enriched GO terms (Yu et al., 2012).

Data Availability Statement

The datasets generated for this study can be found in the NCBI under accession numbers WSYO00000000.1. The raw data of the PacBio, Illumina, and Hi-C sequencing for genome assembly are deposited at the NCBI Sequence Read Archive (SRA) with the accession numbers SRR10605359SRR10605381, SRR10583055, and SRR10586373. The raw data of transcriptome sequencing are available at the NCBI SRA with the accession numbers SRR10753889SRR10753895, SRR2162883, SRR2162887, SRR2162892, SRR2162895, SRR2162898, SRR2162902, SRR9937008SRR9937013, SRR10097413SRR10097424, SRR9959746SRR9959754, and SRR9943679SRR9943690.

Ethics Statement

All experimental procedures were approved by the Institutional Animal Care and Use Committee (IACUC) of Zhejiang Wanli University, China.

Author Contributions

YD, ZL, and ZB conceived the project. HY, WR, and LH conducted the environmental stress and collected the samples. QZ, SW, JR, and LL performed the genome assembly, annotation, transcriptome analysis and other bioinformatics analysis. YD, JR, QZ, SW, and QX wrote and revised the manuscript. All authors read and approved the final manuscript.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

This manuscript has been released as a pre-print at bioRxiv (Dong et al., 2019).

Footnotes

Funding. This work was financially supported by the National Key Research and Development Program of China (No. 2018YFD0901405), Zhejiang Major Program of Science and Technology (No. 2016C02055-9), Modern Agro-industry Technology Research System (No. CARS-49), Ningbo Major Project of Science and Technology (No. 2019B10005), and Taishan Scholar Project Fund of Shandong Province of China (to SW).

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2020.00664/full#supplementary-material

References

  1. Bao W., Kojima K., Kohany O. (2015). Repbase update, a database of repetitive elements in eukaryotic genomes. Mobile DNA 6:11. 10.1186/s13100-015-0041-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Benson G. (1999). Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580. 10.1093/nar/27.2.573 [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Bolger A., Lohse M., Usadel B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120. 10.1093/bioinformatics/btu170 [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Burge C., Karlin S. (1997). Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94. 10.1006/jmbi.1997.0951 [DOI] [PubMed] [Google Scholar]
  5. Burton J., Adey A., Patwardhan R., Qiu R., Kitzman J., Shendure J. (2013). Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125. 10.1038/nbt.2727 [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. De Bie T., Cristianini N., Demuth J., Hahn M. (2006). CAFE: a computational tool for the study of gene family evolution. Bioinformatics 22, 1269–1271. 10.1093/bioinformatics/btl097 [DOI] [PubMed] [Google Scholar]
  7. Doerks T., Copley R., Schultz J., Ponting C., Bork P. (2002). Systematic identification of novel protein domain families associated with nuclear functions. Genome Res. 12, 47–56. 10.1101/gr.203201 [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Dong Y., Zeng Q., Ren J., Yao H., Ruan W., Lv L., et al. (2019). The chromosomal-level genome assembly and comprehensive transcriptomes of Chinese razor clam (Sinonovacula constricta) with deep-burrowing life style and broad-range salinity adaptation. BioRxiv [Preprint] 735142 10.1101/735142 [DOI] [Google Scholar]
  9. Edgar R. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797. 10.1093/nar/gkh340 [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. FAO (2019). FAO Yearbook. Fishery and Aquaculture Statistics 2017. Available online at: http://www.fao.org/fishery/statistics/yearbook/en (accessed June 24, 2020).
  11. Green M., Sambrook J. (2012). Molecular Cloning: A Laboratory Manual. 4th Edn. Vol. II. New York, NY: Cold Spring Harbor Laboratory Press. [Google Scholar]
  12. Haas B. J., Salzberg S. L., Zhu W., Pertea M., Allen J. E., Orvis J., et al. (2008). Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 9:R7. 10.1186/gb-2008-9-1-r7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Jones P., Binns D., Chang H. Y., Fraser M., Li W., McAnulla C., et al. (2014). InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240. 10.1093/bioinformatics/btu031 [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Kalvari I., Nawrocki E. P., Argasinska J., Quinones-Olvera N., Finn R. D., Bateman A., et al. (2018). Non-coding RNA analysis using the RFAM database. Curr. Protoc. Bioinformatics 62:e51. 10.1002/cpbi.51 [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Kim D., Langmead B., Salzberg S. L. (2015). HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360. 10.1038/nmeth.3317 [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Leskovec J., Sosic R. (2016). SNAP: A general purpose network analysis and graph mining library. ACM Trans. Intell. Syst. Technol. 8:8361. 10.1145/2898361 [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Li L., Stoeckert C. J., Jr., Roos D. S. (2003). OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189. 10.1101/gr.1224503 [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Li W., Godzik A. (2006). Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659. 10.1093/bioinformatics/btl158 [DOI] [PubMed] [Google Scholar]
  19. Liao Y., Smyth G. K., Shi W. (2014). featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930. 10.1093/bioinformatics/btt656 [DOI] [PubMed] [Google Scholar]
  20. Love M. I., Huber W., Anders S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15:550. 10.1186/s13059-014-0550-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Lowe T. M., Chan P. P. (2016). tRNAscan-SE On-line: integrating search and context for analysis of transfer RNA genes. Nucleic Acids Res. 44, W54–W57. 10.1093/nar/gkw413 [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Majoros W. H., Pertea M., Salzberg S. L. (2004). TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879. 10.1093/bioinformatics/bth315 [DOI] [PubMed] [Google Scholar]
  23. Mohri H., Inaba K., Ishijima S., Baba S. A. (2012). Tubulin-dynein system in flagellar and ciliary movement. Proc. Jpn. Acad. Ser. B Phys. Biol. Sci. 88, 397–415. 10.2183/pjab.88.397 [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Morton B. (1984). The functional morphology of Sinonovucufu constvictu with a discussion on the taxonomic status of the Novaculininae (Bivalvia). J. Zool. Lond. 202, 299–325. 10.1111/j.1469-7998.1984.tb05085.x [DOI] [Google Scholar]
  25. Niu D., Wang L., Sun F., Liu Z., Li J. (2013). Development of molecular resources for an intertidal clam, Sinonovacula constricta, using 454 transcriptome sequencing. PLoS ONE 8:e67456. 10.1371/journal.pone.0067456 [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Parra G., Blanco E., Guigo R. (2000). GeneID in Drosophila. Genome Res. 10, 511–515. 10.1101/gr.10.4.511 [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Parra G., Bradnam K., Korf I. (2007). CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23, 1061–1067. 10.1093/bioinformatics/btm071 [DOI] [PubMed] [Google Scholar]
  28. Peng M., Liu X., Niu D., Ye B., Lan T., Dong Z., et al. (2019). Survival, growth and physiology of marine bivalve (Sinonovacula constricta) in long-term low-salt culture. Sci. Rep. 9:2819. 10.1038/s41598-019-39205-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Ran Z., Li Z., Yan X., Liao K., Kong F., Zhang L., et al. (2019). Chromosome-level genome assembly of the razor clam Sinonovacula constricta (Lamarck, 1818). Mol. Ecol. Resour. 19, 1647–1658. 10.1111/1755-0998.13086 [DOI] [PubMed] [Google Scholar]
  30. Ruan J., Li H. (2019). Fast and accurate long-read assembly with wtdbg2. Nat. Methods 17, 155–158. 10.1038/s41592-019-0669-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Salmela L., Rivals E. (2014). LoRDEC: accurate and efficient long read error correction. Bioinformatics 30, 3506–3514. 10.1093/bioinformatics/btu538 [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Servant N., Varoquaux N., Lajoie B. R., Viara E., Chen C. J., Vert J. P., et al. (2015). HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259. 10.1186/s13059-015-0831-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Simakov O., Marletaz F., Cho S. J., Edsinger-Gonzales E., Havlak P., Hellsten U., et al. (2013). Insights into bilaterian evolution from three spiralian genomes. Nature 493, 526–531. 10.1038/nature11696 [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Stamatakis A. (2014). RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313. 10.1093/bioinformatics/btu033 [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Stanke M., Keller O., Gunduz I., Hayes A., Waack S., Morgenstern B. (2006). AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–W439. 10.1093/nar/gkl200 [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Tarailo-Graovac M., Chen N. (2009). Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinforma. 25:4.10.1–4.10.14. 10.1002/0471250953.bi0410s25 [DOI] [PubMed] [Google Scholar]
  37. Trapnell C., Pachter L., Salzberg S. L. (2009). TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111. 10.1093/bioinformatics/btp120 [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Trapnell C., Williams B. A., Pertea G., Mortazavi A., Kwan G., van Baren M. J., et al. (2010). Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515. 10.1038/nbt.1621 [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Varshney R. K., Chen W., Li Y., Bharti A. K., Saxena R. K., Schlueter J. A., et al. (2011). Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farmers. Nat. Biotechnol. 30, 83–89. 10.1038/nbt.2022 [DOI] [PubMed] [Google Scholar]
  40. Walker B. J., Abeel T., Shea T., Priest M., Abouelliel A., Sakthikumar S., et al. (2014). Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE 9:e112963. 10.1371/journal.pone.0112963 [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Wang J. X., Zhao X. F., Zhou L. H., Xiang J. H. (1998). Chromosome study of Sinonovacula constricta (Bivalvia). Oceanol. Limnol. Sinica 29, 191–196. [Google Scholar]
  42. Wang S., Zhang J., Jiao W., Li J., Xun X., Sun Y., et al. (2017). Scallop genome provides insights into evolution of bilaterian karyotype and development. Nat. Ecol. Evol. 1:0120. 10.1038/s41559-017-0120 [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Wang W. X., Xu Z. Z. (1997). Larval swimming and postlarval drifting behavior in the infaunal bivalve Sinonovacula constricta. Mar. Ecol. Prog. Ser. 148, 71–81. 10.3354/meps148071 [DOI] [Google Scholar]
  44. Waterhouse R. M., Seppey M., Simao F. A., Manni M., Ioannidis P., Klioutchnikov G., et al. (2017). BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol. Biol. Evol. 35, 543–548. 10.1093/molbev/msx319 [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Xu Z., Wang H. (2007). LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268. 10.1093/nar/gkm286 [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Yang Z. (1997). PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13, 555–556. 10.1093/bioinformatics/13.5.555 [DOI] [PubMed] [Google Scholar]
  47. Yu G., Wang L. G., Han Y., He Q. Y. (2012). clusterProfiler: an R package for comparing biological themes among gene clusters. Omics 16, 284–287. 10.1089/omi.2011.0118 [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Data Availability Statement

The datasets generated for this study can be found in the NCBI under accession numbers WSYO00000000.1. The raw data of the PacBio, Illumina, and Hi-C sequencing for genome assembly are deposited at the NCBI Sequence Read Archive (SRA) with the accession numbers SRR10605359SRR10605381, SRR10583055, and SRR10586373. The raw data of transcriptome sequencing are available at the NCBI SRA with the accession numbers SRR10753889SRR10753895, SRR2162883, SRR2162887, SRR2162892, SRR2162895, SRR2162898, SRR2162902, SRR9937008SRR9937013, SRR10097413SRR10097424, SRR9959746SRR9959754, and SRR9943679SRR9943690.


Articles from Frontiers in Genetics are provided here courtesy of Frontiers Media SA

RESOURCES