First draft genome for the sand-hopper Trinorchestia longiramus

Ajit Kumar Patra; Oksung Chung; Ji Yong Yoo; Min Seop Kim; Moon Geun Yoon; Jeong-Hyeon Choi; Youngik Yang

doi:10.1038/s41597-020-0424-8

2020 Mar 9;7:85. doi: 10.1038/s41597-020-0424-8

PMCID: PMC7062882 PMID: 32152293

Abstract

Crustacean amphipods are important trophic links between primary producers and higher consumers. Although most amphipods occur in or around aquatic environments, the family Talitridae is the only family found in terrestrial and semi-terrestrial habitats. The sand-hopper Trinorchestia longiramus is a talitrid species often found on the sandy beaches of South Korea. In this study, we present the first draft genome assembly and annotation of this species. We generated ~380.3 Gb of sequencing data, which were assembled into a 0.89 Gb draft genome. Annotation analysis estimated 26,080 protein-coding genes, with 89.9% genome completeness. Comparison with other amphipods showed that T. longiramus has 327 unique orthologous gene clusters, many of which are expanded gene families responsible for cellular transport of toxic substances, homeostatic processes, and ionic and osmotic stress tolerance. This first talitrid genome will be useful for further understanding the mechanisms of adaptation in terrestrial environments, the effects of heavy metal toxicity, as well as for studies of comparative genomic variation across amphipods.

Subject terms: Phylogenetics, DNA sequencing, Evolutionary ecology, Sequence annotation, Genome

Measurement(s)	DNA • RNA • sequence_assembly • sequence feature annotation
Technology Type(s)	DNA sequencing • RNA sequencing • sequence assembly process • sequence annotation
Sample Characteristic - Organism	Trinorchestia longiramus

Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.11897430

Background & Summary

Amphipoda is an order of malacostracan crustaceans, composed of more than 228 families with over 10,200 species¹. Most members of Amphipoda are found in aquatic environments, with both freshwater and marine species that occur in diverse habitats^2–6. However, only a few amphipods in the family Talitridae are found in terrestrial regions close to the water, and others are “semi-terrestrial,” with both littoral and terrestrial representatives⁷.

Talitrids are one of the prevailing macrofaunal groups in coastal regions that live along the interface between the water and land. The coastal talitrids, also known as “sand-hoppers,” are considered key species for energy flow to higher trophic levels⁸. They play a crucial role in food web dynamics by feeding on algal biomass⁹ and detritus along sandy beaches. They in turn become a food source for many invertebrates, fish, and birds^4,8. Unfortunately, anthropogenic activity introduces various types of pollutants into the coastal ecosystem, which affects the survival of talitrids^10–12 and other macrofauna^13–15. For this reason, many talitrids are used as model organisms for studies of metal toxicity^10–12. In addition, previous work on talitrids examined levels of genetic variation^16,17, behavioral adaptations¹⁸, osmoregulation¹⁹, and orientation²⁰. Most of these studies were carried out along the North Sea and the Mediterranean Sea regions.

Despite such biological and ecological significance, no genome studies have been performed on any talitrid species, and only three genomes have been studied in the entire amphipod order. These are (1) Eulimnogammarus verrucosus (Family: Eulimnogammaridae)²¹, a freshwater amphipod from Lake Baikal; (2) Hyalella azteca (Family: Hyalellidae)²², another freshwater amphipod that lives by burrowing in the sediments; and (3) Parhyale hawaiensis (Family: Hyalidae)²³. Trinorchestia longiramus Jo, 1988²⁴ is in the family Talitridae and is highly abundant on sandy beaches of South Korea^24–26 and Japan²⁷. Because of its widespread range, ease of rearing in the laboratory, and relatively small genome size, T. longiramus can be a useful model organism for developmental biology, ecology, evolution, and studies of metal bioaccumulation.

In this study, we present the first draft genome of T. longiramus using high-throughput sequencing. We isolated genomic DNA from whole tissues and constructed two paired-end (PE) and four mate-pair (MP) libraries, which were then sequenced on the Illumina HiSeq 2500 platform. The estimated genome size of T. longiramus is ~1.116 Gb. The draft genome was assembled into 30,897 scaffolds (N50 = 120.57 kb), with a total size of 0.89 Gb, which corresponds to approximately 79.43% of the estimated genome size. Structural annotation of the genome yielded 26,080 genes. BUSCO analysis revealed gene space completeness of 89.9%. Of the total genes predicted, 14,959 genes were functionally annotated with InterProScan²⁸. The lineage containing T. longiramus shows expansion of particular gene families, including those related to response to stress, homeostatic process, transmembrane transport, and signal transduction. A phylogenetic analysis with related amphipod and arthropod species suggests that T. longiramus diverged from H. azteca during the Early Cenozoic era. This first talitrid genome will be useful for further understanding the mechanisms of adaptation in terrestrial environments, the effects of heavy metal toxicity, as well as for studies of comparative genomic variation across amphipods.

Methods

Sample collection and extraction of DNA and RNA

T. longiramus samples were collected from the coast (37°41′29″N, 129°2′2.7″E) of South Korea. They were captured by hand from exposed and sheltered sandy beaches. Samples were preserved immediately in 95% ethanol for genome sequencing and stored in liquid nitrogen for RNA extraction.

DNA was extracted from a pool of seven individuals using a conventional phenol-chloroform protocol²⁹. The purified DNA was resuspended in Tris-EDTA (TE) buffer (10 mM Tris–HCl, 1 mM EDTA, pH 7.5). For RNA isolation, several frozen whole bodies were mortar-pulverized in liquid nitrogen. RNA was extracted in lysis buffer containing 35 mM EDTA, 0.7 M LiCl, 7.0% SDS, and 200 mM Tris–Cl (pH 9.0), following the protocol by Woo et al.³⁰. The purified RNA was eluted in DEPC-treated water and stored at −20 °C.

Short and long DNA fragment library construction

Two PE libraries were prepared with an insert size of 350 bp using the TruSeq DNA Sample Prep kit (Illumina). In addition, four MP libraries were prepared with insert sizes of 3, 5, 8, and 10 kb using the Nextera Mate Pair Sample Preparation kit (Illumina). All libraries were sequenced on an Illumina HiSeq 2500 instrument, with 251 bp reads for the PE libraries and 101 bp reads for the MP libraries. We generated a total of 592,854,944 (149 Gbp) PE reads and 2,291,660,676 (231 Gbp) MP reads (Table 1).

Table 1.

Sequence libraries and data yield from Illumina DNA and RNA sequencing.

	Library type	Insert Size (bp)	Read Length (bp)	Raw bases (Gb)	Raw reads	SRA accessions
DNA	Paired-end (PE)	350	251	37.616	149,863,175	SRR9098167
		350	251	37.616	149,863,175	SRR9098167
		350	251	36.788	146,564,297	SRR9098168
		350	251	36.788	146,564,297	SRR9098168
	Total			148.808	592,854,944
	Mate-pair (MP)	3 K	101	28.942	286,552,798	SRR9098169
		3 K	101	28.942	286,552,798	SRR9098169
		5 K	101	29.710	294,156,030	SRR9098170
		5 K	101	29.710	294,156,030	SRR9098170
		8 K	101	27.904	276,279,897	SRR9098171
		8 K	101	27.904	276,279,897	SRR9098171
		10 K	101	29.173	288,841,613	SRR9098172
		10 K	101	29.173	288,841,613	SRR9098172
	Total			231.458	2,291,660,676
RNA	PE	140	101	6.204	61,429,733	SRR9112990
	PE	140	101	6.204	61,429,733	SRR9112990
	Total			12.408	122,859,466
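
The yields in Table 1 follow from read count times read length, and dividing the DNA total by the estimated genome size gives a nominal raw sequencing depth. The short Python sketch below illustrates that arithmetic; the function names are hypothetical, the input numbers are taken from Table 1 and the text, and the depth refers to raw reads before any trimming or filtering.

```python
def yield_gb(read_count: int, read_length_bp: int) -> float:
    """Raw yield in Gb: number of reads times read length."""
    return read_count * read_length_bp / 1e9


def nominal_depth(total_gb: float, genome_size_gb: float) -> float:
    """Nominal fold-coverage: sequenced bases divided by genome size."""
    return total_gb / genome_size_gb


# One PE read file from Table 1: 149,863,175 reads of 251 bp.
pe_file_gb = yield_gb(149_863_175, 251)

# DNA totals from Table 1 (PE + MP) and the ~1.116 Gb genome size estimate.
dna_total_gb = 148.808 + 231.458
raw_depth = nominal_depth(dna_total_gb, 1.116)

print(f"PE file yield: {pe_file_gb:.3f} Gb")
print(f"Total DNA yield: {dna_total_gb:.3f} Gb")
print(f"Nominal raw depth: {raw_depth:.0f}x")
```

The same two functions can be applied to any row of Table 1 to cross-check the reported raw bases.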

RNA short fragment and PacBio Iso-seq sequencing

For short fragment sequencing, a PE library was prepared with the TruSeq mRNA Prep kit (Illumina) from total mRNA and subsequently sequenced on an Illumina HiSeq 2500 with read lengths of 101 bp (Table 1). A total of 122,859,466 (12 Gbp) PE reads were sequenced.

For PacBio Iso-Seq sequencing, three sequencing libraries (1–2, 2–3, and 3–6 kb) were prepared from polyA+ RNA according to the PacBio Iso-Seq protocol. A total of six Single-Molecule Real-Time cells were run on a PacBio RS II system by DNALink Co. From a total of 350,860 reads, 72,517 high-quality transcripts were generated (Table 2).

Table 2.

Sequencing libraries and data yields from PacBio RNA sequencing.

Library size (Kb)	Average read Length (bp)	Raw bases (Gb)	Raw reads	Polished high-quality isoforms	SRA accession
1–2	1,238	0.027	21,522	72,517	SRR9112991
1–2	2,070	0.219	105,671
2–3	2,209	0.070	31,546
2–3	2,522	0.251	99,339
3–6	2,810	0.029	10,278
3–6	3,656	0.302	82,504
Total	2,418	0.896	350,860

k-mer distribution and genome size estimation

Prior to estimating the genome size, we processed the raw reads as follows. We discarded low-quality (<Q20) PE reads and those that contained the TruSeq index and universal adapters. We then merged the high-quality PE reads using FLASH³¹ with default options, to avoid double counting of overlapping reads. The estimated genome size of T. longiramus was ~1.116 Gb based on a k-mer distribution (K = 17) analysis run with JELLYFISH³². The main peak, at a k-mer depth of 42, was used for genome size estimation (Fig. 1).
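
A common way to turn a k-mer histogram into a genome size is to divide the total number of counted k-mers by the depth of the main peak. The Python sketch below illustrates that calculation on a toy histogram. It is an assumption that the estimate here was derived exactly this way; the function name, the `min_depth` cutoff for error k-mers, and the toy counts are all hypothetical.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps k-mer depth -> number of distinct k-mers at that depth
    (the two columns of a typical k-mer histogram file).
    Depths below min_depth are treated as sequencing errors and ignored
    (hypothetical cutoff, chosen for illustration).
    """
    usable = {d: c for d, c in histogram.items() if d >= min_depth}
    # Main peak: the depth shared by the largest number of distinct k-mers.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences = sum over depths of depth * distinct k-mers.
    total_kmers = sum(d * c for d, c in usable.items())
    return peak_depth, total_kmers / peak_depth


# Toy histogram with an error spike at depth 1 and a main peak at depth 42.
toy = {1: 5000, 40: 100, 41: 180, 42: 200, 43: 170, 44: 90, 84: 20}
peak, size = estimate_genome_size(toy)
print(f"peak depth = {peak}, estimated size = {size:.1f} bp")
```

The small count at depth 84 stands in for two-copy sequence; because each k-mer is weighted by its depth, such repeats contribute twice to the estimated size.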

Fig. 1. Genome size estimation by k-mer distribution.

Genome assembly

Before assembly, adapters, low-quality reads, and uncalled bases were trimmed from PE and MP raw reads using Platanus_trim and Platanus_internal_trim, respectively. Initial assembly was performed with Platanus³³ based on automatically optimized multiple k-mer values. We executed the individual commands “assemble,” “scaffold,” and “gap_close” of the Platanus assembler suite in succession. For the “assemble” stage, we set the maximum memory usage to 2,048 G, and all other stages were executed with default options. Scaffolds longer than 1,000 bp were then scaffolded further using trimmed PE and MP reads in SSPACE³⁴ (Fig. 2). Finally, we filtered out two bacterial sequences identified in MEGAN³⁵, which had BLASTN bit scores above 500 and 90% alignment coverage. We re-confirmed these using BLASTX against a non-redundant database in DIAMOND³⁶. Table 3 shows the assembly statistics for Platanus, SSPACE, and the final assembly.

Fig. 2. T. longiramus genome assembly and gene prediction workflow.

Table 3.

Statistics of the T. longiramus genome assembly.

	Platanus	SSPACE	Final
Scaffolds	1,025,695	30,899	30,897
Scaffolds (>1000)	63,362	30,899	30,897
Total Length	1,022,727,337	886,386,416	886,359,443
Total Length (>1000)	828,517,177	886,386,416	886,359,443
Maximum length	1,019,543	1,680,077	1,680,077
N50	74,013	120,570	120,570
Gap	16,045,251	73,899,800	73,869,646
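
The N50 values in Table 3 summarise scaffold contiguity: N50 is the length of the scaffold at which the running total, taken from the longest scaffold downwards, first reaches half of the assembly length. The Python sketch below shows the standard calculation on toy lengths; the function name is hypothetical and the sketch is not the QUAST implementation.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (0 if empty)."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


toy_scaffolds = [10, 8, 6, 4, 2]  # total length 30
print(n50(toy_scaffolds))
```

For the toy input, the two longest scaffolds (10 + 8 = 18) are the first to cover at least half of the total length of 30, so the N50 is 8.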

Repeat annotation

To annotate repetitive elements, we first identified tandem repeats using the Tandem Repeats Finder³⁷. Transposable elements (TEs) were identified by combining de novo (RepeatModeler)³⁸ and homology-based approaches (Repbase³⁹, RepeatMasker⁴⁰, and RMBlast⁴⁰). Repetitive elements accounted for 20.35% of the genome in total, with tandem repeats accounting for the largest portion (6.18%) (Table 4).

Table 4.

Statistics of repetitive elements.

	Total (bp)	% of genome
DNA	45,354,677	5.12
LINE	23,869,606	2.70
LTR	11,269,516	1.27
Low_complexity	1,202,626	0.14
SINE	163,811	0.02
Satellite	308,670	0.03
Simple_repeat	10,854,020	1.22
TandemRepeat	54,776,419	6.18
Unknown	48,880,228	5.51
Unspecified	397,465	0.04
Total	180,352,209	20.35

Gene prediction and annotation

The protein-coding genes were predicted by combining ab initio and homology-based gene prediction methods (Fig. 2). For the ab initio gene prediction, BRAKER⁴¹, which incorporates outputs from GeneMark-ET⁴² and AUGUSTUS⁴³, predicted 67,698 genes. GeneMark-ET predicts genes with unsupervised training, whereas AUGUSTUS predicts genes with supervised training based on intron and protein hints. We generated two hint files, one from Illumina RNA-seq and one from PacBio Iso-Seq. Tophat⁴⁴ was used to align RNA-seq reads to the repeat-masked genome assembly. We processed the Iso-Seq data to obtain protein sequences, as described in Minoche et al.⁴⁵: (1) run LSC⁴⁶ to correct errors in full-length transcripts, (2) align the corrected transcripts to the genome using GMAP⁴⁷, and (3) generate gene models from the aligned sequences and extract the protein sequences from the generated gene models using Transdecoder⁴⁸. We obtained 1,573 protein sequences, which were used to generate protein hints for AUGUSTUS by running Exonerate⁴⁹. To remove incomplete gene sequences from the genes predicted by BRAKER, we filtered out predicted coding sequences (CDSs) using the following two criteria: (1) CDSs that contained premature stop codons and (2) CDSs that were not supported by hints. Finally, a total of 23,985 protein-coding genes were estimated by ab initio prediction (Table 5).
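
The two filtering criteria above can be sketched in Python as follows. This is a minimal illustration, not the script used in the study: it assumes the standard genetic code, a CDS given in frame from its first base, and a hypothetical boolean `hint_support` field recording whether any RNA-seq or protein hint supports the prediction.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def has_premature_stop(cds: str) -> bool:
    """True if an in-frame stop codon occurs before the final codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


def filter_predictions(predictions):
    """Keep predictions that are hint-supported and free of premature stops.

    Each prediction is a dict with hypothetical keys 'id', 'cds',
    and 'hint_support'.
    """
    return [
        p for p in predictions
        if p["hint_support"] and not has_premature_stop(p["cds"])
    ]


toy = [
    {"id": "g1", "cds": "ATGAAATAA", "hint_support": True},     # kept
    {"id": "g2", "cds": "ATGTAAAAATAA", "hint_support": True},  # internal stop
    {"id": "g3", "cds": "ATGAAATAA", "hint_support": False},    # no hint
]
kept_ids = [p["id"] for p in filter_predictions(toy)]
print(kept_ids)
```

Only in-frame codons are inspected, so a stop triplet that appears out of frame (for example the `TGA` spanning the first two codons of `ATGAAATAA`) does not trigger the filter.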

Table 5.

Statistics of predicted protein-coding genes.

	Number	Average transcript length (bp)	Average CDS length (bp)	Average intron length (bp)
De novo	23,985	8,060.4	242.1	1,616.3
Homology	9,913	7,836.5	200.3	1,744.8
Merged	26,080	7,720.7	242.9	1,744.8

For the homology-based gene predictions, we searched the assembly of T. longiramus against protein sequences of Daphnia pulex, Drosophila melanogaster, Folsomia candida, H. azteca, Lepeophtheirus salmonis, Parasteatoda tepidariorum, P. hawaiensis, and Arthropoda in OrthoDB using TBLASTN⁵⁰ with an E-value cutoff of 1E-5. Matching sequences were clustered using GenBlastA⁵¹, and only best-matched regions were retained. Then, gene models were predicted using Exonerate⁴⁹. Predicted gene sequences that did not meet the above criteria were discarded. As a result, a total of 9,913 genes were predicted by the homology-based approach (Table 5).

Finally, we combined the two outputs by adding homology-based predictions to the ab initio predictions only where the two did not conflict. As a result, 26,080 protein-coding genes were predicted for the T. longiramus draft genome (Table 5). Gene Ontology terms for the predicted genes were annotated using InterProScan with various databases⁵², including Hamap⁵³, Pfam⁵⁴, PIRSF⁵⁵, PRINTS⁵⁶, ProDom⁵⁷, PROSITE⁵⁸, SUPERFAMILY⁵⁹, and TIGRFAM⁶⁰ (Gene Ontology annotation of T. longiramus)⁶¹.
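
One way to picture the merging step is as an interval check: a homology-based gene model is added only if it does not overlap any ab initio model on the same scaffold. The Python sketch below illustrates this under the assumption that "conflict" means coordinate overlap; the source does not define the term further, and the function names and record fields are hypothetical.

```python
def overlaps(a, b):
    """True if two gene models share a scaffold and their spans intersect."""
    return (
        a["scaffold"] == b["scaffold"]
        and a["start"] <= b["end"]
        and b["start"] <= a["end"]
    )


def merge_gene_sets(ab_initio, homology):
    """Keep all ab initio models; add homology models with no conflict."""
    merged = list(ab_initio)
    for gene in homology:
        if not any(overlaps(gene, kept) for kept in ab_initio):
            merged.append(gene)
    return merged


ab_initio = [{"id": "a1", "scaffold": "s1", "start": 100, "end": 500}]
homology = [
    {"id": "h1", "scaffold": "s1", "start": 400, "end": 900},  # overlaps a1
    {"id": "h2", "scaffold": "s1", "start": 600, "end": 900},  # free region
    {"id": "h3", "scaffold": "s2", "start": 100, "end": 500},  # other scaffold
]
merged_ids = [g["id"] for g in merge_gene_sets(ab_initio, homology)]
print(merged_ids)
```

The pairwise scan is quadratic and only suitable for illustration; for tens of thousands of gene models, sorting per scaffold or using an interval index would be the usual choice.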

Data Records

All DNA and RNA raw reads have been deposited in the NCBI SRA (Table 1) under the SRA study accession SRP199018⁶². The whole genome shotgun sequencing project was deposited in GenBank under accession VCRD01000000⁶³. In addition, the assembled genome was submitted to NCBI Assembly and is available with accession no. GCA_006783055.1⁶⁴. The Gene Ontology annotation table has been deposited in Figshare⁶¹ (10.6084/m9.figshare.8217854).

Technical Validation

DNA and RNA sample quality

DNA quality was assessed using Nanodrop, 1% agarose gels, and a Qubit fluorometer with the Qubit HS DNA assay reagents. The RNA integrity was assessed using Nanodrop and an Agilent 2100 Bioanalyzer electrophoresis system (Agilent, Santa Clara, CA, USA).

Illumina libraries

Ready-to-sequence Illumina libraries were quantified by qPCR using the SYBR Green PCR Master Mix (Applied Biosystems), and library profiles were evaluated with an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA).

Genome assembly and gene prediction quality assessment

The length statistics of the genome assembly were assessed with QUAST⁶⁵. The total assembly length is 0.89 Gb, which corresponds to 79.43% of the estimated genome size. The final scaffold N50 is 120.57 kb (Table 3). Genome completeness was evaluated using BUSCO⁶⁶ with the Arthropoda conserved gene database. The final genome assembly, obtained by removing bacterial sequences from the SSPACE output, had a complete BUSCO value of 88.3%. BUSCO completeness of the predicted gene set was slightly higher (89.9%) (Table 6).

Table 6.

BUSCO assessment of genome assembly and gene prediction.

Genome assembly	# Scaffolds	BUSCO (Arthropoda)
Platanus	63,362	C:86.0%[S:84.3%,D:1.7%],F:6.3%,M:7.7%,n:1066
SSPACE	30,899	C:88.3%[S:86.8%,D:1.5%],F:4.5%,M:7.2%,n:1066
Final	30,897	C:88.3%[S:86.8%,D:1.5%],F:4.5%,M:7.2%,n:1066
Gene prediction	# Genes
Final	26,080	C:89.9%[S:85.3%,D:4.6%],F:6.6%,M:3.5%,n:1066
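
The BUSCO strings in Table 6 use a compact notation: C (complete) is the sum of S (single-copy) and D (duplicated), and C, F (fragmented), and M (missing) together account for all n searched genes. The Python sketch below parses such a string and checks both identities; the function name is hypothetical, and the input is the gene prediction row of Table 6.

```python
import re


def parse_busco(summary: str) -> dict:
    """Parse a BUSCO one-line summary into a dict of numbers.

    Percent fields (C, S, D, F, M) become floats; n becomes an int.
    """
    fields = dict(re.findall(r"([CSDFMn]):([\d.]+)", summary))
    return {k: (int(v) if k == "n" else float(v)) for k, v in fields.items()}


gene_set = parse_busco("C:89.9%[S:85.3%,D:4.6%],F:6.6%,M:3.5%,n:1066")

# Internal consistency checks, with a tolerance for one-decimal rounding.
complete_ok = abs(gene_set["C"] - (gene_set["S"] + gene_set["D"])) < 0.05
total_ok = abs(gene_set["C"] + gene_set["F"] + gene_set["M"] - 100.0) < 0.05
print(gene_set, complete_ok, total_ok)
```

The same checks can be run on the Platanus, SSPACE, and final assembly rows to confirm that each summary string is internally consistent.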

Comparison with other arthropod genomes

We performed an extensive comparison of orthologous genes among 12 arthropod genomes (Trinorchestia longiramus, Daphnia pulex, Drosophila melanogaster, Folsomia candida, H. azteca, Lepeophtheirus salmonis, Parasteatoda tepidariorum, P. hawaiensis, Oithona nana, Eulimnadia texana, Strigamia maritima, and Tigriopus kingsejongensis) using OrthoMCL⁶⁷.

After orthologous gene clustering, 490 single-copy protein sequences were aligned using MUSCLE⁶⁸. Low alignment quality regions were filtered using trimAl⁶⁹. A phylogenetic tree was constructed using RAxML⁷⁰, with the PROTGAMMAJTT model (100 bootstrap replicates). Divergence time was calculated using MEGA7⁷¹ with the Jones–Taylor–Thornton model and the previously determined topology (Fig. 3a). Calibration times of Parasteatoda–Drosophila divergence (601 MYA) and Strigamia–Drosophila divergence (583 MYA) were taken from the TimeTree database⁷². We found that T. longiramus diverged from H. azteca during the Early Cenozoic era, approximately 55 million years ago.
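
Single-copy clusters of the kind used for the alignment are ortholog groups that contain exactly one gene from every species in the comparison. The Python sketch below shows one way to select them from clustering output. It is illustrative only: the data structure, function name, and toy identifiers are hypothetical and do not reflect the OrthoMCL output format.

```python
from collections import Counter


def single_copy_groups(groups, species):
    """Return IDs of groups with exactly one gene from every species.

    groups maps a group ID to a list of (species, gene_id) pairs.
    """
    wanted = set(species)
    selected = []
    for group_id, members in groups.items():
        counts = Counter(sp for sp, _ in members)
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            selected.append(group_id)
    return selected


species = ["Tlon", "Hazt", "Phaw"]
toy_groups = {
    "OG1": [("Tlon", "t1"), ("Hazt", "h1"), ("Phaw", "p1")],                # single copy
    "OG2": [("Tlon", "t2"), ("Tlon", "t3"), ("Hazt", "h2"), ("Phaw", "p2")],  # duplicated
    "OG3": [("Tlon", "t4"), ("Hazt", "h3")],                                # one species absent
}
selected = single_copy_groups(toy_groups, species)
print(selected)
```

Requiring every species to be present keeps the concatenated alignment free of missing taxa, at the cost of discarding groups that are single-copy in most but not all genomes.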

Fig. 3. Comparison of orthologous genes. (a) Gene family expansion and contraction in arthropod species. Numbers designate the gene families that have expanded (green) and contracted (red) after the split from the common ancestor. Divergence time is scaled in millions of years. (b) A Venn diagram of unique and shared orthologous gene clusters in T. longiramus, P. hawaiensis, and H. azteca.

A gene expansion and contraction analysis was conducted using the CAFE program⁷³ with the estimated phylogenetic information. A total of 122 gene families were expanded and 388 gene families were contracted in T. longiramus. Fisher’s exact test (p-value ≤ 0.05) was used to identify functionally enriched categories among expanded genes relative to the “genome background,” as annotated by Pfam (Supplementary Table 1). We observed that gene families associated with transferring glycosyl and acyl groups, ATPase activity, response to stress, homeostatic process, and transmembrane transport have expanded. Among transmembrane transport activities, we found sodium/hydrogen exchanger genes, which are responsible for a wide range of cellular functions, such as cation movement, homeostasis, regulation of pH, and tolerance of ionic and osmotic stress⁷⁴. We also found several other transporter genes, such as ABC transporters, which are responsible for the efflux of toxicants out of cells⁷⁵, and sodium-independent organic anion transporters, which are required for the uptake of organic amphipathic compounds and xenobiotic drugs⁷⁶.
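
An enrichment test of this kind asks whether a functional category occurs among expanded genes more often than expected from the genome background. The Python sketch below computes a one-sided Fisher's exact p-value as the upper tail of the hypergeometric distribution, using only the standard library. The one-sided form, the function name, and the toy counts are assumptions for illustration; the source states only that Fisher's exact test with p ≤ 0.05 was used.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test for over-representation.

    N: annotated genes in the genome background
    K: background genes carrying the category
    n: genes in expanded families
    k: expanded genes carrying the category
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    # math.comb returns 0 when asked to choose more items than available,
    # so impossible tables contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Toy example: 10 background genes, 3 in the category, 4 expanded genes,
# 3 of which carry the category.
p = enrichment_p_value(k=3, n=4, K=3, N=10)
print(f"p = {p:.4f}, enriched at 0.05: {p <= 0.05}")
```

In practice a multiple-testing correction would normally be considered when many categories are tested at once; the source does not say whether one was applied.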

A Venn diagram of orthologous gene clusters was drawn on the basis of the protein sequences from T. longiramus (26,080 proteins) and two other amphipods: H. azteca (17,509 proteins) and P. hawaiensis (28,617 proteins) (Fig. 3b). Among these three genomes, T. longiramus has 327 unique orthologous gene clusters. Among these unique gene clusters, the top three categories are DNA- and RNA-mediated transposition, iron ion binding, and DNA metabolic process. Several unique genes were also found in the expanded gene families mentioned above (Supplementary Table 1).
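
The regions of such a Venn diagram can be counted by recording, for each cluster, the set of species that contribute at least one protein, and then tallying identical sets. The Python sketch below illustrates the idea with toy clusters; the function name, species labels, and cluster IDs are hypothetical.

```python
def venn_counts(cluster_species):
    """Count clusters per exact species combination.

    cluster_species maps a cluster ID to the set of species present in it.
    Returns a dict keyed by frozenset of species.
    """
    counts = {}
    for present in cluster_species.values():
        key = frozenset(present)
        counts[key] = counts.get(key, 0) + 1
    return counts


toy_clusters = {
    "c1": {"Tlon", "Hazt", "Phaw"},  # shared by all three
    "c2": {"Tlon"},                  # unique to one species
    "c3": {"Tlon"},
    "c4": {"Hazt", "Phaw"},          # shared by two
}
regions = venn_counts(toy_clusters)
unique_tlon = regions.get(frozenset({"Tlon"}), 0)
print(unique_tlon)
```

The count keyed by a single species corresponds to that species' unique clusters, which is the quantity reported as 327 for T. longiramus.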

Usage Notes

All analyses were conducted on Linux systems, and optimal parameters are given in the Code availability section.

Supplementary information

Supplementary Table 1 (16.6 KB, xlsx)

Acknowledgements

This study was financially supported by the National Marine Biodiversity Institute of Korea Research Program (2020M00100 and 2020M00600).

Author contributions

Y.Y. and M.G.Y. conceived the concept; M.S.K. and M.G.Y. provided the sample; Y.Y. and J.H.C. designed the experiments; A.K.P., O.C., J.Y.Y. and Y.Y. analyzed the genomic data; A.K.P., O.C. and Y.Y. wrote the paper. All authors reviewed the manuscript.

Code availability

The software versions, settings, and parameters are described in Table 7. If not mentioned otherwise, the command line at each step was executed using default settings.

Table 7.

A list of software and parameters used for genome analysis.

Software	Version	Parameters/Commands
FLASH	1.2.11	default
JELLYFISH	2.2.6	-C -m 17
Platanus trim	1.0.7	platanus_trim (for PE reads), platanus_internal_trim (for MP reads)
Platanus	1.2.4	step-1: assemble -m 2048, step-2: scaffold, step-3: gap_close
SSPACE Standard	3.0	default
DIAMOND	0.9.24	default
MEGAN	6.15.2	default
QUAST	4.5	default
BUSCO	3.0.2	-l arthropoda_odb9
RepeatMasker	4.0.7	-e ncbi -pa 4
RepeatModeler	1.0.10	-engine ncbi -pa 4
LSC	2.0	default
GMAP	2018-07-04	-B 5
derive-gene-models-from-PacBio.pl		default
TransDecoder	3.0.1	step-1: TransDecoder.LongOrfs, step-2: TransDecoder.Predict
Tophat	2.1.1	--microexon-search --mate-std-dev 26 --mate-inner-dist 38 --min-intron-length 30 --min-coverage-intron 30 --min-segment-intron 30
GenBlastA	1.0.4	-p T -e 1e-5 -g T -f F -a 0.5 -d 100000 -r 100 -c 0.01 -s -100
Exonerate	2.2.0	--model protein2genome --percent 30 --showvulgar no --showalignment yes --showquerygff no --showtargetgff yes --targetchunkid 1 --targetchunktotal 100
BRAKER	2.0	--species=T. longiramus --AUGUSTUS_CONFIG_PATH=augustus/config --AUGUSTUS_BIN_PATH=augustus/bin --AUGUSTUS_SCRIPTS_PATH=augustus/scripts --GENEMARK_PATH=gm_et/gmes_petap --bam=tophat/accepted_hits.bam --prot_seq=PacBio-derived.gene-models.transdecoder.pep.fasta --alternatives-from-evidence=true --prg=exonerate
InterProScan	5.16-55.0	-appl HAMAP,ProDom,PRINTS,Pfam,TIGRFAM,SUPERFAMILY,ProSitePatterns,ProSiteProfiles -goterms -iprlookup
OrthoMCL	2.0.9	-I 1.5
MUSCLE	3.8.31	default
ETE	3.1.1	trimal -gappyout
RAxML	8.2.10	-m PROTGAMMAJTT
MEGA	7.00	megacc
CAFE	4.0	default

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information is available for this paper at 10.1038/s41597-020-0424-8.

References

1.Horton, T., Lowry, J. & De Broyer, C. World amphipoda database, http://www.marinespecies.org/amphipoda (2017).
2.Copilaș‐Ciocianu D, Zimța AA, Petrusek A. Integrative taxonomy reveals a new Gammarus species (Crustacea, Amphipoda) surviving in a previously unknown southeast European glacial refugium. J. Zool. Syst. and Evol. Res. 2019;57:272–297. doi: 10.1111/jzs.12248.
3.Holsinger JR. Pattern and process in the biogeography of subterranean amphipods. Hydrobiologia. 1994;287:131–145. doi: 10.1007/BF00006902.
4.Jelassi R, Khemaissia H, Zimmer M, Garbe-Schönberg D, Nasri-Ammar K. Biodiversity of Talitridae family (Crustacea, Amphipoda) in some Tunisian coastal lagoons. Zool. Stud. 2015;54:17. doi: 10.1186/s40555-014-0096-1.
5.Romanova EV, et al. Evolution of mitochondrial genomes in Baikalian amphipods. BMC Genomics. 2016;17:1016. doi: 10.1186/s12864-016-3357-z.
6.Tomikawa K, Nakano T. Two new subterranean species of Pseudocrangonyx Akatsuka & Komai, 1922 (Amphipoda: Crangonyctoidea: Pseudocrangonyctidae), with an insight into groundwater faunal relationships in western Japan. J. Crustacean Biol. 2018;38:460–474. doi: 10.1093/jcbiol/ruy031.
7.Wildish D. Reproductive consequences of the terrestrial habit in Orchestia (Crustacea: Amphipoda) Int. J. Invert. Reprod. 1979;1:9–20. doi: 10.1080/01651269.1979.10553295.
8.Griffiths, C., Stenton-Dozey, J. & Koop, K. In Sandy beaches as ecosystems 547–556 (Springer, 1983).
9.Duarte C, Navarro J, Acuña K, Gómez I. Feeding preferences of the sandhopper Orchestoidea tuberculata: the importance of algal traits. Hydrobiologia. 2010;651:291–303. doi: 10.1007/s10750-010-0309-5.
10.Rainbow P, Malik I, O’brien P. Physicochemical and physiological effects on the uptake of dissolved zinc and cadmium by the amphipod crustacean Orchestia gammarellus. Aquat. Toxicol. 1993;25:15–30. doi: 10.1016/0166-445X(93)90017-U.
11.Casini S, Depledge M. Influence of copper, zinc, and iron on cadmium accumulation in the talitrid amphipod, Platorchestia platensis. Bull. Environ. Contam. and Toxicol. 1997;59:500–506. doi: 10.1007/s001289900506.
12.Ungherese G, et al. Relationship between heavy metals pollution and genetic diversity in Mediterranean populations of the sandhopper Talitrus saltator (Montagu)(Crustacea, Amphipoda) Environ. Pollut. 2010;158:1638–1643. doi: 10.1016/j.envpol.2009.12.007.
13.Bickham JW, Sandhu S, Hebert PD, Chikhi L, Athwal R. Effects of chemical contaminants on genetic diversity in natural populations: implications for biomonitoring and ecotoxicology. Mutat. Res. 2000;463:33–51. doi: 10.1016/S1383-5742(00)00004-1.
14.De Wolf H, Blust R, Backeljau T. The population genetic structure of Littorina littorea (Mollusca: Gastropoda) along a pollution gradient in the Scheldt estuary (The Netherlands) using RAPD analysis. Sci. Total Environ. 2004;325:59–69. doi: 10.1016/j.scitotenv.2003.11.004.
15.Mohapatra A, Rautray T, Patra AK, Vijayan V, Mohanty RK. Elemental composition in mud crab Scylla serrata from Mahanadi estuary, India: in situ irradiation analysis by external PIXE. Food Chem. Toxicol. 2009;47:119–123. doi: 10.1016/j.fct.2008.10.016.
16.Pavesi L, Tiedemann R, De Matthaeis E, Ketmaier V. Genetic connectivity between land and sea: the case of the beachflea Orchestia montagui (Crustacea, Amphipoda, Talitridae) in the Mediterranean Sea. Front. Zool. 2013;10:21. doi: 10.1186/1742-9994-10-21.
17.Ketmaier V, Matthaeis ED, Fanini L, Rossano C, Scapini F. Variation of genetic and behavioural traits in the sandhopper Talitrus saltator (Crustacea Amphipoda) along a dynamic sand beach. Ethol. Ecol. Evol. 2010;22:17–35. doi: 10.1080/03949370903515919.
18.Fanini L, Marchetti GM, Baczewska A, Sztybor K, Scapini F. Behavioural adaptation to different salinities in the sandhopper Talitrus saltator (Crustacea: Amphipoda): Mediterranean vs Baltic populations. Mar. Freshwat. Res. 2012;63:275–281. doi: 10.1071/MF11127.
19.Ugolini A, Cincinelli A, Martellini T, Doumett S. Salt concentration and solar orientation in two supralittoral sandhoppers: Talitrus saltator (Montagu) and Talorchestia ugolinii Bellan Santini and Ruffo. J. Comp. Physiol. A. 2015;201:455–460. doi: 10.1007/s00359-015-0992-9.
20.Nourisson D, Scapini F. Seasonal variation in the orientation of Talitrus saltator on a Mediterranean sandy beach: an ecological interpretation. Ethol. Ecol. Evol. 2015;27:277–293. doi: 10.1080/03949370.2014.946538.
21.Rivarola‐Duarte L, et al. A first glimpse at the genome of the Baikalian amphipod Eulimnogammarus verrucosus. J. Exp. Zool. B: Mol. Dev. Evol. 2014;322:177–189. doi: 10.1002/jez.b.22560.
22.Poynton HC, et al. The Toxicogenome of Hyalella azteca: A Model for Sediment Ecotoxicology and Evolutionary Toxicology. Environ. Sci. Technol. 2018;52:6009–6022. doi: 10.1021/acs.est.8b00837.
23.Zeng V, et al. De novo assembly and characterization of a maternal and developmental transcriptome for the emerging model crustacean Parhyale hawaiensis. BMC Genomics. 2011;12:581. doi: 10.1186/1471-2164-12-581.
24.Yo YWT. (Crustacea–Amphipoda) of the Korean coasts. Beaufortia. 1988;38:153–178.
25.Kumar Patra A, et al. The complete mitochondrial genome of the sand-hopper Trinorchestia longiramus (Amphipoda: Talitridae) Mitochon. DNA B. 2019;4:2104–2105. doi: 10.1080/23802359.2019.1623100.
26.Woo J, et al. Demographic history of Trinorchestia longiramus (Amphipoda, Talitridae) in South Korea inferred from mitochondrial DNA sequence variation. Crustaceana. 2016;89:1559–1573. doi: 10.1163/15685403-00003608.
27.Sasago, Y. Study for distribution and molecular phylogenetic analysis of the talitrid amphipods in Japan, M. Sc. Thesis, Mie University, Tsu, (2011).
28.Quevillon E, et al. InterProScan: protein domains identifier. Nucleic Acids Res. 2005;33:W116–W120. doi: 10.1093/nar/gki442.
29.Sambrook, J., Fritsch, E. F. & Maniatis, T. Molecular Cloning: a laboratory manual. (Cold Spring Harbor Laboratory Press, 1989).
30.Woo S, et al. Efficient isolation of intact RNA from the soft coral Scleronephthya gracillimum (Kükenthal) for gene expression analyses. Integr. Biosci. 2005;9:205–209. doi: 10.1080/17386357.2005.9647272.
31.Magoč T, Salzberg SL. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics. 2011;27:2957–2963. doi: 10.1093/bioinformatics/btr507.
32.Marçais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011;27:764–770. doi: 10.1093/bioinformatics/btr011.
33.Kajitani R, et al. Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Res. 2014;24:1384–1395. doi: 10.1101/gr.170720.113.
34.Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics. 2010;27:578–579. doi: 10.1093/bioinformatics/btq683.
35.Huson DH, Auch AF, Qi J, Schuster SC. MEGAN analysis of metagenomic data. Genome Res. 2007;17:377–386. doi: 10.1101/gr.5969107.
36.Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat. Methods. 2015;12:59. doi: 10.1038/nmeth.3176.
37.Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27:573–580. doi: 10.1093/nar/27.2.573.
38.Abrusán G, Grundmann N, DeMester L, Makalowski W. TEclass—a tool for automated classification of unknown eukaryotic transposable elements. Bioinformatics. 2009;25:1329–1330. doi: 10.1093/bioinformatics/btp084. [DOI] [PubMed] [Google Scholar]
39.Jurka J, et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 2005;110:462–467. doi: 10.1159/000084979.
40.Bedell JA, Korf I, Gish W. MaskerAid: a performance enhancement to RepeatMasker. Bioinformatics. 2000;16:1040–1041. doi: 10.1093/bioinformatics/16.11.1040.
41.Hoff KJ, Lange S, Lomsadze A, Borodovsky M, Stanke M. BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS. Bioinformatics. 2016;32:767–769. doi: 10.1093/bioinformatics/btv661.
42.Lomsadze A, Burns PD, Borodovsky M. Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm. Nucleic Acids Res. 2014;42:e119–e119. doi: 10.1093/nar/gku557.
43.Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008;24:637–644. doi: 10.1093/bioinformatics/btn013.
44.Kim D, et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14:R36. doi: 10.1186/gb-2013-14-4-r36.
45.Minoche AE, et al. Exploiting single-molecule transcript sequencing for eukaryotic gene prediction. Genome Biol. 2015;16:184. doi: 10.1186/s13059-015-0729-7.
46.Au KF, Underwood JG, Lee L, Wong WH. Improving PacBio long read accuracy by short read alignment. Plos One. 2012;7:e46679. doi: 10.1371/journal.pone.0046679.
47.Wu TD, Watanabe CK. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics. 2005;21:1859–1875. doi: 10.1093/bioinformatics/bti310.
48.Haas BJ, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 2013;8:1494. doi: 10.1038/nprot.2013.084.
49.Slater GSC, Birney E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics. 2005;6:31. doi: 10.1186/1471-2105-6-31.
50.Camacho C, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421. doi: 10.1186/1471-2105-10-421.
51.She R, Chu JS-C, Wang K, Pei J, Chen N. GenBlastA: enabling BLAST to identify homologous gene sequences. Genome Res. 2009;19:143–149. doi: 10.1101/gr.082081.108.
52.Jones P, et al. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014;30:1236–1240. doi: 10.1093/bioinformatics/btu031.
53.Lima T, et al. HAMAP: a database of completely sequenced microbial proteome sets and manually curated microbial protein families in UniProtKB/Swiss-Prot. Nucleic Acids Res. 2008;37:D471–D478. doi: 10.1093/nar/gkn661.
54.Punta M, et al. The Pfam protein families database. Nucleic Acids Res. 2011;40:D290–D301. doi: 10.1093/nar/gkr1065. [DOI] [PMC free article] [PubMed] [Google Scholar]
55.Nikolskaya AN, Arighi CN, Huang H, Barker WC, Wu CH. PIRSF family classification system for protein functional and evolutionary analysis. Evol. Bioinform. 2006;2:117693430600200033. doi: 10.1177/117693430600200033. [DOI] [PMC free article] [PubMed] [Google Scholar]
56.Attwood TK, et al. PRINTS-S: the database formerly known as PRINTS. Nucleic Acids Res. 2000;28:225–227. doi: 10.1093/nar/28.1.225. [DOI] [PMC free article] [PubMed] [Google Scholar]
57.Bru C, et al. The ProDom database of protein domain families: more emphasis on 3D. Nucleic Acids Res. 2005;33:D212–D215. doi: 10.1093/nar/gki034. [DOI] [PMC free article] [PubMed] [Google Scholar]
58.Sigrist CJ, et al. PROSITE, a protein domain database for functional characterization and annotation. Nucleic Acids Res. 2009;38:D161–D166. doi: 10.1093/nar/gkp885. [DOI] [PMC free article] [PubMed] [Google Scholar]
59.Madera M, Vogel C, Kummerfeld SK, Chothia C, Gough J. The SUPERFAMILY database in 2004: additions and improvements. Nucleic Acids Res. 2004;32:D235–D239. doi: 10.1093/nar/gkh117. [DOI] [PMC free article] [PubMed] [Google Scholar]
60.Haft DH, et al. TIGRFAMs and genome properties in 2013. Nucleic Acids Res. 2012;41:D387–D395. doi: 10.1093/nar/gks1234. [DOI] [PMC free article] [PubMed] [Google Scholar]
61.Patra AK, 2020. First draft genome for the sand-hopper Trinorchestia longiramus. figshare. [DOI] [PMC free article] [PubMed]
62.2019. NCBI Sequence Read Archive. SRP199018
63.Patra AK, 2020. Trinorchestia longiramus isolate TLONG-mixed, whole genome shotgun sequencing project. GenBank. VCRD00000000
64.2019. NCBI Assembly. GCA_006783055.1
65.Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29:1072–1075. doi: 10.1093/bioinformatics/btt086. [DOI] [PMC free article] [PubMed] [Google Scholar]
66.Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. doi: 10.1093/bioinformatics/btv351. [DOI] [PubMed] [Google Scholar]
67.Li L, Stoeckert CJ, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13:2178–2189. doi: 10.1101/gr.1224503. [DOI] [PMC free article] [PubMed] [Google Scholar]
68.Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–1797. doi: 10.1093/nar/gkh340. [DOI] [PMC free article] [PubMed] [Google Scholar]
69.Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25:1972–1973. doi: 10.1093/bioinformatics/btp348. [DOI] [PMC free article] [PubMed] [Google Scholar]
70.Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30:1312–1313. doi: 10.1093/bioinformatics/btu033. [DOI] [PMC free article] [PubMed] [Google Scholar]
71.Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol. Biol. Evol. 2016;33:1870–1874. doi: 10.1093/molbev/msw054. [DOI] [PMC free article] [PubMed] [Google Scholar]
72.Hedges SB, Dudley J, Kumar S. TimeTree: a public knowledge-base of divergence times among organisms. Bioinformatics. 2006;22:2971–2972. doi: 10.1093/bioinformatics/btl505. [DOI] [PubMed] [Google Scholar]
73.Han MV, Thomas GW, Lugo-Martinez J, Hahn MW. Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3. Mol. Biol. Evol. 2013;30:1987–1997. doi: 10.1093/molbev/mst100. [DOI] [PubMed] [Google Scholar]
74.Francia ME, et al. A Toxoplasma gondii protein with homology to intracellular type Na+/H+ exchangers is important for osmoregulation and invasion. Exp. Cell Res. 2011;317:1382–1396. doi: 10.1016/j.yexcr.2011.03.020. [DOI] [PMC free article] [PubMed] [Google Scholar]
75.Dermauw W, Van Leeuwen T. The ABC gene family in arthropods: comparative genomics and role in insecticide transport and resistance. Insect Biochem. Mol. Biol. 2014;45:89–110. doi: 10.1016/j.ibmb.2013.11.001. [DOI] [PubMed] [Google Scholar]
76.Radulović Ž, Porter LM, Kim TK, Mulenga A. Comparative bioinformatics, temporal and spatial expression analyses of Ixodes scapularis organic anion transporting polypeptides. Ticks Tick Borne Dis. 2014;5:287–298. doi: 10.1016/j.ttbdis.2013.12.002. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Supplementary Table 1 (16.6 KB, xlsx)

Data Availability Statement

The software versions, settings, and parameters are described in Table 7. If not mentioned otherwise, the command line at each step was executed using default settings.

Table 7.

A list of software and parameters used for genome analysis.

Software	Version	Parameters/Commands
FLASH	1.2.11	default
JELLYFISH	2.2.6	-C -m 17
Platanus trim	1.0.7	platanus_trim (for PE reads), platanus_internal_trim (for MP reads)
Platanus	1.2.4	step-1: assemble -m 2048, step-2: scaffold, step-3: gap_close
SSPACE Standard	3.0	default
DIAMOND	0.9.24	default
MEGAN	6.15.2	default
QUAST	4.5	default
BUSCO	3.0.2	-l arthropoda_odb9
RepeatMasker	4.0.7	-e ncbi -pa 4
RepeatModeler	1.0.10	-engine ncbi -pa 4
LSC	2.0	default
GMAP	2018-07-04	-B 5
derive-gene-models-from-PacBio.pl		default
TransDecoder	3.0.1	step-1: TransDecoder.LongOrfs, step-2: TransDecoder.Predict
TopHat	2.1.1	--microexon-search --mate-std-dev 26 --mate-inner-dist 38 --min-intron-length 30 --min-coverage-intron 30 --min-segment-intron 30
GenBlastA	1.0.4	-p T -e 1e-5 -g T -f F -a 0.5 -d 100000 -r 100 -c 0.01 -s -100
Exonerate	2.2.0	--model protein2genome --percent 30 --showvulgar no --showalignment yes --showquerygff no --showtargetgff yes --targetchunkid 1 --targetchunktotal 100
BRAKER	2.0	--species=T. longiramus --AUGUSTUS_CONFIG_PATH=augustus/config --AUGUSTUS_BIN_PATH=augustus/bin --AUGUSTUS_SCRIPTS_PATH=augustus/scripts --GENEMARK_PATH=gm_et/gmes_petap --bam=tophat/accepted_hits.bam --prot_seq=PacBio-derived.gene-models.transdecoder.pep.fasta --alternatives-from-evidence=true --prg=exonerate
InterProScan	5.16-55.0	-appl HAMAP,ProDom,PRINTS,Pfam,TIGRFAM,SUPERFAMILY,ProSitePatterns,ProSiteProfiles -goterms -iprlookup
OrthoMCL	2.0.9	-I 1.5
MUSCLE	3.8.31	default
ETE	3.1.1	trimal -gappyout
RAxML	8.2.10	-m PROTGAMMAJTT
MEGA	7.00	megacc
CAFE	4.0	default
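
To illustrate what the JELLYFISH setting in Table 7 means (-C requests canonical counting and -m 17 sets the k-mer length to 17), the following minimal Python sketch counts canonical k-mers in a handful of reads and builds a multiplicity histogram. It is not the JELLYFISH implementation and is far too slow for real sequencing data. The function names (canonical, count_canonical_kmers, kmer_histogram) are hypothetical, and skipping k-mers that contain characters other than A, C, G and T is an assumption of this sketch.

```python
from collections import Counter

# Translation table used to complement a DNA string.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_canonical_kmers(reads, k=17):
    """Count canonical k-mers over an iterable of read strings.

    Assumption of this sketch: k-mers containing any character other
    than A, C, G or T (for example N) are skipped.
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def kmer_histogram(counts):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return Counter(counts.values())


if __name__ == "__main__":
    # Toy input with k=2 so the result can be checked by hand:
    # AC and GT are reverse complements, so both are counted as AC.
    toy_counts = count_canonical_kmers(["ACGT"], k=2)
    print(dict(toy_counts))
    print(dict(kmer_histogram(toy_counts)))
```

A multiplicity histogram of this kind is the input typically used for k-mer-based genome size estimation; the toy example uses k=2 only so that the canonical merging of a k-mer with its reverse complement is easy to verify by hand.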


