Chromosome assembly of Collichthys lucidus, a fish of Sciaenidae with a multiple sex chromosome system

Mingyi Cai; Yu Zou; Shijun Xiao; Wanbo Li; Zhaofang Han; Fang Han; Junzhu Xiao; Fujiang Liu; Zhiyong Wang

doi:10.1038/s41597-019-0139-x

2019 Jul 24;6:132. doi: 10.1038/s41597-019-0139-x

PMCID: PMC6656731 PMID: 31341172

Abstract

Collichthys lucidus (C. lucidus) is a commercially important marine fish species distributed in coastal regions of East Asia, with an X₁X₁X₂X₂/X₁X₂Y multiple sex chromosome system. The karyotype is 2n = 48 in females and 2n = 47 in males. C. lucidus is therefore an excellent model for investigating teleost sex determination and sex chromosome evolution. We report the first chromosome-level genome assembly of C. lucidus, generated using Illumina short-read sequencing, PacBio long-read sequencing and Hi-C technology. An 877 Mb genome was obtained with contig and scaffold N50 values of 1.1 Mb and 35.9 Mb, respectively. More than 97% of BUSCO genes were identified in the C. lucidus genome, and 28,602 genes were annotated. We identified potential sex-determination genes along the chromosomes and found that chromosome 1 might be involved in the formation of the Y-specific metacentric chromosome. This first chromosome-level reference genome of C. lucidus lays a solid foundation for subsequent population genetics studies, functional gene mapping of economically important traits, and studies of sex determination and sex chromosome evolution in Sciaenidae and other teleosts.

Subject terms: Molecular evolution, Genome

Design Type(s)	sequence assembly objective • sequence annotation objective • transcription profiling design
Measurement Type(s)	whole genome sequencing assay • transcript expression assay
Technology Type(s)	DNA sequencing • RNA sequencing
Factor Type(s)	organism part
Sample Characteristic(s)	Collichthys lucidus

Machine-accessible metadata file describing the reported data (ISA-Tab format)

Background & Summary

Collichthys lucidus (C. lucidus, FishBase ID: 23635, NCBI Taxonomy ID: 240159, Fig. 1), also called spiny head croaker or big head croaker, belongs to the genus Collichthys in the family Sciaenidae (order Perciformes) and is mainly distributed in the coastal waters of the northwestern Pacific, from the South China Sea to the Sea of Japan¹. C. lucidus is a commercially important marine fish species with high market value and is widely consumed in the coastal regions of China².

To date, research on C. lucidus has mostly focused on phylogeny and population genetics^3–7. C. lucidus exhibits clear sexual dimorphism in growth rate, with females growing much faster than males; an understanding of its sex determination would therefore facilitate the development of sex-control techniques in the aquaculture industry to increase annual yield. More interestingly, our previous cytogenetic study showed that female C. lucidus have 24 pairs of acrocentric chromosomes (2n = 48a, NF = 48), whereas males have 22 pairs of acrocentric chromosomes, two monosomic acrocentric chromosomes and one metacentric chromosome (2n = 1 m + 46a, NF = 48)⁸. C. lucidus thus has an X₁X₁X₂X₂/X₁X₂Y sex chromosome system, in which Y is a unique metacentric chromosome in the male karyotype. Although multiple sex chromosome systems have been found in several Perciformes species⁹, C. lucidus is the first reported case in Sciaenidae. Research on the mechanisms of sex determination and differentiation in Sciaenidae is still lacking. Previous studies found no heterotropic chromosome in large yellow croaker (Larimichthys crocea) and spotted maigre (Nibea albiflora)^10,11. Because these are closely related species in the same family, chromosome comparisons might provide insights into chromosome evolution among the species and its relationship to the evolution of sex determination in Sciaenidae.

To obtain high-quality chromosome sequences of C. lucidus, we applied a combined strategy of Illumina, PacBio and Hi-C technologies¹² to sequence its genome, and we report the first chromosome-level assembly of this important species. The genome will be used for functional gene mapping of economic traits and for sex-determination studies in C. lucidus, as well as for investigations of chromosome evolution among Sciaenidae and teleosts.

Methods

Sample collection

A wild-caught adult female C. lucidus from Baima Harbor, Ningde, Fujian, China (26.7328°N, 119.7329°E) was used for genome sequencing and assembly. We chose a female because the heterotropic chromosome in males might increase the technical challenge of genome assembly, especially for the X₁ and X₂ chromosomes. Muscle, eye, brain, heart, liver, spleen, kidney, head kidney, gonad, stomach and intestine tissues were harvested. All samples were quickly rinsed with 1× PBS (phosphate-buffered saline), frozen in liquid nitrogen for over 24 hours and then stored at −80 °C before sample preparation.

DNA extraction and sequencing

Genomic DNA was extracted from muscle tissue using the phenol/chloroform method and used for sequencing on the Illumina (Illumina Inc., San Diego, CA, USA) and PacBio (Pacific Biosciences of California, Menlo Park, CA, USA) platforms. Illumina library construction and sequencing were carried out according to the manufacturer’s instructions, as in a previous study¹³. Briefly, the DNA extracted from muscle samples was randomly sheared into 300–350 bp fragments using an ultrasonic processor, and a paired-end library was constructed through end repair, poly(A) addition, barcode indexing, purification and PCR amplification. The library was sequenced on the Illumina HiSeq X platform in 150 bp paired-end (PE) mode, yielding 52.0 Gb of raw genome data for C. lucidus. After quality filtering, 51.35 Gb of clean reads were retained, as summarized in Table 1. In addition, genomic DNA of C. lucidus was used to construct one 20 kb library. Eleven flow cells were run on the PacBio Sequel platform to generate 90.7 Gb (109.3× coverage) of polymerase read data. After adaptor filtering, 90.5 Gb of long reads were retained for genome assembly (Table 1).

Table 1.

Sequencing data used for the C. lucidus genome assembly.

Type	Method	Library size (bp)	Clean data (Gb)	Read length (bp)	Coverage (×)
Genome	Illumina	300–350	52.0	150	62.6
Genome	PacBio	20,000	90.5	14,002	109.0
Genome	Hi-C	—	193.1	150	232.7
Transcriptome	Illumina	250–300	9.8	150	—

The coverage was calculated using an estimated genome size of 830 Mb.
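The coverage column in Table 1 follows from dividing the clean data volume by the estimated genome size. A minimal sketch of that arithmetic is given below; the function name `sequencing_depth` is ours for illustration and is not part of any tool used in this work. The results agree with the table to within rounding.

```python
def sequencing_depth(data_gb: float, genome_size_mb: float = 830.0) -> float:
    """Return fold coverage for a data volume in Gb and a genome size in Mb."""
    if genome_size_mb <= 0:
        raise ValueError("genome size must be positive")
    # 1 Gb = 1000 Mb, so convert before dividing.
    return data_gb * 1000.0 / genome_size_mb


# Clean data volumes taken from Table 1.
for method, gb in [("Illumina", 52.0), ("PacBio", 90.5), ("Hi-C", 193.1)]:
    print(f"{method}: {sequencing_depth(gb):.1f}x")
```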

RNA extraction and sequencing

The transcriptome of C. lucidus was also sequenced in this work to support gene prediction after genome assembly. The muscle, eye, brain, heart, liver, spleen, kidney, head kidney, gonad, stomach and intestine tissues collected from the same individual were used for RNA extraction with TRIZOL Reagent (Invitrogen, USA). RNA extracted from the tissues was mixed in equal amounts for RNA sequencing. The RNA sequencing library was constructed according to the manufacturer's protocol, as in a previous study¹⁴, and sequenced on the Illumina HiSeq X Ten platform in 150 bp PE mode (Illumina Inc., San Diego, CA, USA). Finally, ~9.8 Gb of RNA-seq data were obtained (Table 1).

Genome survey and contig assembly

Before genome assembly, the genome size of C. lucidus was estimated from the Illumina sequencing data using the Kmer-based method implemented in GCE (v1.0.0)¹⁵. Using a Kmer size of 17, we obtained a Kmer frequency distribution for C. lucidus (Fig. 2). The genome size was estimated using the following equation: G = (L − K + 1) × n_base / (C_Kmer × L), where G is the estimated genome size, n_base is the total number of bases, C_Kmer is the expected Kmer depth, and L and K are the read length and Kmer size, respectively. Because Kmers with a depth smaller than three were likely to result from sequencing errors, we revised the genome size as follows: G_revise = G × (1 − Error Rate). As a result, we estimated the female C. lucidus genome size to be 830 Mb, with a heterozygosity of 0.81% and a whole-genome average GC content of 42%.

Fig. 2. Kmer frequency distribution of C. lucidus. Note that the first, second and third peaks are composed of homozygous, heterozygous and repeated Kmers, respectively.
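The genome size equation can be written out as a short function. This is only a sketch of the arithmetic stated in the text, not the GCE implementation; the function name and the example inputs for Kmer depth and error rate are hypothetical placeholders, since those values are not reported here.

```python
def estimate_genome_size(n_base, read_len, k, kmer_depth, error_rate=0.0):
    """Kmer-based genome size estimate, following the equation in the text.

    G = (L - K + 1) * n_base / (C_Kmer * L), then G_revise = G * (1 - error_rate).
    """
    if min(n_base, read_len, k, kmer_depth) <= 0 or k > read_len:
        raise ValueError("inputs must be positive and k must not exceed the read length")
    g = (read_len - k + 1) * n_base / (kmer_depth * read_len)
    return g * (1.0 - error_rate)


# Hypothetical inputs for illustration only: the expected Kmer depth and the
# error rate below are placeholders, not values reported in this work.
size = estimate_genome_size(n_base=51.35e9, read_len=150, k=17,
                            kmer_depth=55.0, error_rate=0.01)
print(f"Estimated genome size: {size / 1e6:.0f} Mb")
```

The factor (L − K + 1)/L converts the number of sequenced bases into the number of Kmers, because each read of length L contributes L − K + 1 Kmers.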

To assemble contigs from the long-read data, Falcon v0.30¹⁶ was run on the female C. lucidus genome with default parameters. The assembly proceeded through the following steps in Falcon. First, daligner¹⁷ was used to generate read alignments, from which consensus reads were generated. Then, overlap information among the error-corrected reads was generated by daligner. Finally, a directed string graph was constructed from the overlap data, and contig paths were resolved from the string graph. Two rounds of sequence polishing were then performed: the assembled sequence was first polished with Arrow¹⁸ using PacBio long reads and then with Pilon¹⁹ using Illumina sequencing data. The final contig assembly of C. lucidus had a total length of 877.4 Mb in 2,912 contigs, with a contig N50 of 1.10 Mb (Table 2).

Table 2.

Assembly statistics of C. lucidus.

Statistic	Contig length (bp)	Contig number
Total	877,428,965	2,912
Max	9,855,977	—
Number ≥ 2,000 bp	—	2,853
N50	1,098,566	210
N60	794,488	305
N70	545,261	437
N80	319,460	646
N90	152,174	1,044
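For readers unfamiliar with the Nx statistics in Table 2, the sketch below shows how they are conventionally computed: contigs are sorted from longest to shortest, and Nx is the length of the contig at which the running total first reaches x% of the assembly. The second value returned corresponds to the "Contig number" column. The function name `nx` and the toy lengths are illustrative assumptions, not the script used in this work.

```python
def nx(lengths, fraction=0.5):
    """Return (Nx length, number of contigs needed) for a list of contig lengths.

    fraction=0.5 gives N50, fraction=0.9 gives N90, and so on.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy contig lengths for illustration only.
toy_contigs = [10, 8, 6, 4, 2]
print(nx(toy_contigs, 0.5))
print(nx(toy_contigs, 0.9))
```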

Chromosome assembly using Hi-C data

To obtain a chromosome-level assembly of C. lucidus, we applied the Hi-C technique to generate interaction information among contigs. One gram of muscle tissue was used for Hi-C library construction. Crosslinking, lysis, chromatin digestion, biotin marking, proximity ligation, crosslinking reversal and DNA purification were performed as in previous studies²⁰. The Hi-C library was sequenced on the Illumina HiSeq X Ten platform, and 193.1 Gb of Hi-C reads were generated (Table 1). The reads were aligned to the assembled contigs using Bowtie, and the alignments were filtered as in our previous study²¹. An interaction matrix among contigs was generated, and Lachesis²² was then applied to anchor contigs into chromosomes using agglomerative hierarchical clustering. Finally, we scaffolded 2,134 contigs into 24 chromosomes, representing 96.86% of the total assembled genome. The contig and scaffold N50 values of the chromosome assembly were 1.1 Mb and 35.9 Mb, respectively. We note that 865 contigs could not be reliably anchored to any chromosome; the N50 length of these unanchored contigs was 49.4 kb, much smaller than the 1.16 Mb N50 of anchored contigs.
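The idea behind agglomerative hierarchical clustering of contigs can be illustrated with a toy example. The sketch below is a simplified average-linkage version written for this explanation; it is not the Lachesis implementation, which applies its own normalization and filtering. The function name `cluster_contigs`, the data layout and the link counts are all hypothetical.

```python
from itertools import combinations


def cluster_contigs(links, n_groups):
    """Greedy average-linkage agglomerative clustering of contigs.

    links: dict mapping frozenset({a, b}) to the number of Hi-C read pairs
           linking contigs a and b (missing pairs count as zero).
    n_groups: target number of clusters (e.g. the chromosome number).
    """
    contigs = sorted({c for pair in links for c in pair})
    clusters = [{c} for c in contigs]

    def avg_link(x, y):
        # Mean number of links over all contig pairs spanning the two clusters.
        total = sum(links.get(frozenset((a, b)), 0) for a in x for b in y)
        return total / (len(x) * len(y))

    while len(clusters) > n_groups:
        # Pick the pair of clusters with the strongest average linkage.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg_link(clusters[ij[0]], clusters[ij[1]]))
        if avg_link(clusters[i], clusters[j]) == 0:
            break  # no remaining evidence supports a further merge
        clusters[i] |= clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Toy link counts for illustration only.
toy_links = {
    frozenset(("a", "b")): 50, frozenset(("b", "c")): 40,
    frozenset(("a", "c")): 30, frozenset(("d", "e")): 60,
    frozenset(("c", "d")): 2,
}
print(sorted(sorted(c) for c in cluster_contigs(toy_links, 2)))
```

Contigs from the same chromosome share many more Hi-C links than contigs from different chromosomes, so repeatedly merging the most strongly linked groups tends to recover chromosome-scale clusters. Short contigs carry few links, which is consistent with the unanchored contigs being much shorter than the anchored ones.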

Gene prediction and functional annotation

Repetitive sequences in the C. lucidus genome were annotated through a combination of homology-based and ab initio prediction. RepeatMasker (http://www.repeatmasker.org/)²³ and RepeatProteinMask were used to search against the RepBase database (http://www.girinst.org/repbase). Tandem Repeats Finder (TRF)²⁴ and LTR-FINDER²⁵ were used with default parameters for ab initio prediction. As a result, we identified 304.40 Mb of the assembled C. lucidus genome as repetitive elements, accounting for 34.68% of the total genome sequence. The repetitive elements were masked, and the repeat-masked genome was used for gene prediction.

Protein-coding genes were annotated by a combined strategy of homology-based, ab initio and transcriptome-based prediction. The protein sequences of several teleosts, including Danio rerio (GCF_000002035.6), Dicentrarchus labrax (GCA_000689215.1), Gasterosteus aculeatus (GCA_000180675.1), Oryzias latipes (GCF_002234675.1) and Takifugu rubripes (GCF_000180615.1), were mapped to the assembled C. lucidus genome using TBLASTN²⁶. The alignments were joined by Solar²⁷, and GeneWise²⁸ was used to predict the exact gene structure in the genomic region corresponding to each BLAST hit. Furthermore, RNA-seq reads were aligned to the assembled C. lucidus genome to identify potential exon regions using TopHat²⁹ and Cufflinks³⁰. Augustus³¹ was also used to predict coding regions in the repeat-masked genome sequence. All of these results were merged by MAKER³², yielding a total of 28,602 protein-coding genes (Table 3). After homology searches against the NCBI non-redundant protein (NR)³³, TrEMBL³⁴, Gene Ontology (GO)³⁵, SwissProt³⁴, Kyoto Encyclopedia of Genes and Genomes (KEGG)³⁶ and InterPro³⁷ databases, 28,032 (98.01%) protein-coding genes were annotated in at least one public functional database (Table 4).

Table 3.

General statistics of predicted protein-coding genes.

Gene set		Number	Average transcript length (bp)	Average CDS length (bp)	Average exons per gene	Average exon length (bp)	Average intron length (bp)
De novo	Augustus	32,502	11,378.88	1,494.29	8.52	175.44	1,314.88
De novo	Genscan	40,805	15,596.28	1,560.39	8.56	182.21	1,855.72
Homolog	D. rerio	52,244	9,049.21	1,076.27	5.56	193.69	1,749.76
	D. labrax	48,861	7,508.49	1,028.16	5.79	177.46	1,351.80
	G. aculeatus	45,957	7,811.18	1,035.02	6.04	171.27	1,447.46
	O. latipes	44,650	8,137.02	1,036.88	5.91	175.59	1,405.38
	T. rubripes	43,159	8,366.10	1,046.02	6.21	168.48	1,401.06
trans.orf/RNAseq		18,058	11,694.21	1,095.81	7.62	317.99	1,401.06
MAKER		28,602	13,241.72	1,673.58	9.74	207.05	1,284.21

Table 4.

General statistics of gene function annotation.

Type		Number	Percent(%)
Total		28,602	100
Annotated	InterPro	24,918	87.12
	GO	18,942	66.23
	KEGG	17,806	62.25
	Swissprot	26,038	91.04
	TrEMBL	27,883	97.49
	NR	27,996	97.88
Annotated		28,032	98.01
Unannotated		570	1.99

Repeat distribution and potential sex-determination gene identification

The distribution of repetitive elements along the chromosomes is plotted in Fig. 3. Repeats were generally concentrated at the two ends of the chromosomes, especially at the beginning of chromosome 1 in the assembled C. lucidus genome. Our previous cytogenetic analysis revealed that a chromosome with massive terminal repeats was involved in the formation of the Y-specific metacentric chromosome⁸; we therefore speculate that chromosome 1 might be one of the two chromosomes involved in the sex chromosome fusion. Twenty-one potential key genes in teleost sex development were identified in the assembled C. lucidus genome (Fig. 3), which will facilitate gene expression and functional studies aimed at deciphering sex determination in C. lucidus. We identified only one copy of the Dmrt1 gene (dsx- and mab-3 related transcription factor 1), located on chromosome 11. Our previous studies of L. crocea¹⁰ and N. albiflora¹¹ revealed that Dmrt1 is a key gene in the sex determination of these two species; we therefore speculate that Dmrt1 might also play a central role in the sex-determination process of C. lucidus. The chromosome and gene sequences provide a valuable resource for subsequent sex-determination investigations.

Data Records

The genomic Illumina sequencing data were deposited in the Sequence Read Archive at NCBI under accession SRR8208332³⁸.

The genomic PacBio sequencing data were deposited in the Sequence Read Archive at NCBI under accession SRR8142901³⁹.

The transcriptome Illumina sequencing data were deposited in the Sequence Read Archive at NCBI under accession SRR8208331⁴⁰.

The Hi-C sequencing data were deposited in the Sequence Read Archive at NCBI under accession SRR8208301⁴¹.

The final chromosome assembly was deposited in GenBank at NCBI under accession SCMI00000000⁴².

The genome annotation file is available on figshare⁴³.

The sequences of potential sex-determination genes identified in the assembled C. lucidus genome are available on figshare⁴⁴.

Technical Validation

The quality of the extracted DNA was checked by agarose gel electrophoresis, which showed a main band of around 20 kb, and by spectrophotometry, with 260/280 absorbance ratios ≥ 1.8.

The quality of the purified RNA was checked with a Nanodrop ND-1000 spectrophotometer (LabTech, USA), giving a 260 nm/280 nm absorbance ratio > 1.7, and with a 2100 Bioanalyzer (Agilent Technologies, USA), giving an RIN of 8.0.

The raw reads from the Illumina sequencing platform were cleaned using FastQC⁴⁵ and HTQC⁴⁶ in the following steps: (a) reads containing adapter sequence were removed; (b) PE reads in which either read contained more than 10% N bases were removed; (c) PE reads in which either end contained more than 50% low-quality (≤ 5) bases were removed.
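The three filtering criteria can be expressed as a single predicate on a read pair. The sketch below is an illustration of the stated thresholds only and does not reproduce FastQC or HTQC; the function name `keep_pair`, the read representation, and the use of exact substring matching for adapter detection are simplifying assumptions, and the adapter sequence shown is a placeholder.

```python
def keep_pair(read1, read2, adapter, max_n_frac=0.10, low_q=5, max_low_q_frac=0.50):
    """Return True if a read pair passes the three filters described in the text.

    Each read is a (sequence, qualities) tuple, where qualities is a list of
    Phred scores with the same (non-zero) length as the sequence.
    """
    for seq, quals in (read1, read2):
        # (a) discard pairs in which a read contains adapter sequence
        #     (exact match only; an assumption made for simplicity)
        if adapter in seq:
            return False
        # (b) discard pairs in which a read has more than 10% N bases
        if seq.count("N") / len(seq) > max_n_frac:
            return False
        # (c) discard pairs in which a read has more than 50% bases with quality <= 5
        low_quality_bases = sum(q <= low_q for q in quals)
        if low_quality_bases / len(quals) > max_low_q_frac:
            return False
    return True


# Placeholder adapter and toy reads for illustration only.
ADAPTER = "AGATCGGAAGAGC"
good = ("ACGT" * 5, [30] * 20)
too_many_n = ("NNN" + "ACGT" * 4 + "A", [30] * 20)
print(keep_pair(good, good, ADAPTER), keep_pair(good, too_many_n, ADAPTER))
```

Filtering is applied per pair, so a pair is discarded as soon as either mate fails any criterion.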

The quality of the assembled genome was validated in terms of completeness, accuracy and conserved synteny. First, completeness was assessed by aligning the PacBio long reads to the assembly. Minimap2⁴⁷ with default parameters was used to map the CLR (Continuous Long Reads) subreads of C. lucidus back to the final chromosome assembly. About 96.2% of the long reads could be aligned to the assembled genome, and the average alignment depth along the genome was 103×. More than 99.78% and 98.1% of the genome sequence was covered at depths of at least 1× and 20×, respectively. Second, we further confirmed completeness using BUSCO v3.0⁴⁸: 97.6% and 97.4% of BUSCO genes were completely or partially identified in the assembled C. lucidus genome with the vertebrate and actinopterygii databases, respectively. Third, the accuracy of the assembly was evaluated by variant calling using Illumina data. The short reads were mapped to the genome with BWA⁴⁹. The insert length distribution showed a single peak that agreed well with our experimental design, supporting the accuracy of the assembly. SNP calling from the read alignments with GATK⁵⁰ identified 2,593,807 heterozygous and 11,282 homozygous SNP loci along the genome, suggesting a base-level accuracy of 99.999% for the assembly. Fourth, the conserved synteny between C. lucidus and L. crocea⁵¹ was examined to validate the chromosome assembly. We observed highly conserved synteny and a strict correspondence of chromosome assignments (Fig. 4).

Fig. 4. Chromosome comparison of C. lucidus and L. crocea based on protein-coding gene synteny. The chromosome IDs of C. lucidus are sorted by sequence length.
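The base-level accuracy figure can be reproduced from the numbers given above, assuming that homozygous SNPs called from the same individual's reads are treated as assembly base errors and that the denominator is the total contig length in Table 2. Both assumptions are our reading of the text, and the function name `base_accuracy` is illustrative.

```python
def base_accuracy(homozygous_snps: int, assembly_length: int) -> float:
    """Return base-level accuracy in percent, treating homozygous SNPs as errors."""
    if assembly_length <= 0:
        raise ValueError("assembly length must be positive")
    return 100.0 * (1.0 - homozygous_snps / assembly_length)


# Homozygous SNP count from the text; assembly length from Table 2.
accuracy = base_accuracy(11_282, 877_428_965)
print(f"Base-level accuracy: {accuracy:.3f}%")
```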

Acknowledgements

This work was supported by the National Key Research and Development Program of China (No. 2016YFC1200500), the National Natural Science Foundation of China (No. 31872553; No. 31602207; No. 41706157; No. 31272653) and the China Agriculture Research System (CARS-47-G04).

Author Contributions

Mingyi Cai and Zhiyong Wang conceived the study; Yu Zou, Fang Han, Junzhu Xiao and Fujiang Liu collected the samples and performed the sequencing and Hi-C experiments; Yu Zou, Shijun Xiao, Wanbo Li and Zhaofang Han estimated the genome size and assembled the genome; Yu Zou and Shijun Xiao assessed the assembly quality; Shijun Xiao and Yu Zou carried out the genome annotation and functional genomic analysis; Mingyi Cai, Yu Zou, Shijun Xiao and Zhiyong Wang wrote the manuscript. All authors read, edited and approved the final manuscript.

Code Availability

No specific code was developed in this work. Data analyses were performed according to the manuals and protocols provided by the developers of the corresponding bioinformatics tools.

Competing Interests

The authors declare no competing interests.

Footnotes

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Mingyi Cai, Yu Zou and Shijun Xiao.

Contributor Information

Mingyi Cai, Email: mycai@jmu.edu.cn.

Zhiyong Wang, Email: zywang@jmu.edu.cn.

ISA-Tab metadata is available for this paper at 10.1038/s41597-019-0139-x.

References

1. Cheng J, Ma G, Miao Z, Shui B, Gao T. Complete mitochondrial genome sequence of the spinyhead croaker Collichthys lucidus (Perciformes, Sciaenidae) with phylogenetic considerations. Mol Biol Rep. 2012;39:4249–4259. doi: 10.1007/s11033-011-1211-6.
2. Ma C, Ma H, Ma L, Cui H, Ma Q. Development and characterization of 19 microsatellite markers for Collichthys lucidus. Conservation Genetics Resources. 2011;3:503–506. doi: 10.1007/s12686-011-9389-4.
3. Liu H, et al. Estuarine dependency in Collichthys lucidus of the Yangtze River Estuary as revealed by the environmental signature of otolith strontium and calcium. Environmental Biology of Fishes. 2014;98:165–172. doi: 10.1007/s10641-014-0246-7.
4. Zhang S, et al. Cytogenetic characterization and description of an X1X1X2X2/X1X2Y sex chromosome system in Collichthys lucidus (Richardson, 1844). Acta Oceanologica Sinica. 2018;37:34–39. doi: 10.1007/s13131-018-1152-1.
5. He Z, Xue L, Jin H. On feeding habits and trophic level of Collichthys lucidus in inshore waters of northern East China Sea. Marine Fisheries. 2011;33:265–273.
6. Huang L, Xie Y, Li J, Zhang Y, Ji A. Biological Characteristics of Collichthys lucidus in Minjiang River Estuary and Its Adjacent Waters. Journal of Jimei University. 2010;15:248–253.
7. Ma G, Gao T, Sun D. Discussion of relationship between Collichthys lucidus and C. niveatus based on 16S rRNA and Cyt b gene sequences. South China Fisheries Science. 2010;6:13–20.
8. Zhang S, et al. Cytogenetic characterization and description of an X1X1X2X2/X1X2Y sex chromosome system in Collichthys lucidus (Richardson, 1844). Acta Oceanologica Sinica. 2018;37:34–39. doi: 10.1007/s13131-018-1152-1.
9. Kitano J, Peichel CL. Turnover of sex chromosomes and speciation in fishes. Environ Biol Fishes. 2012;94:549–558. doi: 10.1007/s10641-011-9853-8.
10. Lin A, et al. Identification of a male-specific DNA marker in the large yellow croaker (Larimichthys crocea). Aquaculture. 2017;480:116–122. doi: 10.1016/j.aquaculture.2017.08.009.
11. Sun S, Lin A, Li W, Han Z, Wang Z. Genetic sex identification and the potential sex determination system in the yellow drum (Nibea albiflora). Aquaculture. 2018;492:253–258. doi: 10.1016/j.aquaculture.2018.03.042.
12. Eid J, et al. Real-time DNA sequencing from single polymerase molecules. Science. 2009;323:133–138. doi: 10.1126/science.1162986.
13. Xiao S, et al. Whole-genome single-nucleotide polymorphism (SNP) marker discovery and association analysis with the eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) content in Larimichthys crocea. PeerJ. 2016;4:e2664. doi: 10.7717/peerj.2664.
14. Xiao S, et al. Functional marker detection and analysis on a comprehensive transcriptome of large yellow croaker by next generation sequencing. PLoS One. 2015;10:e0124432. doi: 10.1371/journal.pone.0124432.
15. Liu B, et al. Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. Preprint at http://arxiv.org/abs/1308.2012 (2012).
16. Pendleton M, et al. Assembly and diploid architecture of an individual human genome via single-molecule technologies. Nat Methods. 2015;12:780. doi: 10.1038/nmeth.3454.
17. Myers G. Efficient local alignment discovery amongst noisy long reads. Algorithms Bioinform. 2014;8701:52–67. doi: 10.1007/978-3-662-44753-6_5.
18. Chin CS, et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods. 2013;10:563. doi: 10.1038/nmeth.2474.
19. Walker BJ, et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014;9:e112963. doi: 10.1371/journal.pone.0112963.
20. Rao SS, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159:1665–1680. doi: 10.1016/j.cell.2014.11.021.
21. Xu S, et al. A draft genome assembly of the Chinese sillago (Sillago sinica), the first reference genome for Sillaginidae fishes. Gigascience. 2018;7:giy108. doi: 10.1093/gigascience/giy108.
22. Burton JN, et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat Biotechnol. 2013;31:1119–1125. doi: 10.1038/nbt.2727.
23. Bergman CM, Quesneville H. Discovering and detecting transposable elements in genome sequences. Brief Bioinform. 2007;8:382–392. doi: 10.1093/bib/bbm048.
24. Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27:573–580. doi: 10.1093/nar/27.2.573.
25. Xu Z, Wang H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 2007;35:W265–W268. doi: 10.1093/nar/gkm286.
26. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic Local Alignment Search Tool. Journal of Molecular Biology. 1990;215(3):403–410. doi: 10.1016/S0022-2836(05)80360-2.
27. Yu XJ, Zheng HK, Wang J, Wang W, Su B. Detecting lineage-specific adaptive evolution of brain-expressed genes in human using rhesus macaque as outgroup. Genomics. 2006;88:745–751. doi: 10.1016/j.ygeno.2006.05.008.
28. Birney E, Clamp M, Durbin R. GeneWise and Genomewise. Genome Res. 2004;14:988–995. doi: 10.1101/gr.1865504.
29. Kim D, et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology. 2013;14:R36. doi: 10.1186/gb-2013-14-4-r36.
30. Trapnell C, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012;7:562–578. doi: 10.1038/nprot.2012.016.
31. Stanke M, Waack S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics. 2003;19:ii215–ii225. doi: 10.1093/bioinformatics/btg1080.
32. Campbell MS, Holt C, Moore B, Yandell M. Genome Annotation and Curation Using MAKER and MAKER-P. Curr Protoc Bioinformatics. 2014;48:4.11.1–4.11.39. doi: 10.1002/0471250953.bi0411s48.
33. Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007;35:D61–D65. doi: 10.1093/nar/gkl842.
34. Boeckmann B. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 2003;31:365–370. doi: 10.1093/nar/gkg095.
35. Ashburner M, Ball CA, Blake JA. Gene Ontology: tool for the unification of biology. Nature Genetics. 2000;25:25. doi: 10.1038/75556.
36. Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28:27–30. doi: 10.1093/nar/28.1.27.
37. Zdobnov EM, Apweiler R. InterProScan–an integration platform for the signature-recognition methods in InterPro. Bioinformatics. 2001;17:847–848. doi: 10.1093/bioinformatics/17.9.847.
38. 2018. NCBI Sequence Read Archive. SRP169630
39. 2018. NCBI Sequence Read Archive. SRP167395
40. 2018. NCBI Sequence Read Archive. SRP169629
41. 2018. NCBI Sequence Read Archive. SRP169627
42. Cai MY, Xiao SJ. 2019. Collichthys lucidus isolate JT15FE1705JMU, whole genome shotgun sequencing project. GenBank. SCMI00000000
43. Cai MY, Xiao SJ, Zou Y. 2019. Genome annotation of Collichthys lucidus. figshare.
44. Cai MY, Xiao SJ, Zou Y. 2019. Potential sex-determination genes of Collichthys lucidus. figshare.
45. Andrews S. FastQC: a quality control tool for high throughput sequence data (2010).
46. Yang X, et al. HTQC: a fast quality control toolkit for Illumina sequencing data. BMC Bioinformatics. 2013;14:33. doi: 10.1186/1471-2105-14-33.
47. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–3100. doi: 10.1093/bioinformatics/bty191.
48. Waterhouse RM, et al. BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol Biol Evol. 2017;35:543–548. doi: 10.1093/molbev/msx319.
49. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997. 2013.
50. McKenna A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. doi: 10.1101/gr.107524.110.
51. Xiao S, et al. Gene map of large yellow croaker (Larimichthys crocea) provides insights into teleost genome evolution and conserved regions associated with growth. Sci Rep. 2015;5:18661. doi: 10.1038/srep18661.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Citations

2018. NCBI Sequence Read Archive. SRP169630
2018. NCBI Sequence Read Archive. SRP167395
2018. NCBI Sequence Read Archive. SRP169629
2018. NCBI Sequence Read Archive. SRP169627
Cai MY, Xiao SJ. 2019. Collichthys lucidus isolate JT15FE1705JMU, whole genome shotgun sequencing project. GenBank. SCMI00000000
Cai MY, Xiao SJ, Zou Y. 2019. genome annotation of Collichthys lucidus. figshare. [DOI]
Cai MY, Xiao SJ, Zou Y. 2019. potentialsex-determination genes of Collichthys lucidus. figshare. [DOI]

Supplementary Materials

Download metadata file^{(3.5KB, zip)}

Data Availability Statement

No specific code were developed in this work. The data analysis were performed according to the manuals and protocols provided by the developer of the corresponding bioinformatics tools.
