The genome of the sapphire damselfish Chrysiptera cyanea: a new resource to support further investigation of the evolution of Pomacentrids

Emma Gairin; Saori Miura; Hiroki Takamiyagi; Marcela Herrera; Vincent Laudet

doi:10.46471/gigabyte.144

2024 Dec 31;2024:gigabyte144. doi: 10.46471/gigabyte.144

PMCID: PMC11711634 PMID: 39791000

Abstract

The number of high-quality genomes is rapidly increasing across taxa. However, it remains limited for coral reef fish of the Pomacentrid family, with most research focused on anemonefish. Here, we present the first assembly for a Pomacentrid of the genus Chrysiptera. Using PacBio long-read sequencing with 94.5× coverage, the genome of the Sapphire Devil, Chrysiptera cyanea, was assembled and annotated. The final assembly comprises 896 Mb across 91 contigs, with a BUSCO completeness of 97.6% and 28,173 genes. Comparative analyses with chromosome-scale assemblies of related species identified contig-chromosome correspondences. This genome will be useful as a comparison to study specific adaptations linked to the symbiotic life of closely related anemonefish. Furthermore, C. cyanea is found in most tropical coastal areas of the Indo-West Pacific and could become a model for environmental monitoring. This work will expand coral reef research efforts, highlighting the power of long-read assemblies to retrieve high-quality genomes.

Introduction

Damselfish (family Pomacentridae) are highly abundant and common demersal reef fish throughout temperate, subtropical, and tropical waters. Many damselfish species, such as anemonefish, are prominent in the aquarium trade worldwide. They are also commonly studied to answer a number of scientific questions, notably at the ecological, behavioural, and developmental levels. As a key resource to deepen scientific analyses, genomes have been generated for fifteen species of Pomacentrids, including ten Amphiprion species [1–7]. Chromosome-level genomes are currently available for five species: four Pomacentrinae (Acanthochromis polyacanthus [8], Amphiprion ocellaris [4], Amphiprion percula [1], and Amphiprion clarkii [9]) and one Chrominae, Dascyllus trimaculatus [3]. To expand the genomic resources of Pomacentrids and reduce the bias for Amphiprion, we present here the first genome for the Cheiloprionini Chrysiptera cyanea, a member of the Pomacentrinae subfamily.

The Sapphire Devil C. cyanea [10] is a strikingly blue damselfish (Figure 1A) that can be encountered in shallow (0–10 m) Indo-West Pacific tropical coral reefs [11] (Figure 1C). C. cyanea is territorial, reef-associated, and non-migratory. It is most commonly found around rubble and corals in subtidal reef flats and tide pools (Figure 1B). C. cyanea has an omnivorous diet, commonly consisting of plankton, algae, and small benthic crustaceans. The size of females typically ranges between 38 and 54 mm total length (TL), while males measure 49–73 mm TL [12]. C. cyanea live in small to large schools that typically consist of one or a few males and several females [13, 14] (Figure 1B). The reproductive season of C. cyanea in Okinawa lasts from April to August [15, 16], and large densities of juvenile fish can be found in clusters along the coastline during the reproductive season in the presence or absence of adult conspecifics.

The life cycle of C. cyanea is akin to the majority of damselfish species, with a pelagic larval duration of 17–21 days [19], followed by a return to the reef coupled with the metamorphosis from larvae to juveniles. The juveniles then grow and eventually become sexually mature females (Figure 1D). The presence of a sex-specific size distribution and bias in the sex ratio of C. cyanea on the reef is indicative of protogynous hermaphroditism, with females turning into males [14, 20].

In Okinawa, females have transparent caudal fins and males have opaque blue caudal fins (Figure 1A). This phenotype is different from that of other locations in the Indo-Pacific, where males exhibit a yellow to orange tail. Studies of the cytochrome c oxidase subunit I (COI) showed a divergence between C. cyanea from Indonesia/Australia, the Philippines, and Tonga, suggesting the presence of allopatric divergence with possible new damselfish species (different COI reported in Tonga [21]). C. cyanea was initially described in Timor as a blue fish with yellow fins [10]. A synonymous species, Chrysiptera punctatoperculare (Fowler, 1946), was described in Aguni, Okinawa. Here, we identified the species living in Okinawa, with blue fins, as C. cyanea on the basis of a recent taxonomy guide [22].

This damselfish species is widespread along the coastal zones of Okinawa, making it an easily accessible study model for various behavioural, developmental, ecological, and ecotoxicological questions. To tackle functional and molecular-level questions, in particular those based on gene expression, the availability of a genome is a considerable resource. For instance, in transcriptomic studies, an available genome allows reference-based alignments to be performed. Such alignments are a robust way to examine RNA sequencing results while minimising issues such as incorrect assemblies of paralogous sequences (from gene duplication and diversification, leading to families of genes with similar sequences). They also prevent an underestimation of gene expression levels and avoid the incomplete and fragmented assemblies produced by de novo assembly methods [23]. To date, no genomic data are available for any Chrysiptera species. Here, we present a highly complete genome obtained from PacBio long-read sequencing, making this study the first to publish a genome for this coral reef fish genus, and adding to the number of genomes available for damselfishes. These genomes can also be useful to improve the analysis of the unique adaptations present in anemonefishes, by comparison with other damselfish [24].

Methods

Fish collection and DNA sequencing

A male C. cyanea (total length: 5.6 cm) was collected using SCUBA and hand nets on Sesoko beach in Okinawa, Japan (26.6509 N, 127.8564 E) on September 30th, 2022. The fish was kept under natural conditions in a 270 L flow-through outdoor tank at the OIST Marine Science Station until October 19th, 2022. It was euthanised in a 200 mg/L Tricaine Methanesulfonate (MS222) solution following the guidelines for animal use issued by the Animal Resources Section of OIST Graduate University. Liver tissue destined for genome sequencing was snap-frozen in liquid nitrogen and stored at −80 °C. Eye, gill, liver, and muscle tissue from the same male fish, and brain tissue from another male from the same site, were preserved in RNAlater for transcriptomic sequencing and stored at −30 °C.

Genomic DNA was extracted from the snap-frozen liver using the Monarch HMW DNA Extraction Kit for tissues (New England BioLabs). Library preparation and sequencing were performed by the OIST Sequencing Section. First, the genomic DNA was sheared to 25 kb using the Megaruptor 3 (Diagenode), as longer reads improve the ease of genome assembly. Library preparation was conducted using the SMRTbell Express Template Prep Kit 3.0 (PacBio, #102-182-700) following the manufacturer’s protocol. Library size selection was carried out using the BluePippin system (SageScience, MA USA). After preparation with the Sequel Binding Kit 3.2 to bind the polymerase and sequencing primers, sequencing was performed on two SMRT cells with Circular Consensus Sequencing runs using PacBio Sequel IIe HiFi (Pacific Biosciences, CA, USA).

RNA was extracted from the eye, gill, liver, muscle, and brain tissue using the Maxwell® RSC simplyRNA tissue kits and an automated Maxwell® RSC instrument, following manufacturer recommendations (Promega, Cat. No. AS2000, Wisconsin, USA). The quality and concentration of RNA were assessed with an Invitrogen Qubit Flex benchtop fluorometer and an Agilent 4200 TapeStation. Library preparation was performed by the OIST Sequencing section using the NEBNext Ultra II Directional RNA library prep kit for Illumina (New England BioLabs, USA). Sequencing was performed using an Illumina NovaSeq6000 platform at OIST.

Sequencing data processing and genome assembly

The complete list of software, version, and parameters is provided in Table 1. Quality control was performed with FastQC 0.11.9 (RRID:SCR_014583) on the raw reads from each SMRT cell. The genome size was estimated based on a k-mer approach using Jellyfish 2.2.7 (RRID:SCR_005491) and GenomeScope 2.0 (RRID:SCR_017014). The DNA sequences were assembled with the Improved Phased Assembler 1.3.1 (IPA) [25] without phasing – i.e., without separating the parental alleles into haplotypes, as this option yielded the best assembly results out of different IPA and Flye 2.9.1 (RRID:SCR_017016) [26] assemblies based on Quast 5.2.0 (RRID:SCR_001228) statistics [27], BUSCO (Benchmarking Universal Single-Copy Orthologs) (RRID:SCR_015008) scores (actinopterygii_odb10, BUSCO 4.1.2) [28], and similarity to the k-mer estimation of the genome size. Merqury 1.3 (RRID:SCR_022964) was used to estimate the genome completeness and error rate. Purge Haplotigs 1.1.3 was used to improve contiguity by removing allelic contigs from the non-phased IPA assembly [29]. The repeats were annotated with RepeatModeler 2.0.3 (RRID:SCR_015027) with the parameter -LTRStruct [30]. RepeatMasker 4.1.1 (RRID:SCR_012954) [31] was run to identify repetitive elements in the RepeatModeler output and the vertebrate library of Dfam. Repetitive elements were then softmasked with BEDTools 2.30.0 (RRID:SCR_006646) [32].
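The soft-masking step can be illustrated with a minimal Python sketch. It is not the BEDTools implementation; it only shows the idea that bases inside annotated repeat intervals are converted to lowercase while the rest of the sequence is left in uppercase. The function name `soft_mask`, the toy sequence, and the interval coordinates are hypothetical, and intervals are assumed to be 0-based and half-open, as in the BED format.

```python
def soft_mask(sequence: str, repeats: list[tuple[int, int]]) -> str:
    """Lowercase the bases inside each repeat interval.

    Intervals are assumed to be 0-based, half-open (start, end), as in BED.
    """
    masked = list(sequence.upper())
    for start, end in repeats:
        # Clip the interval to the sequence boundaries before masking.
        for i in range(max(start, 0), min(end, len(masked))):
            masked[i] = masked[i].lower()
    return "".join(masked)


# Hypothetical toy contig and repeat coordinates, for illustration only.
contig = "ACGTACGTACGTACGT"
repeat_intervals = [(2, 6), (10, 12)]
print(soft_mask(contig, repeat_intervals))  # ACgtacGTACgtACGT
```

Soft-masking keeps the repeat sequence available to downstream tools, whereas hard-masking would replace it with N characters.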

Table 1.

Software and parameters used for the genome assembly and annotation of Chrysiptera cyanea.

Purpose		Software and version	Parameters
Raw reads quality check		FastQC 0.11.9
Estimate genome size		Jellyfish 2.2.7	k = 21, s = 100 M
Reference-free characterization		GenomeScope 2.0
De novo assembly		Improved Phased Assembler 1.3.1	Phase and no phase
De novo assembly		Flye 2.9.1	With and without haplotypes, with and without scaffolding
Haplotigs removal		Purge Haplotigs 1.1.3	low = 10, m = 75, h = 195
Repeat modeling and masking		RepeatModeler 2.0.3	LTRStruct
Repeat modeling and masking		RepeatMasker 4.1.4
Genome annotation		Hisat 2.2.1
Genome annotation		Braker 2.1.6	Protein evidence from UniProtKB/Swiss-Prot, ten related genomes
Genome annotation		Diamond 2.0.14	Swissprot
Genome annotation		InterProScan 5.60	Pfam
Functional annotation		NCBI Blast 2.10.0
File processing		Bedtools 2.29.2
File processing		Samtools 1.12
Mapping		Minimap 0.2
Assembly statistics	General metrics	Quast 5.2.0
Assembly statistics	Completeness	BUSCO 4.1.2	actinopterygii_odb10
Assembly statistics	Quality and error rate	Merqury 1.3
Check for contamination		Blobtools 1.1


Contig scaffolding on clownfish reference genomes

Using MUMmer 3.23 (RRID:SCR_018171) [33], the contigs of the C. cyanea assembly were mapped to the chromosome-scale genome assemblies for A. clarkii, A. ocellaris, A. percula, Ac. polyacanthus, and D. trimaculatus. Dot plots were generated using ggplot2 (RRID:SCR_014601) [34] after data filtering to remove any alignments shorter than 10,000 bases [35].
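The filtering step described above, and the per-contig alignment percentages derived from it, can be sketched in Python as follows. This is an illustration under stated assumptions, not the script used in the study: the record layout `(contig, chromosome, start, end)`, the function name `aligned_fraction`, and the example data are hypothetical, and the alignment length is assumed to be measured on the contig. Overlapping alignments are merged so that no base is counted twice.

```python
from collections import defaultdict

MIN_ALIGNMENT_LENGTH = 10_000  # threshold stated in the text


def aligned_fraction(alignments, contig_lengths, min_length=MIN_ALIGNMENT_LENGTH):
    """Return {contig: fraction of its bases covered by retained alignments}.

    alignments: iterable of (contig, chromosome, start, end), 1-based inclusive
    coordinates on the contig (hypothetical layout, not a MUMmer file format).
    """
    spans_by_contig = defaultdict(list)
    for contig, _chromosome, start, end in alignments:
        low, high = sorted((start, end))  # reverse-strand hits may list end first
        if high - low + 1 >= min_length:
            spans_by_contig[contig].append((low, high))

    fractions = {}
    for contig, spans in spans_by_contig.items():
        spans.sort()
        covered = 0
        current_low, current_high = spans[0]
        for low, high in spans[1:]:
            if low <= current_high + 1:  # overlapping or adjacent: extend the span
                current_high = max(current_high, high)
            else:                        # gap: close the current span, start a new one
                covered += current_high - current_low + 1
                current_low, current_high = low, high
        covered += current_high - current_low + 1
        fractions[contig] = covered / contig_lengths[contig]
    return fractions


# Hypothetical example data.
example_alignments = [
    ("contig-A", "chr1", 1, 20_000),
    ("contig-A", "chr1", 15_001, 30_000),  # overlaps the previous alignment
    ("contig-A", "chr2", 44_999, 40_000),  # 5,000 bases: removed by the filter
    ("contig-B", "chr3", 1, 12_000),
]
example_lengths = {"contig-A": 100_000, "contig-B": 24_000}
fractions = aligned_fraction(example_alignments, example_lengths)
print(fractions)
```

Contigs with no retained alignment are absent from the returned dictionary, which mirrors the distinction between mapped and unmapped contigs.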

Transcriptome sequencing data processing

Quality control of the RNA sequencing data from the eye, gill, liver, muscle, and brain tissue was performed with FastQC 0.11.9 [36]. Following this quality check, the sequences were processed by trimming the adapters used by Illumina on each transcript as well as dropping low quality sections with Trimmomatic 0.39 (RRID:SCR_011848) [37]. The transcriptomic reads were mapped to the contig sequences with HISAT2 2.2.1 (RRID:SCR_015530) [38], then converted to BAM format from SAM format with SAMtools 1.12 (RRID:SCR_002105) [39, 40]. Lastly, the number of transcripts per gene was quantified using Kallisto 0.46.2 (RRID:SCR_016582) [41].

Prediction of gene models

The position of protein-coding gene structures in the soft-masked assembly was predicted by BRAKER 2.1.6 (RRID:SCR_018964) [39, 42–53] using the transcriptome of the eye, gill, liver, and muscle of the fish used for the genome. The position of the protein-coding gene structures was also predicted using protein evidence from UniProtKB/Swiss-Prot [54] as well as selected fish proteomes from the NCBI database (A. ocellaris: 48,668 sequences; A. clarkii: 25,025; Danio rerio: 88,631; Ac. polyacanthus: 36,648; Oreochromis niloticus: 63,760; Oryzias latipes: 47,623; Poecilia reticulata: 45,692; Stegastes partitus: 31,760; Takifugu rubripes: 49,529; and Salmo salar: 112,302). We selected genes with evidence from the transcriptome or protein hints, and with homology to the Swiss-Prot protein database and Pfam domains identified with Diamond (RRID:SCR_016071) [45] and InterProScan (RRID:SCR_005829) [55]. NCBI BLAST (RRID:SCR_004870) 2.10.0 [56] was used to perform the functional annotation of the final gene models.

The output was assessed using Quast 5.2.0 [27] as well as by calculating the number of BUSCO genes present in the assembly [28].

Gene expression analysis

The tissue-specificity index (τ) was calculated for each gene using the R package tispec v0.99 [57]. The relationship between τ and the transcripts per million (TPM) gene-expression values was visualised on a 2D histogram with ggplot2 [34]. An upset plot from UpSetR v1.4.0 [58] was used to visualise the TPM values per tissue (brain, eye, gill, liver, and muscle).

Results

Genome assembly of C. cyanea

We assembled the genome of the damselfish C. cyanea by sampling one individual from Okinawa and obtaining 3,335,935 PacBio reads. The average read length was 25,387 bases, for a total of 84,688,690,513 sequenced bases. FastQC 0.11.9 did not detect low-quality reads (Figure 2).

Figure 2. (A) Distribution of the raw read length from the two SMRT cells. (B) Distribution of the GC content across the raw reads from the two SMRT cells. (C) Per base sequence quality of the raw reads from the two SMRT cells; data obtained using FastQC 0.11.9.

Multiple de novo assembly options were tested (Table 2) using Flye and IPA. We chose the IPA no-phasing primary assembly as it yielded the highest N50 (30.6 Mb), the lowest number of contigs (103), and a genome length of 899.7 Mb. From the raw reads, the genome size had been estimated with a k-mer approach to be 753–754 Mb (using Jellyfish and GenomeScope with k-mer sizes 21 and 31; Figure 3, Supplementary Table 1 in Figshare [60]). This estimate was 16% lower than the assembly; the discrepancy might be due to the high repeat content of teleost fish genomes, leading to an underestimation of the genome size using a k-mer approach. Similarly, the estimated size of the genome using the k-mer approach was lower than the final assembly for A. ocellaris [4], A. clarkii [9], and D. trimaculatus [3]. Here, the assembly size was close to the reported lengths of other damselfish genomes. For instance, the genome size of A. ocellaris is 861.4 Mb [4], of A. clarkii is 843.6 Mb [9], of A. percula is 908.8 Mb [1], and of D. trimaculatus is 910.8 Mb [3].
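For readers less familiar with the contiguity statistics used to compare the assemblies, the following Python sketch shows how N50 and L50 (and, with a different fraction, N90 and L90) are conventionally defined. It is not Quast's code; the function name `nx_lx` and the toy contig lengths are hypothetical.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of contig lengths.

    Nx is the length of the shortest contig in the smallest set of longest
    contigs that together cover at least `fraction` of the assembly;
    Lx is the number of contigs in that set.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running_total = 0
    for count, length in enumerate(ordered, start=1):
        running_total += length
        if running_total >= threshold:
            return length, count
    raise ValueError("no contig lengths supplied")


# Hypothetical toy assembly with five contigs (total length 150).
toy_lengths = [50, 40, 30, 20, 10]
print(nx_lx(toy_lengths, 0.5))  # (40, 2): N50 = 40, L50 = 2
print(nx_lx(toy_lengths, 0.9))  # (20, 4): N90 = 20, L90 = 4
```

A higher N50 together with a lower L50 indicates that fewer, longer contigs make up half of the assembly, which is the pattern reported for the IPA no-phasing primary assembly.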

Table 2.

Statistics generated with Quast 5.2.0 and BUSCO 4.1.2 about the different primary genome assemblies produced by the Flye 2.9.1 and the IPA 1.3.1, as well as about the final assembly, which was obtained from the non-phased purged assembly (read depth cutoff: low = 10, medium = 70, high = 195) generated with Purge Haplotigs 1.1.3, with soft-masked repeats obtained from RepeatModeler 2.0.3 and RepeatMasker 4.1.4.

	Flye				IPA
	With haplotypes	Without haplotypes	With haplotypes, scaffold	Without haplotypes, scaffold	Phase A	Phase P	No phase A	No phase P	No phase P purged
Number of contigs	3,392	1,705	2,788	1,677	1,506	126	5,516	103	91
Largest contig	16,926,528	31,064,569	24,238,470	32,994,236	31,083,763	70,778,723	5,766,530	56,011,215	56,011,215
Total length	895,607,061	907,362,611	902,966,492	907,614,972	559,243,152	1,252,663,990	609,931,055	899,747,371	896,528,091
GC (%)	38.72	38.72	38.73	38.72	38.08	38.92	38.4	38.74	38.73
N50	3,366,676	6,245,761	2,428,049	7,852,259	8,678,642	22,721,003	153,437	30,591,387	30,591,387
N90	256,067	687,672	268,683	710,351	89,761	5,200,624	47,219	4,184,431	4,251,551
L50	66	34	86	32	19	20	708	12	12
L90	476	198	524	179	176	61	3,718	39	38
Number of N’s per 100 kb	0	0	0	0.34	0.02	0.02	0.01	0.03	0.03
Number of contigs (≥1,000 bp)	3,295	1,701	2,751	1,673	1,506	126	5,516	103	91
Total length (≥1,000 bp)	895,534,291	907,359,480	902,937,845	907,612,138	559,243,152	1,252,663,990	609,931,055	899,747,371	896,528,091
Number of contigs (≥5,000 bp)	2,631	1,591	2,435	1,562	1,505	126	5,509	103	91
Total length (≥5,000 bp)	893,580,031	907,017,243	901,992,706	907,272,034	559,241,105	1,252,663,990	609,902,971	899,747,371	896,528,091
Number of contigs (≥10,000 bp)	1,909	1,396	2,110	1,359	1,495	126	5,455	103	91
Total length (≥10,000 bp)	888,503,131	905,565,959	899,619,340	905,751,218	559,155,404	1,252,663,990	609,468,326	899,747,371	896,528,091
Number of contigs (≥25,000 bp)	1,256	875	1,418	842	1,326	126	5,087	103	91
Total length (≥25,000 bp)	878,176,247	897,047,373	888,532,498	897,267,975	556,013,220	1,252,663,990	602,874,429	899,747,371	896,528,091
Number of contigs (≥50,000 bp)	978	572	1,040	542	539	126	3,463	103	91
Total length (≥50,000 bp)	868,491,235	886,407,758	875,155,475	886,689,354	525,520,384	1,252,663,990	536,558,481	899,747,371	896,528,091
BUSCO Complete (%)	97.4	97.7	97.3	97.6	52.8	97.7	62.0	97.5	97.6
BUSCO Single copy (%)	96.4	96.3	95.8	96.2	51.2	53.6	60.6	96.5	96.9
BUSCO Duplicated (%)	1.0	1.4	1.5	1.4	1.6	44.1	1.4	1	0.7
BUSCO Fragmented (%)	0.9	0.8	1.2	0.7	0.5	0.4	3.9	0.7	0.6
BUSCO Missing (%)	1.7	1.5	1.5	1.7	46.7	1.9	34.1	1.8	1.8


Figure 3. (A) Raw coverage plot of the frequency at which k-mers are covered within the raw reads, following k-mer assessment by Jellyfish (k-mer size = 21 bases). Figure generated by GenomeScope. (B) Log-transformed coverage plot of the frequency at which k-mers are covered within the raw reads, following k-mer assessment by Jellyfish (k-mer size = 21 bases). Figure generated by GenomeScope. (C) Key statistics about the final assembly (IPA no phase, purged) from Merqury. (D) Boxplot of the quality value of the contigs based on their size, from Merqury. The boxplot displays the median and the 25th and 75th percentiles, with whiskers extending to the maximum and minimum values within 1.5 times the interquartile range from the box. (E) Distribution of the contigs in the assembly based on their size.

The de novo assembly was curated using Purge Haplotigs (removing one contig when syntenic pairs of contigs were detected) [29]. Twelve contigs were removed to obtain a curated assembly with 91 contigs and a total length of 896.5 Mb. The GC content of the genome was 38.73%, and the mean base-level coverage was 94.5× (Table 2). We retrieved 97.6% of BUSCO genes in the final genome assembly, with 96.9% single copy and 0.7% duplicated. We found 0.6% of BUSCO genes to be fragmented and 1.8% missing. The completeness of the genome estimated by Merqury was 90.129%, with a mean quality value of 48.9 and an error rate of 0.0000138 (one nucleotide error per 72.6 kb).
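The arithmetic behind two of these figures can be checked with a short Python sketch. The coverage is taken here as total sequenced bases divided by the final assembly length, which is an assumption about how the 94.5× value was derived. The quality value is assumed to follow the standard Phred scale, QV = −10 log10(error rate). The helper names `qv_to_error_rate` and `error_rate_to_qv` are hypothetical.

```python
import math

total_bases = 84_688_690_513   # sequenced bases reported in the text
assembly_length = 896_528_091  # final assembly length from Table 2

# Mean coverage, assuming coverage = sequenced bases / assembly length.
coverage = total_bases / assembly_length
print(f"coverage: {coverage:.1f}x")  # coverage: 94.5x


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error rate."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(rate: float) -> float:
    """Convert a per-base error rate to a Phred-scaled quality value."""
    return -10 * math.log10(rate)


reported_error_rate = 0.0000138  # error rate reported in the text
bases_per_error = 1 / reported_error_rate
print(f"about one error every {bases_per_error / 1000:.1f} kb")
print(f"QV implied by that error rate: {error_rate_to_qv(reported_error_rate):.1f}")
```

Small differences between such back-of-the-envelope values and the reported ones can arise from rounding of the published figures.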

A total of 324,222,639 bp (36.16% of sequences) were identified as being repeat content using RepeatMasker, with 20% DNA repeat elements, 18% simple repeats (microsatellites), 6% long interspersed nuclear elements, 2% low complexity repeats, 2% long terminal repeats, and under 1% each of short interspersed nuclear elements, rolling circles, tRNA, retroposon, rRNA, snRNA, and Satellite elements. We could not identify 50% of the repetitive elements in the C. cyanea genome.

As a reference, a comparison of key statistics of this genome assembly with previously published chromosome-scale genomes of Pomacentrid fish is provided in Table 3. Overall, while our coverage was slightly lower than other genomes, we were able to obtain a much more contiguous de novo assembly (91 contigs vs. 951 in Ac. polyacanthus, and over 1,400 in all other assemblies) by using very long 25 kb DNA fragments for sequencing (rather than the recommended 10–20 kb). Our contig N50 of 30.6 Mb was six to thirty times larger than those of the other assemblies. The GC, repeat contents, and BUSCO completeness scores were similar to those of other genomes (Table 3).

Table 3.

Comparison of the genome assembly strategy and output for C. cyanea with other chromosome-scale genome assemblies of Pomacentrid fish.

	Amphiprion ocellaris (false clownfish)	Amphiprion clarkii (yellow tail clownfish)	Amphiprion percula (orange clownfish)	Acanthochromis polyacanthus (spiny chromis)	Dascyllus trimaculatus (threespot damselfish)	Chrysiptera cyanea (blue damselfish)
Sequencing strategy	PacBio Sequel II, HiC	PacBio Sequel II, HiC	PacBio RSII, HiC	PacBio RSII, HiC	HiSeq4000, 2 × 150 PE	PacBio Sequel II
Mean base-level coverage	103.9×	250.4×	121×	131×	103×	94.5×
De novo assembly method	Falcon	Flye with haplotypes, Purge Haplotigs	Falcon	Falcon	MaSuRCA	IPA non-phased primary assembly, purge haplotigs
Assembly size	856.6 Mb	843.6 Mb	908.8 Mb	956.8 Mb	910.7 Mb	896.5 Mb
Number of chromosomes	24	24	24	24	24	NA
Number of scaffolds including chromosomes	353	192	365	81	2156	NA
Scaffold N50	36.9 Mb	26.7 Mb	38.4 Mb	41.7 Mb	34.9 Mb	NA
Number of contigs	1,551	1,840	1,414	951	3,501	91
Contig N50	0.9 Mb	1.2 Mb	1.9 Mb	5.5 Mb	1.1 Mb	30.6 Mb
GC content (%)	39.58	39.71	39.5	40.5	41.5	38.7
Repeat content (%)	45	44	19	38	not reported	36
BUSCO genome completeness (%)	97	98.7	97.1	96.7	97.9	97.6
Complete and single copy (%)	96.2	97.8	96.1	94.3	96.2	96.9
Complete and duplicated (%)	0.8	0.9	1.0	2.4	1.7	0.7
Fragmented (%)	0.5	0.4	0.5	0.8	0.7	0.6
Missing (%)	2.4	0.9	2.4	2.5	1.4	1.8
Number of protein-coding genes	26,797	25,050	26,597	25,468	not reported	28,173
BUSCO gene annotation completeness (%)	96.6	97.0	96.2	94.2	not reported	96.6
Complete and single copy (%)	95.5	96.1	84.8	76.0	not reported	96.0
Complete and duplicated (%)	1.1	0.9	11.4	18.2	not reported	0.6
Fragmented (%)	1.0	1.1	2.0	3.2	not reported	1.5
Missing (%)	2.3	1.9	1.7	2.6	not reported	1.9
Data release	Ryu et al. [4]	Moore et al. [59]	Lehman et al. [1]	Lehman et al. [8]	Roberts et al. [3]	This study


The 91 contigs were mapped against the chromosome-scale genomes of A. clarkii, A. ocellaris, A. percula, D. trimaculatus, and Ac. polyacanthus using MUMmer. After filtering to remove aligned segments shorter than 10,000 bases, over 40% of the bases of the assembly of C. cyanea were matched with the reference genomes of A. clarkii, A. ocellaris, A. percula, and Ac. polyacanthus. However, only 20.3% were retrieved in the alignment with D. trimaculatus (Supplementary Table 4 in Figshare [60]); the latter genome was reported to have undergone rearrangements [3].

In total, 48 contigs were mapped onto the reference genomes (filtering out alignments shorter than 10,000 bases). The correspondences are provided in Table 4. Over 50% of the bases aligned to the reference genomes of A. clarkii, A. ocellaris, A. percula, Ac. polyacanthus, and D. trimaculatus for 16, 18, 14, 16, and 0 contigs, respectively (Figure 4A). Of the mapped contigs, 46 aligned with a single chromosome (Figure 4A). The remaining 43 contigs did not robustly map to any chromosome of the reference genomes; the largest of these was contig-36 (4.6 Mb).

Table 4.

Correspondence of the contigs of the genome assembly for C. cyanea and the chromosomes of the reference genomes of Dascyllus trimaculatus, Acanthochromis polyacanthus, Amphiprion clarkii, Amphiprion ocellaris, and Amphiprion percula. A total of 223 chromosome-contig matches were retrieved. Matches with less than 100,000 bases are not indicated in the table (50 matches removed). Scaffolding was performed using the nucmer function of the MUMmer 3.23 package [33], removing alignments shorter than 10,000 bases.

Chrysiptera cyanea contig	A. clarkii chromosome	A. ocellaris chromosome	Ac. polyacanthus chromosome	A. percula chromosome	D. trimaculatus chromosome
1	10, 16	11, 16	14, 15	12, 14, 20	12, 20
2	7, 18	9, 20	1, 12	7, 18	7
3	9	7	13	9	9
4	2	4	3	3	3
5	1	1	2	1	1
6	14	15	11	17	17
7	3	2	4	2	2
8	8	8	6	8	8
9	20	18	19	19	19
10	19	17	18	16	16
11	22	22	20	22	22
12	17	19	21	14	14
13	5	5	5	6	6
14	11	10	9	10	10
15	21	21	1	21	21
16	4	3	8	4	4
17	6	6	7	5	5
18	15	14	17	15	15
19	12	13	10	13	13
20	18	20	1	18	18
21	13	12	16	11	11
22	15	14	17	15	NA
23	10	11	14	12	12
24	12	13	10	13	13
25	23	23	23	23	23
26	13	12	16	11	NA
27	4	3	8	4	NA
29	24	24	22	24	NA
30	6	6	7	5	5
31	18	20	1	18	18
32	13	12	16	11	NA
41	11	10	9	10	NA
47	23	NA	NA	NA	NA
51	23	23	23	23	NA
54	24	24	22	24	NA


Figure 4. (A) Percentage of bases in each contig aligning to the reference genomes. Only contigs aligning to at least one chromosome, with at least 10,000 bases retrieved from the reference genome, are displayed (48 of the total 91 contigs). Based on this threshold, most contigs align to only one reference chromosome. In the case of an alignment to more than one chromosome, the number of retrieved chromosomes is indicated in the corresponding tile of the plot. (B) Dotplot of the alignment of the chromosome-scale Amphiprion ocellaris genome with the contigs from Chrysiptera cyanea (alignments longer than 10,000 bases). Contig-1 is indicated with a green frame. (C) Dotplot of the alignment of contig-1 from Chrysiptera cyanea with chromosomes 11 and 16 of A. ocellaris (alignments longer than 10,000 bases). This dotplot is a larger version of the green frame in panel (B). (D) Dotplot of the alignment of contig-1 from Chrysiptera cyanea with chromosomes 4, 10, 11, 16, and 24 of A. ocellaris (alignments longer than 5,000 bases). (E) Dotplot of the alignment of contig-1 from Chrysiptera cyanea with chromosomes 6, 11, 12, 14, and 20 of Amphiprion percula (alignments longer than 5,000 bases).

Two C. cyanea contigs were mapped to more than one chromosome from the reference assemblies: contigs 1 and 2 (the two largest contigs, of sizes 56.0 Mb and 45.2 Mb, respectively). This result may indicate either a chimeric assembly or a genomic rearrangement. For contig-1, the alignment against the A. ocellaris genome highlighted a missing zone between bases 19,943,083 and 33,394,537, with no alignments longer than 10,000 bases (Figure 4C). Similar results were found for comparisons with the other reference genomes. Lowering the filtering threshold to include alignments between 5,000 and 10,000 bases against A. ocellaris retrieved some relatively scrambled alignments in the gap mentioned above (Figure 4D). This could indicate a chimeric assembly of contig-1, which may in fact consist of two chromosomes in C. cyanea. However, in a comparison with A. percula for alignments longer than 5,000 bases, the missing zone was found to map onto A. percula’s chromosome 20 (Figure 4E). This may point towards genomic rearrangement from two or more ancestral chromosomes, with divergence through evolution leading to the low mapping rate between 19 Mb and 33 Mb on contig-1.

C. cyanea gene annotation

The genome was annotated using BRAKER v.2.1.6 based on mRNA and protein evidence, leading to the prediction of 46,873 gene models. These were filtered to keep only the longest isoform of each gene model. We retained genes with mRNA evidence or homology to the Swiss-Prot protein database and Pfam domains. The completeness of the final set of 28,173 genes was assessed using BUSCO, with a final score of 96.6%, including 96.0% single-copy genes and 0.6% duplicated genes. We found that 1.5% of BUSCO genes were fragmented, and 1.9% were missing from the final transcriptome. Of the final set of 28,173 genes, 19,356 (69%) had at least one associated GO term. Lastly, 1,802 genes (6.4% of all genes) were located on contigs that could not be matched to any chromosome from the five reference genomes.
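The longest-isoform filter mentioned above can be sketched in a few lines of Python. This is an illustration only: the record layout `(gene_id, transcript_id, length)`, the function name `longest_isoforms`, and the identifiers are hypothetical, and the length used for the comparison (for example, coding sequence length) is an assumption, since the text does not specify it.

```python
def longest_isoforms(transcripts):
    """Keep one transcript per gene: the longest one.

    transcripts: iterable of (gene_id, transcript_id, length) records.
    Ties keep the transcript encountered first.
    """
    best = {}
    for gene_id, transcript_id, length in transcripts:
        if gene_id not in best or length > best[gene_id][1]:
            best[gene_id] = (transcript_id, length)
    return {gene_id: transcript_id for gene_id, (transcript_id, _) in best.items()}


# Hypothetical gene models, for illustration only.
models = [
    ("g1", "g1.t1", 900),
    ("g1", "g1.t2", 1500),
    ("g2", "g2.t1", 600),
]
print(longest_isoforms(models))  # {'g1': 'g1.t2', 'g2': 'g2.t1'}
```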

Tissue specificity of the gene expression

The tissue specificity of the 28,173 identified genes, or the “transcriptomic atlas” of C. cyanea, was assessed using the transcriptomes of the eye, gill, liver, and muscle of the fish used for the genome assembly, as well as the brain of another fish collected on the same day and at the same site. In total, 1,639 genes were expressed in all five tissues (with TPM values above 10 in each tissue, Figure 5), similar to previous reports in anemonefish [4, 59]. The tissue specificity index (τ) was calculated for each gene (values near 0 indicate broadly expressed genes, and values near 1 indicate tissue-specific genes) [61, 62]. A total of 3,090 genes had a τ value below 0.2, corresponding to housekeeping genes (similar expression levels across most tissues without bias) [61]. This number was relatively similar to those reported for Amphiprion species, notably for A. clarkii (3,697 genes) [59] and A. ocellaris (3,431 genes) [4]. Following log-transformation and normalisation, 7,396 genes with τ values above 0.85, indicative of high tissue specificity, were identified. Of these, 4,258 genes were absolutely specific, with τ = 1 (expressed in only one type of tissue). This is a relatively high number and may be linked to the small number of tissue types considered here, compared to other studies of damselfish (five here against, e.g., twelve in [59]). The gill tissue had the highest number of highly specific genes, with 1,170 genes with τ = 1, followed by the brain tissue (1,094), eye (780), liver (480), and muscle (134). These values were proportional to the total number of genes expressed in each tissue, which was higher in the brain, eye, and gill, and lower in the liver and muscle, similar to other damselfish. Lastly, tissue-specific genes tended to have low expression levels (negative correlation of the τ value with gene expression level, R = 0.41, p < 10⁻¹⁵, Figure 5), as previously reported for A. clarkii [59] and A. ocellaris [4].
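To make the interpretation of the index concrete, the following Python sketch computes a tissue specificity index using the commonly cited definition, in which each tissue's expression is scaled by the maximum across tissues. It is assumed, not confirmed by the text, that the R package used in this study follows the same definition. The function name `tau` and the example values are hypothetical, and the inputs are assumed to be already log-transformed and normalised expression values.

```python
def tau(expression):
    """Tissue specificity index for one gene.

    expression: non-negative expression values, one per tissue (assumed to be
    already log-transformed and normalised). Returns a value between 0
    (equal expression in all tissues) and 1 (expression in a single tissue),
    or None when the gene is not expressed in any tissue.
    """
    n_tissues = len(expression)
    if n_tissues < 2:
        raise ValueError("at least two tissues are required")
    peak = max(expression)
    if peak <= 0:
        return None  # the index is undefined without any expression
    return sum(1 - value / peak for value in expression) / (n_tissues - 1)


# Hypothetical values for five tissues (brain, eye, gill, liver, muscle).
print(tau([10, 10, 10, 10, 10]))    # 0.0: housekeeping-like profile
print(tau([0, 0, 0, 0, 25]))        # 1.0: expressed in one tissue only
print(tau([100, 50, 50, 50, 50]))   # 0.5: intermediate specificity
```

With only five tissues, a gene reaches a value of 1 as soon as it is undetected in the four other tissues, which is consistent with the explanation given above for the high number of absolutely specific genes.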

Figure 5. (A) Upset plot displaying the number of genes expressed (intersection size) in individual tissues and in combinations of tissues (brain, eye, gill, liver, and muscle). Transcripts per million (TPM) values above 10 were used as a threshold for gene expression. (B) Two-dimensional histogram displaying the relationship between the maximum TPM (log-transformed) and the tissue specificity index (τ) of each gene. The trendline displays the Pearson’s correlation between τ and log-transformed TPM.

Discussion

The Sapphire Devil C. cyanea is a widely distributed damselfish in the Indo-Pacific area. In Okinawa, it is among the most common species on reef flats [63]. It has been mainly studied to elucidate the roles of various environmental controls on its reproduction and to investigate related hormonal processes [12, 15, 16]. To further the potential of biomolecular analyses based on this species, this study generated the first genome of a Chrysiptera fish from a male individual collected in Okinawa, Japan. This genome will be of high value for future genetic-based approaches, from population structure to gene expression analyses. The differences between anemonefish and other damselfish are currently an active research topic. Here, we provide a new high-quality non-anemonefish genome, which will help to deepen such analyses.

Using PacBio HiFi, we assembled a high-quality genome of size 896.5 Mb, within the range of other related species, with a high completeness of 97.6% BUSCO genes. Of particular interest, although Hi-C was not generated for this study, the use of PacBio HiFi long-read technology, the IPA, and Purge Haplotigs allowed us to obtain an assembly of 91 contigs. This is a low number of contigs, particularly compared to those for other reef fishes with scaffold-level genomes based on Illumina short-read approaches. The C. cyanea genome contigs closely mapped against chromosome-scale genomes from other Pomacentrids, with several homologous contig-chromosome pairs (Figure 4, Table 4). The high quality of the genome assembled here indicates that the sequencing depth, choice of technology, and downstream pipelines were sufficient to retrieve long contigs and achieve a near chromosome-scale genome assembly.

Several architectural variations in the genome of C. cyanea were detected by comparison with the five reference genomes. Some contigs showed homology with their reference chromosomes (Figure 4, Table 4, Supplementary Tables 3, 4, and 5 in Figshare [60]). However, others showed rearrangements that ranged from inversions with or without frame shifts to a possible rearrangement of ancestral chromosomes in the Cheiloprionini lineage (Figure 4B). In particular, over 50% of the bases of contig-1 were aligned to at least two chromosomes of all reference genomes, with about half of the bases mapping to one chromosome and half to another (Figure 4, Supplementary Table 5 in Figshare [60]). No C. cyanea contigs robustly scaffolded to chromosome 24 of Ac. polyacanthus and D. trimaculatus (Supplementary Table 3 in Figshare [60]). As chromosome 24 of D. trimaculatus was reported to be highly rearranged [3], the absence of alignment with this chromosome suggests that the rearrangement occurred in the Chrominae branch, which separated from the Pomacentrinae branch, to which Amphiprion, Acanthochromis, and Chrysiptera belong, over 50 million years ago [64, 65]. The relative distance of Chrysiptera to D. trimaculatus compared with the other reference species is also illustrated by contig-18, which is the contig showing the highest rate of alignment with most genomes. Indeed, 67.7–75.1% of the bases of contig-18 were aligned to the A. clarkii, A. ocellaris, A. percula, and Ac. polyacanthus reference genomes but only 34% to that of D. trimaculatus (Figure 4A). Lastly, the similarity in results when comparing the C. cyanea assembly with each of the four Amphiprion and Acanthochromis reference genomes can also be linked to the evolutionary history of Pomacentrids. C. cyanea is part of the Cheiloprionini, which split from the other Pomacentrinae, notably from the branch to which Amphiprion and Acanthochromis belong, over 35 million years ago, leading to similar differences in the comparisons to all of these genomes [64, 65].
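The percentages quoted above express how many bases of a contig are covered by alignments to each reference chromosome. As a hedged sketch of how such a figure can be computed, the code below merges overlapping alignment intervals per chromosome and divides by the contig length. The tuple layout, the function names, and the toy coordinates are assumptions made for illustration; they do not reproduce the authors' scripts or any specific aligner output format.

```python
from collections import defaultdict


def merged_length(intervals):
    """Bases covered by possibly overlapping 1-based inclusive intervals."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Gap before this interval: close the previous merged block.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered


def aligned_fraction_by_chromosome(alignments, contig_length):
    """Fraction of contig bases aligned to each reference chromosome.

    `alignments` holds (chromosome, contig_start, contig_end) tuples;
    reverse-strand hits may list start > end and are normalised here.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in alignments:
        by_chrom[chrom].append((min(start, end), max(start, end)))
    return {c: merged_length(iv) / contig_length for c, iv in by_chrom.items()}


# Hypothetical alignments for a 1,000 bp contig split between two chromosomes.
toy_alignments = [
    ("chr3", 1, 300),
    ("chr3", 250, 500),   # overlaps the previous hit, counted once
    ("chr7", 501, 950),
    ("chr7", 990, 900),   # reverse-strand hit
]
fractions = aligned_fraction_by_chromosome(toy_alignments, contig_length=1000)
print(fractions)
```

A contig whose bases split roughly evenly between two chromosomes, as in this toy case, corresponds to the pattern described for contig-1.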

Conclusion

In this study, the first genome assembly for a Chrysiptera species was generated, comprising 91 contigs. The contigs were successfully aligned to related reference genomes, allowing for forays into the genomic architecture of Pomacentrids with respect to their evolutionary relationships. C. cyanea is easy to breed and maintain in aquaria and is highly abundant in Indo-Pacific coastal waters, particularly in Okinawa, Japan. The generation of this high-quality genome will further the potential of this species as a coral reef model species for research questions requiring biomolecular approaches.

Acknowledgements

The authors would like to thank the OIST Sequencing Section for the library preparation and sequencing, the Scientific Computing and Data Analysis section of Core Facilities at OIST, as well as Stefano Vianello for providing the drawings of Chrysiptera cyanea.

Funding Statement

Funding for this research was provided by the Okinawa Institute of Science and Technology Graduate University SHINKA Grant as well as the Iwatani Naoji Memorial Foundation through the 49th Iwatani Science and Technology Research Grant FY2023.

Data availability

The genome and transcriptome sequencing reads are deposited in the NCBI GenBank database under the BioProject PRJNA1167451. The genome assembly, annotation, and proteome for C. cyanea, as well as supplemental tables, BUSCO outputs, and detailed scripts for each genome assembly and annotation step, are available in Figshare [60].

Abbreviations

COI, cytochrome c oxidase subunit I; IPA, Improved Phased Assembler; TL, total length; TPM, transcripts per million.

Declarations

Ethics approval and consent to participate

The experimental procedure was conducted in accordance with ethical procedures recommended by the Okinawa Institute of Science and Technology Graduate University.

Competing interests

The authors declare no competing interests.

Authors’ contributions

EG: Investigation, Formal Analysis, Funding Acquisition, Writing, SM: Project Administration, Investigation, HT: Investigation, MH: Supervision, Formal Analysis, Software, Writing, VL: Conceptualization, Supervision, Project Administration, Funding Acquisition, Writing.

References

1. Lehmann R, Lightfoot DJ, Schunter C et al. Finding Nemo’s genes: a chromosome-scale reference assembly of the genome of the orange clownfish Amphiprion percula. Mol. Ecol. Resour., 2019; 19: 570–585. doi: 10.1111/1755-0998.12939.
2. Marcionetti A, Rossier V, Roux N et al. Insights into the genomics of clownfish adaptive radiation: genetic basis of the mutualism with sea anemones. Genome Biol. Evol., 2019; 11: 869–882. doi: 10.1093/gbe/evz042.
3. Roberts MB, Schultz DT, Gatins R et al. Chromosome-level genome of the three-spot damselfish, Dascyllus trimaculatus. G3: Genes Genom. Genet., 2023; 13: jkac339. doi: 10.1093/g3journal/jkac339.
4. Ryu T, Herrera M, Moore B et al. A chromosome-scale genome assembly of the false clownfish, Amphiprion ocellaris. G3: Genes Genom. Genet., 2022; 12: jkac074. doi: 10.1093/g3journal/jkac074.
5. Schunter C, Welch MJ, Ryu T et al. Molecular signatures of transgenerational response to ocean acidification in a species of reef fish. Nat. Clim. Change, 2016; 6: 1014–1018. doi: 10.1038/nclimate3087.
6. Tan MH, Austin CM, Hammer MP et al. Finding Nemo: hybrid assembly with Oxford Nanopore and Illumina reads greatly improves the clownfish (Amphiprion ocellaris) genome assembly. GigaScience, 2018; 7: gix137. doi: 10.1093/gigascience/gix137.
7. Sayers EW, Bolton EE, Brister JR et al. Database resources of the national center for biotechnology information. Nucleic Acids Res., 2022; 50: D20–D26. doi: 10.1093/nar/gkab1112.
8. Lehmann R, Schunter C, Welch MJ et al. Genetic architecture of behavioural resilience to ocean acidification. bioRxiv, 2022; doi: 10.1101/2022.10.18.512656.
9. Moore B, Herrera M, Gairin E et al. The chromosome-scale genome assembly of the yellowtail clownfish Amphiprion clarkii provides insights into melanic pigmentation of anemonefish. bioRxiv, 2022; doi: 10.1101/2022.07.21.500941.
10. Quoy J, Gaimard J. Zoologie. Voyage autour du monde, exécuté sur les corvettes de S.M. l’Uranie et la Physicienne pendant les années 1817, 1818, 1819 et 1820. Pillet Aîné, Paris: Freycinet L, 1825; p. 392.
11. Allen GR. Damselfishes of the World. Melle, Germany, Mentor, Ohio: Mergus; Aquarium Systems [distributor], 1991; ISBN-10:3882440082.
12. Gronell AM. Visiting behaviour by females of the sexually dichromatic damselfish, Chrysiptera cyanea (Teleostei: Pomacentridae): a probable method of assessing male quality. Ethology, 1989; 81: 89–122. doi: 10.1111/j.1439-0310.1989.tb00760.x.
13. Tamilmani G, Gopakumar G. Chrysiptera cyanea (Quoy & Gaimard, 1825). Kochi: ICAR - Central Marine Fisheries Research Institute, 2017; pp. 301–305, ISBN 978-93-82263-14-2.
14. Wacker S, Ness MH, Östlund-Nilsson S et al. Social structure affects mating competition in a damselfish. Coral Reefs, 2017; 36: 1279–1289. doi: 10.1007/s00338-017-1623-4.
15. Bapary MAJ, Fainuulelei P, Takemura A. Environmental control of gonadal development in the tropical damselfish Chrysiptera cyanea. Mar. Biol. Res., 2009; 5: 462–469. doi: 10.1080/17451000802644722.
16. Bapary JMA, Nurul Amin Md, Takemura A. Food availability as a possible determinant for initiation and termination of reproductive activity in the tropical damselfish Chrysiptera cyanea. Mar. Biol. Res., 2012; 8: 154–162. doi: 10.1080/17451000.2011.605146.
17. Kass JM, Vilela B, Aiello-Lammens ME et al. Wallace: a flexible platform for reproducible modeling of species niches and distributions built for community expansion. Meth. Ecol. Evol., 2018; 9: 1151–1156. doi: 10.1111/2041-210X.12945.
18. QGIS Association. QGIS Geographic Information System 3.22.9, 2022; https://download.qgis.org/downloads/.
19. Gopakumar G, Santhosi I, Ramamoorthy N. Breeding and larviculture of the sapphire devil damselfish Chrysiptera cyanea. J. Mar. Biol. Assoc. India, 2009; 51: 130–136.
20. Thresher RE, Moyer JT. Male success, courtship complexity and patterns of sexual selection in three congeneric species of sexually monochromatic and dichromatic damselfishes (Pisces: Pomacentridae). Anim. Behav., 1983; 31: 113–127. doi: 10.1016/S0003-3472(83)80179-1.
21. Steinke D, Zemlak TS, Hebert PDN. Barcoding Nemo: DNA-based identifications for the ornamental fish trade. PLoS One, 2009; 4: e6300. doi: 10.1371/journal.pone.0006300.
22. Nakabo T. Fishes of Japan with Pictorial Keys to the Species [Japanese]. 3rd ed, Tokai University Press, 2013.
23. Lee S-G, Na D, Park C. Comparability of reference-based and reference-free transcriptome analysis approaches at the gene expression level. BMC Bioinform., 2021; 22: 310. doi: 10.1186/s12859-021-04226-0.
24. Herrera M, Ravasi T, Laudet V. Anemonefishes: a model system for evolutionary genomics. F1000 Res., 2023; 12: 204. doi: 10.12688/f1000research.130752.2.
25. Sovic I, Kronenberg Z, Dunn C et al. Improved phased assembler - PacBio HiFi genome assembly, 2020; https://raw.githubusercontent.com/ucdavis-bioinformatics-training/ucdavis-bioinformatics-training.presentations/master/assembly/UC_Davis-IPA_HiFi_Assembler_Sovic.pdf.
26. Kolmogorov M, Yuan J, Lin Y et al. Assembly of long error-prone reads using repeat graphs. Nat. Biotechnol., 2019; 37: 540–546. doi: 10.1038/s41587-019-0072-8.
27. Gurevich A, Saveliev V, Vyahhi N et al. QUAST: quality assessment tool for genome assemblies. Bioinformatics, 2013; 29: 1072–1075. doi: 10.1093/bioinformatics/btt086.
28. Seppey M, Manni M, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness. In: Kollmar M. (ed.), Gene Prediction: Methods and Protocols. New York, NY: Springer, 2019; pp. 227–245, doi: 10.1007/978-1-4939-9173-0_14.
29. Roach MJ, Schmidt SA, Borneman AR. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinform., 2018; 19: 460. doi: 10.1186/s12859-018-2485-7.
30. Flynn JM, Hubley R, Goubert C et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA, 2020; 117: 9451–9457. doi: 10.1073/pnas.1921046117.
31. Tempel S. Using and understanding RepeatMasker. Methods Mol. Biol., 2012; 859: 29–51. doi: 10.1007/978-1-61779-603-6_2.
32. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 2010; 26: 841–842. doi: 10.1093/bioinformatics/btq033.
33. Kurtz S, Phillippy A, Delcher AL et al. Versatile and open software for comparing large genomes. Genome Biol., 2004; 5(2): R12. doi: 10.1186/gb-2004-5-2-r12.
34. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Cham: Springer, 2016; doi: 10.1007/978-3-319-24277-4.
35. Monlong J. MUMmerplots with ggplot2, Hippocamplus, 2017; https://jmonlong.github.io/Hippocamplus/2017/09/19/mummerplots-with-ggplot2/.
36. Andrews S, Biggins L, Inglesfield S et al. Babraham bioinformatics - FastQC a quality control tool for high throughput sequence data, 2019; https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ (accessed March 1, 2023).
37. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 2014; 30: 2114–2120. doi: 10.1093/bioinformatics/btu170.
38. Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat. Meth., 2015; 12: 357–360. doi: 10.1038/nmeth.3317.
39. Li H, Handsaker B, Wysoker A et al. The sequence alignment/map format and SAMtools. Bioinformatics, 2009; 25: 2078–2079. doi: 10.1093/bioinformatics/btp352.
40. Danecek P, Bonfield JK, Liddle J et al. Twelve years of SAMtools and BCFtools. GigaScience, 2021; 10: giab008. doi: 10.1093/gigascience/giab008.
41. Bray NL, Pimentel H, Melsted P et al. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol., 2016; 34: 525–527. doi: 10.1038/nbt.3519.
42. Barnett DW, Garrison EK, Quinlan AR et al. BamTools: a C++ API and toolkit for analyzing and managing BAM files. Bioinformatics, 2011; 27: 1691–1692. doi: 10.1093/bioinformatics/btr174.
43. Bruna T, Hoff KJ, Lomsadze A et al. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom. Bioinform., 2021; 3: lqaa108. doi: 10.1093/nargab/lqaa108.
44. Bruna T, Lomsadze A, Borodovsky M. GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins. NAR Genom. Bioinform., 2020; 2: lqaa026.
45. Buchfink B, Reuter K, Drost H-G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat. Meth., 2021; 18: 366–368. doi: 10.1038/s41592-021-01101-x.
46. Gotoh O. A space-efficient and accurate method for mapping and aligning cDNA sequences onto genomic sequence. Nucleic Acids Res., 2008; 36: 2630–2638. doi: 10.1093/nar/gkn105.
47. Hoff KJ, Lomsadze A, Borodovsky M et al. Whole-Genome Annotation with BRAKER. Gene Prediction. New York, NY: Humana, 2019; pp. 65–95, doi: 10.1093/bioinformatics/btv661.
48. Hoff KJ, Lange S, Lomsadze A et al. BRAKER1: unsupervised RNA-Seq-based genome annotation with GeneMark-ET and AUGUSTUS. Bioinformatics, 2016; 32: 767–769. doi: 10.1093/nar/gks708.
49. Iwata H, Gotoh O. Benchmarking spliced alignment programs including Spaln2, an extended version of Spaln that incorporates additional species-specific features. Nucleic Acids Res., 2012; 40: e161.
50. Lomsadze A, Burns PD, Borodovsky M. Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm. Nucleic Acids Res., 2014; 42: e119. doi: 10.1093/nar/gku557.
51. Lomsadze A, Ter-Hovhannisyan V, Chernoff YO et al. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res., 2005; 33: 6494–6506. doi: 10.1093/nar/gki937.
52. Stanke M, Diekhans M, Baertsch R et al. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics, 2008; 24: 637–644. doi: 10.1093/bioinformatics/btn013.
53. Stanke M, Schöffmann O, Morgenstern B et al. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinform., 2006; 7: 62. doi: 10.1186/1471-2105-7-62.
54. UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res., 2021; 49: D480–D489. doi: 10.1093/nar/gkaa1100.
55. Zdobnov EM, Apweiler R. InterProScan – an integration platform for the signature-recognition methods in InterPro. Bioinformatics, 2001; 17: 847–848. doi: 10.1093/bioinformatics/17.9.847.
56. Altschul SF, Madden TL, Schäffer AA et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 1997; 25: 3389–3402. doi: 10.1093/nar/25.17.3389.
57. Condon K. tispec: calculates tissue specificity from RNA-seq data, 2020; https://github.com/BioinfGuru/tispec.
58. Conway JR, Lex A, Gehlenborg N. UpSetR: an R package for the visualization of intersecting sets and their properties. Bioinformatics, 2017; 33: 2938–2940. doi: 10.1093/bioinformatics/btx364.
59. Moore B, Herrera M, Gairin E et al. The chromosome-scale genome assembly of the yellowtail clownfish Amphiprion clarkii provides insights into the melanic pigmentation of anemonefish. G3: Genes Genom. Genet., 2023; 13: jkad002. doi: 10.1093/g3journal/jkad002.
60. Gairin E, Miura S, Takamiyagi H et al. Chrysiptera cyanea genome annotation, transcriptome, proteome, and script for manuscript figure production, Figshare. [Dataset], 2024; doi: 10.6084/m9.figshare.27143571.
61. Kryuchkova-Mostacci N, Robinson-Rechavi M. A benchmark of gene expression tissue-specificity metrics. Briefings Bioinform., 2017; 18: 205–214. doi: 10.1093/bib/bbw008.
62. Yanai I, Benjamin H, Shmoish M et al. Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics, 2005; 21: 650–659. doi: 10.1093/bioinformatics/bti042.
63. Lecchini D, Adjeroud M, Pratchett MS et al. Spatial structure of coral reef fish communities in the Ryukyu Islands, southern Japan. Oceanol. Acta, 2003; 26: 537–547. doi: 10.1016/S0399-1784(03)00048-3.
64. McCord CL, Nash CM, Cooper WJ et al. Phylogeny of the damselfishes (Pomacentridae) and patterns of asymmetrical diversification in body size and feeding ecology. PLoS One, 2021; 16: e0258889. doi: 10.1371/journal.pone.0258889.
65. Tang KL, Stiassny MLJ, Mayden RL et al. Systematics of Damselfishes. Copeia, 2021; 109: 258–318. doi: 10.1643/i2020105.

GigaByte. 2024 Dec 31;2024:gigabyte144.

Article Submission

Emma Gairin

Assign Handling Editor

Editor: Scott Edmunds

Editor Assess MS

Editor: Hongfang Zhang

Curator Assess MS

Editor: Christopher Hunter

Review MS

Editor: Yue Song

Reviewer name and names of any other individuals who aided in the review	Yue Song
Do you understand and agree to our policy of having open and named reviews, and having your review included with the published papers. (If no, please inform the editor that you cannot review this manuscript.)	Yes
Is the language of sufficient quality?	Yes
Please add additional comments on language quality to clarify if needed
Are all data available and do they match the descriptions in the paper?	No
Additional Comments	The authors have provided clues for accessing the data in public databases such as NCBI, but it seems that the data has not been released; At least, I haven't been able to obtain available data using the provided accession number (e.g. PRJNA1167451). I'm not sure if I've missed any information, but I believe it would be better if the data could be easily accessible to the public.
Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples http://gigadb.org/site/guide	Yes
Additional Comments
Is the data acquisition clear, complete and methodologically sound?	No
Additional Comments	The authors used PacBio's third-generation sequencing technology for genome sequencing, which has become a "necessary option" for obtaining high-quality genomes in current genomic research. However, they did not further advance on the path of "assembling a chromosome-level genome" based on this version. Providing a chromosome-level genome would likely be more meaningful.
Is there sufficient detail in the methods and data-processing steps to allow reproduction?	No
Additional Comments	Regarding the genome assembly and annotation process, the method described by the authors is overly simplistic and lacks detailed information on the parameters and procedures used. This makes it difficult for other researchers to effectively replicate the results described in the article.
Is there sufficient data validation and statistical analyses of data quality?	No
Additional Comments	The authors have calculated the N50 of contigs and the completeness of BUSCO genes, which are indeed two commonly used indicators for assessing the quality of genome assemblies. However, it is still challenging to gain a clear understanding of the assembly quality based solely on these two indicators. Could other measurements be added, such as comparing the continuity and completeness of the assembly with those of closely related species or other comparable species' genomes? Additionally, there is a point that is difficult to understand: the authors report a BUSCO completeness of approximately 94% for the genome, yet a BUSCO completeness of 97% for the gene set. It is puzzling how BUSCO genes that are not annotated in the genome can still be present in the gene set.
Is the validation suitable for this type of data?	Yes
Additional Comments
Is there sufficient information for others to reuse this dataset or integrate it with other data?	No
Additional Comments	As I mentioned earlier, the authors did not provide detailed information about the processing procedures and parameters, which makes it difficult for other researchers to replicate their results.
Any Additional Overall Comments to the Author	It is recommended that the authors provide a detailed description of the methods and easily accessible data retrieval methods. It would be even better if the authors could further provide a chromosome-level genome, as T2T (telomere-to-telomere) level genomes are becoming increasingly popular.
Recommendation	Minor Revision

Review MS

Editor: Darrin Schultz

Reviewer name and names of any other individuals who aided in the review	Darrin T. Schultz
Do you understand and agree to our policy of having open and named reviews, and having your review included with the published papers. (If no, please inform the editor that you cannot review this manuscript.)	Yes
Is the language of sufficient quality?	Yes
Please add additional comments on language quality to clarify if needed
Are all data available and do they match the descriptions in the paper?	Yes
Additional Comments	The genome is also not yet on NCBI, but it would be good to upload it.
Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples http://gigadb.org/site/guide	Yes
Additional Comments	I suggest later that there should be more information about the HiFi library preparation details, as the manuscript lacks them and it appears to be a non-standard (large insert size) library.
Is the data acquisition clear, complete and methodologically sound?	No
Additional Comments	See above comment-
Is there sufficient detail in the methods and data-processing steps to allow reproduction?	No
Additional Comments	No parameters are provided for the genome assembly software, for read trimming, or for other software used.
Is there sufficient data validation and statistical analyses of data quality?	No
Additional Comments	See extended comments - the read data could use more QC, as well as the genome assembly.
Is the validation suitable for this type of data?	No
Additional Comments
Is there sufficient information for others to reuse this dataset or integrate it with other data?	Yes
Additional Comments	There is a degree of information missing about the data, but another researcher could use them for their study.
Any Additional Overall Comments to the Author	Thank you for the opportunity to review the work, The genome of the sapphire damselfish Chrysiptera cyanea: a new resource to support further investigation of the evolution of Pomacentrids, by Gairin and colleagues. In this manuscript, the authors collect an individual of the pomacentrid fish, Chrysiptera cyanea, in Okinawa, Japan. After isolating DNA, the sequencing center at OIST prepared and sequenced a SMRT sequencing library. Additionally, the authors generated some bulk RNA-seq data and sequenced it on the Illumina platform. The authors assembled the genome with two assemblers, and performed some comparisons of the C. cyanea contigs aligned to the chromosome-scale scaffolds of closely related pomacentrids. Given my background, I will mostly comment on the genomic analyses. I appreciate the authors' diligence in exploring different genome assembly methods and their efforts in running BUSCO and QUAST to QC the assemblies. The DNA sequencing data and assembly produced contigs that align well with the chromosomes of closely related species (which is convenient for comparative genomics!), and the manuscript presents a solid foundation for better understanding the chromosomal evolutionary history of the Pomacentridae. While this work represents an important step toward providing a new genomic resource for Chrysiptera cyanea, I see a few areas where the manuscript could be refined to enhance it as a community resource:
(1) More information about data generation: Including additional details about the HiFi library preparation, specifically the chemistries used, the number of SMRT cells sequenced, and the bioinformatics steps used to generate the HiFi reads, would improve the manuscript's clarity and reproducibility. I have some questions regarding whether these libraries were prepared for HiFi sequencing: the reported mean read length of 25kbp is 10kbp longer than the standard HiFi library insert size; and the reported amount of bases in the reads, 84 Gbp, is more data than one would expect from a single CCS-processed SMRT cell, but could be the amount of data produced from one CLR run. Characterizing the quality score vs read length distribution could be helpful to characterize the read data. Clarifying these steps taken before the genome was assembled would strengthen the reliability of these reads as a resource.
(2) Incorporating a few more important quality control (QC) steps would better clarify the completeness of the genome assembly. For instance, an estimate of genome size from the HiFi reads could be performed with jellyfish and GenomeScope, taking advantage of the k-mer fidelity of HiFi reads. This would provide a more conclusive estimate than the current comparison. Additionally, steps such as checking for contamination and providing an explanation for decisions like haplotig removal would make the assembly process more transparent. Lastly, supplementing the QC analysis with Merqury will provide a reliable answer to how complete the assembly represents the information in the individual HiFi reads in a way that complements BUSCO and QUAST.
(3) The initial analyses of chromosome structure are a promising look into some yet-unexplored chromosomal changes in the Pomacentridae, and I think that incorporating a deeper phylogenetic analysis would build on this strength. Situating the chromosomal findings within a phylogenetic framework could provide stronger support, or actually resolve, the evolutionary interpretations presented. Doing this analysis likely could also help resolve whether the structures seen are genome misassemblies, or instead reflect lineage-specific chromosomal changes. The authors could supplement their beautiful figures using other tools that leverage whole-genome alignments and chromosome visualization to help answer these questions. One tool to try for two-genome comparisons, that the authors may have explored already in place of their ggplot script, is D-GENIES.
Overall, this is a valuable resource, and I commend the authors for taking the steps to analyze the chromosomal evolutionary history within the pomacentrids. I look forward to seeing the authors’ future contributions to the field of genomics and chromosome evolution.
Minor Points
Line 125: Sharing the specific Trimmomatic settings used would enhance the reproducibility of the RNA-seq data processing. The parameters for genome assembly should also be added.
Line 212: Are there any replicates for the RNA-seq data?
Line 294: Consider uploading the assembly to NCBI for broader visibility and accessibility.
Recommendation	Minor Revision

Editor Decision

Editor: Hongfang Zhang

Minor Revision

Emma Gairin

Assess Revision

Editor: Hongfang Zhang

Final Data Preparation

Editor: Yannan Fan

Editor Decision

Editor: Hongfang Zhang

Accept

Editor: Scott Edmunds

Editor’s Assessment

The differences between anemonefish and other damselfish are currently a popular topic in coral reef research. In this study, the authors provide a new high-quality non-anemonefish genome, which will be of high relevance to further the depth of such analyses. The species in question is the sapphire damselfish Chrysiptera cyanea, a widely distributed damselfish in the Indo-Pacific area that is often studied to elucidate the roles of various environmental controls on its reproduction and to investigate related hormonal processes. To further the potential of biomolecular analyses based on this species, this study generated the first genome of a Chrysiptera fish from a male individual collected in Okinawa, Japan. Using PacBio HiFi long-read sequencing with 94.5× coverage, a near chromosome-scale genome was assembled, and 28,173 genes were identified and annotated. Peer review gathered more parameters and details on the quality, and the final assembly comprises 896 Mb across 91 contigs, with a BUSCO completeness of 97.6%. This reference genome should therefore be of high value for future genetic-based approaches, from population structure to gene expression analyses.

Export to Production

Editor: Scott Edmunds

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement
