GigaByte. 2025 Feb 24;2025:gigabyte150. doi: 10.46471/gigabyte.150

Draft genome of the endangered Visayan spotted deer (Rusa alfredi), a Philippine endemic species

Ma Carmel F Javier 1, Albert C Noblezada 1, Persie Mark Q Sienes 2,*, Robert S Guino-o 3, Nadia Palomar-Abesamis 2, Maria Celia D Malay 4, Carmelo S del Castillo 5,6, Victor Marco Emmanuel N Ferriols 1,5,*
PMCID: PMC11876970  PMID: 40041424

Abstract

The Visayan Spotted Deer (VSD), or Rusa alfredi, is an endangered and endemic species in the Philippines. Despite its status, genomic information on R. alfredi, and the genus Rusa in general, is missing. This study presents the first draft genome assembly of the VSD using the Illumina short-read sequencing technology. The resulting RusAlf_1.1 assembly has a 2.52 Gb total length, with a contig N50 of 46 Kb and scaffold N50 size of 75 Mb. The assembly has a BUSCO complete score of 95.5%, demonstrating the genome’s completeness, and includes the annotation of 24,531 genes. Our phylogenetic analysis based on single-copy orthologs revealed a close evolutionary relationship between R. alfredi and the genus Cervus. RusAlf_1.1 represents a significant advancement in our understanding of the VSD. It opens opportunities for further research in population genetics and evolutionary biology, potentially contributing to more effective conservation and management strategies for this endangered species.

Data description

The genus Rusa is native to South and Southeast Asia, inhabiting diverse habitats ranging from dense forests to grasslands [1]. The Visayan Spotted Deer (VSD), Rusa alfredi (NCBI:txid1088129), also known as the Philippine Spotted Deer, is one of three deer species endemic to the Philippines and is a rare and endangered species indigenous to the Visayan Islands. This region is considered one of the country’s highest conservation priority areas, particularly due to the number of threatened endemic taxa and the degree of threats to species and habitats. Characterized by its soft dark-brown coat and unique nominal spots, R. alfredi once played a vital role as a herbivore in shaping vegetation dynamics. However, its extirpation from most areas makes it difficult to fully determine its historical ecological impact. It has been classified as endangered on the International Union for Conservation of Nature (IUCN) Red List of Threatened Species since 1988. As of 2016, only an estimated 700 mature individuals remained in the wild. The genus Rusa is facing a significant worldwide decline and is under immense threat of global extinction.

The geographic range of R. alfredi formerly encompassed the Central Visayan islands of Negros, Panay, Guimaras, Masbate, and Cebu. Presently, only the islands of Panay and Negros shelter small, remnant populations of wild R. alfredi (Figure 1A) [2]. Accurate reports of the population density and distribution of the species in the wild have not yet been established. Like other cervid species in the world, the steep decline in the population of R. alfredi is mainly due to deforestation and hunting, despite being legally protected. Efforts to conserve the population of R. alfredi have been put in place, including the proposed creation of new national parks and properly structured captive breeding for reintroduction to the wild. The first captive breeding program for R. alfredi in the country was established at the Department of Biology and the Center for Tropical Conservation Studies (CENTROP) of Silliman University, in Dumaguete City, Negros Oriental, Philippines from Negros Island stock [3]. Presently, it has the largest captive-bred stock of the species globally.

Figure 1.

(A) Distribution Map of R. alfredi based on the IUCN Red List of Threatened Species [2]. (B) Photo of the Visayan Spotted Deer (code name: Abraham) at the CENTROP Silliman University, Dumaguete City, Negros Oriental, Philippines. Photo taken and shared by L. Cabrera, CC-BY.

Recent advancements in genomic sequencing created the possibility of producing large-scale reference genomes, which may offer new insights into an organism’s genetic diversity and architecture. This enables researchers to identify key genetic traits, track evolutionary changes, and develop strategies for conservation and breeding programs aimed at preserving biodiversity and enhancing desirable traits in various organisms. Whereas several genetic technologies are already accessible, few are being used to their full potential. The IUCN lists 15,521 animal species as threatened, and less than 3% of these species have genomic resources that can inform and aid conservation management [4]. Currently, there is no available reference genome for R. alfredi or the genus Rusa. The generation of a reference genome would give us a better understanding of the history, diversity, and demographics of this endangered Visayan-endemic deer, which is significant for the management of the captive population. In this study, the first draft genome assembly of R. alfredi was generated using Illumina short-read sequences, and could serve as a reference for gene prediction, taxonomy, evolution, landscape genetics, and conservation genomics.

Methods

Sample collection

The sampling was conducted under the Department of Environment and Natural Resources (DENR) Region VII Gratuitous Permit No. 2022-17. The sample was obtained from a member of the captive population at the CENTROP, Silliman University, Dumaguete City, Negros Oriental, Philippines. A male deer (Abraham; Figure 1B) was restrained using a net, and a piece of ear tissue was collected using an ear notcher, a standard tool for ear tagging in animals. Before release, wound spray was applied to the ear to prevent infection and promote faster healing. The tissue sample was cleaned with 95% ethanol, placed in a 1.5 mL microcentrifuge tube with absolute ethanol, and stored at −20 °C for future use.

DNA extraction and quantification

Extraction was performed at Silliman University using the Wizard® SV Genomic DNA Purification System following the manufacturer’s protocol (Promega, 2012). The quality of the genomic DNA was subsequently checked using gel electrophoresis, Multiskan SkyHigh Spectrophotometer, and Qubit Fluorometer.

Library preparation

The library construction was carried out with 100 ng of genomic DNA following the Illumina DNA library preparation kit manufacturer’s protocol (Illumina, 2020). The resulting amplified library was quantified and controlled on an Agilent Bioanalyzer 2100 (Agilent, Santa Clara, CA) and sequenced in 2 × 151 bp paired-end reads on an Illumina NextSeq 1000 at the Philippine Genome Center Visayas, University of the Philippines Visayas, Miagao, Iloilo. A total of 157.47 Gbp of raw data was generated after sequencing.

Genome survey

The quality of the short reads was checked using FastQC v0.12.1 (RRID:SCR_014583) [5]. To remove low-quality reads and sequencing adapters, reads were trimmed using Trimmomatic v0.39 (RRID:SCR_011848) [6] with the following parameters: ILLUMINACLIP:Nextera-PE-PE.fa:2:30:10 LEADING:30 TRAILING:30 SLIDINGWINDOW:4:20 MINLEN:36. The genome size of Rusa alfredi was estimated using a k-mer-based approach. K-mer frequencies were obtained using Jellyfish (RRID:SCR_005491) [7] with the command: jellyfish count -C -m 21 -s 1G <(zcat forwards_reads.fastq.gz) <(zcat reverse_reads.fastq.gz) -t 30. The k-mer count histogram was then generated by running: jellyfish histo -t 10 mer_counts.jf > reads.histo. The resulting k-mer histogram was used in GenomeScope2 (RRID:SCR_017014) [8] to estimate the genome size and heterozygosity. GenomeScope2 was run using the command: genomescope.R -i reads.histo -o genomescope_21 -k 21.
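The k-mer survey described above can be illustrated with a minimal Python sketch. It is not the Jellyfish or GenomeScope2 implementation: it only shows what canonical k-mer counting (the role of the -C flag) and a multiplicity histogram look like on toy data, followed by a naive size estimate (total k-mers divided by the peak multiplicity). The function names and the error threshold `min_multiplicity` are hypothetical choices made for illustration; GenomeScope2 fits a mixture model rather than reading off a single peak.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID_BASES = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_canonical_kmers(reads, k=21):
    """Count canonical k-mers across an iterable of read sequences."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if _VALID_BASES.issuperset(kmer):  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return counts


def kmer_histogram(counts):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return Counter(counts.values())


def estimate_genome_size(histogram, min_multiplicity=2):
    """Naive estimate: k-mers above an assumed error threshold / peak multiplicity."""
    solid = {m: n for m, n in histogram.items() if m >= min_multiplicity}
    peak = max(solid, key=solid.get)
    total_kmers = sum(m * n for m, n in solid.items())
    return total_kmers / peak
```

On real data the histogram would come from the reads.histo file rather than from an in-memory counter; the sketch is only meant to make the quantities on the GenomeScope2 plot concrete.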

Genome assembly and quality assessment

Using the trimmed reads, the R. alfredi genome was assembled using MaSuRCA v4.1.0 (RRID:SCR_010691) [9]. The configuration file used for running MaSuRCA included “PE = pe 500 100” as the recommended safe insert size and standard deviation values for short reads and “GRAPH_KMER_SIZE = auto” for automatic selection of k-mer size (k = 99 was selected). The MaSuRCA assembly pipeline was run using the command: “masurca config.txt”. The configuration file was uploaded to GigaDB [10].

To improve the quality and contiguity of the assembly, contigs were corrected for misassemblies and scaffolded based on sequence homology using RagTag version 2.1.0 [11] with the Cervus elaphus genome (GenBank assembly accession number: GCF_910594005.1) as reference. Assembly correction was performed using RagTag with default parameters. Corrected contigs were then used for scaffolding using RagTag with default parameters.

General metrics for assessing the quality of the assembly were determined using QUAST v5.2.0 (RRID:SCR_001228) [12]. QUAST was run with the “--large” option and with the inclusion of the paired-end reads by adding the “-1” and “-2” flags to provide results for the assembly coverage. The contigs and scaffolds were also checked for completeness using Benchmarking Universal Single-Copy Orthologs, BUSCO v5.4.4 (RRID:SCR_015008) [13], with the cetartiodactyla_odb10 lineage dataset. The assembled genome was visualized using BlobToolKit v4.3.5 (RRID:SCR_025882) [14].
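For readers unfamiliar with the contiguity metrics reported by QUAST, the following minimal sketch shows how N50 and L50 are defined. It is an illustration of the standard definitions on a toy list of sequence lengths, not QUAST's own code; the function name is hypothetical.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of contig or scaffold lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the assembly;
    L50 is the number of sequences in that set.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for rank, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, rank


# Toy example: total length is 32, so half is 16; the two longest
# sequences (8 + 8) reach it.
print(n50_l50([2, 2, 2, 3, 3, 4, 8, 8]))
```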

The quality of the assembly was evaluated using Merqury (RRID:SCR_022964) [15]. K-mer count from reads was obtained using the command: meryl k=21 *fastq.gz output reads.meryl threads=30 memory=30. Assembly consensus quality value (QV), k-mer completeness, and spectra-cn plots were generated running the command:

merqury.sh reads.meryl GCA_038501445.1_RusAlf_1.1_genomic.fna abraham_merq.

The number of heterozygous sites and base coverage were determined based on the reads’ alignment to the assembled genome. Reads were mapped back to the assembly using BWA-MEM (RRID:SCR_010910) [16] with the command: bwa mem -t 12 GCA_038501445.1_RusAlf_1.1_genomic.fna forward_reads.fastq.gz reverse_reads.fastq.gz | gzip -3 > aln-pe.sam.gz. The alignment was further processed using SAMtools v1.20 (RRID:SCR_002105) [17]. Specifically, mate information was added to the alignment using samtools fixmate, followed by samtools sort, and samtools markdup for marking and removing duplicates (with the -r flag). Average base coverage was determined using samtools depth. The alignment was also used to obtain the raw variant call format (VCF) file using BCFtools v1.21 (RRID:SCR_005227) [17] by running the command: bcftools mpileup -Ou -f GCA_038501445.1_RusAlf_1.1_genomic.fna alignment.fxm.sorted.rmdup.bam | bcftools call -mv -Ov -o raw_variants.vcf. The filtered VCF file was obtained by running the command: bcftools filter -e 'QUAL < 30 || DP < 10' -o filtered_variant.vcf -O v raw_variants.vcf. The number of heterozygous sites was determined using the command: bcftools view -i 'GT="0/1"' filtered_variant.vcf | grep -v "^#" | wc -l.
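The filtering and counting logic of the last two commands can be sketched in plain Python for a single-sample VCF. This is an approximation for illustration, not a replacement for BCFtools: it assumes DP is read from the INFO column, treats records with a missing QUAL or DP as failing the filter, and only recognizes the unphased genotype string 0/1. The function names are hypothetical.

```python
def count_het_sites(vcf_lines, min_qual=30.0, min_depth=10):
    """Count records with QUAL >= min_qual, INFO/DP >= min_depth and GT == 0/1."""
    n_het = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        qual, info, fmt, sample = fields[5], fields[7], fields[8], fields[9]
        # Assumption: a missing QUAL (".") is treated as failing the filter.
        if qual == "." or float(qual) < min_qual:
            continue
        info_map = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
        # Assumption: a missing INFO/DP is treated as depth 0.
        if int(info_map.get("DP", 0)) < min_depth:
            continue
        genotype = sample.split(":")[fmt.split(":").index("GT")]
        if genotype == "0/1":
            n_het += 1
    return n_het


def heterozygosity_percent(n_het_sites, assembly_length_bp):
    """Heterozygous sites expressed as a percentage of the assembly length."""
    return 100.0 * n_het_sites / assembly_length_bp
```

With the figures reported in this study (4,305,197 heterozygous sites in a 2.52 Gb assembly), `heterozygosity_percent` gives roughly 0.17%.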

Repeats and gene annotation

Before the annotation, the assembly was screened for contaminants and the presence of mitochondrial sequences. Detected mitochondrial sequences in the assembly were either trimmed or removed from the assembly using SeqKit v2.7.0 (RRID:SCR_018926) [18]. De novo identification of the repeats was performed in the assembly using RepeatModeler v2.0.5 (RRID:SCR_015027) [19]. The database for RepeatModeler was first generated by running the command: BuildDatabase -name VSD GCA_038501445.1_RusAlf_1.1_genomic.fna. It was followed by de novo repeat identification using the command: RepeatModeler -database VSD -threads 12 -LTRStruct. The resulting library of repeats was then merged with the mammals repeat library extracted from the Dfam database [20] using the famdb.py script. The mammalian repeat library was obtained using the command: famdb.py -i Dfam.h5 families -a -d -f fasta_name "mammals" > mammals_repeat_library.fasta. The combined libraries were then used to soft mask the repeats in the genome using RepeatMasker v4.1.5 (RRID:SCR_012954) [21] with ‘-s -xsmall’ options. For gene annotation, homology-based gene prediction was performed using the Gene Model Mapper (GeMoMa v1.9, RRID:SCR_017646) pipeline [22] with the Cervus elaphus genome (GenBank accession number: GCF_910594005.1) as reference. GeMoMa was run using the command: GeMoMa -Xmx50G GeMoMaPipeline threads=12 outdir=GeMoMa GeMoMa.Score=ReAlign AnnotationFinalizer.r=NO o=true t=RusAlf_v1.1.fna a=mCerela.gff g=GCF_910594005.1_mCerEla1.1_genomic.fna. Additional gene annotation was obtained using the BRAKER v3.0.8 annotation pipeline C (RRID:SCR_018964) [23–33]. Vertebrata protein sequences from the OrthoDB v11 (RRID:SCR_011980) [34] partition were used as extrinsic evidence for gene prediction in the soft-masked genome. BRAKER-annotated genes were filtered by retaining only those with hits in the Pfam database [35], identified using InterProScan v5.72-103 (RRID:SCR_005829) [36].
Verified annotated genes from BRAKER were then added to gene annotation from GeMoMa using AGAT v1.4.2 [37]. To ensure that gene annotation structures were retained, only gene annotations from BRAKER with no overlapping contained coding sequences (CDS) were added to the gene annotations from GeMoMa to generate the final gene set using the agat_sp_complement_annotations.pl script. Protein sequences from the final gene set of R. alfredi were extracted for further downstream analysis.
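The idea behind complementing one annotation with non-overlapping models from another can be sketched as a simple interval check. This is a simplified illustration and not the logic of agat_sp_complement_annotations.pl itself: it represents each gene model as a single (seqid, start, end) interval with 1-based inclusive coordinates, ignores strand and exon structure, and uses hypothetical function names.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if two 1-based inclusive intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def complement_annotations(reference, candidates):
    """Return the reference models plus candidates that overlap none of them.

    Both inputs are iterables of (seqid, start, end) tuples.
    """
    by_seqid = defaultdict(list)
    for seqid, start, end in reference:
        by_seqid[seqid].append((start, end))

    added = [
        (seqid, start, end)
        for seqid, start, end in candidates
        if not any(overlaps(start, end, s, e) for s, e in by_seqid[seqid])
    ]
    return list(reference) + added


# Toy example: only the second and third candidates are added.
gemoma_like = [("chr1", 100, 200), ("chr2", 50, 80)]
braker_like = [("chr1", 150, 250), ("chr1", 300, 400), ("chr3", 10, 20)]
merged = complement_annotations(gemoma_like, braker_like)
print(len(merged))
```

The linear scan over reference intervals is adequate for a sketch; an interval tree or a sorted sweep would be the usual choice for genome-scale annotations.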

Phylogenetic tree

A phylogenetic tree of R. alfredi and other species of cervids was constructed based on single-copy orthologs. Protein sequences from reference genomes of Odocoileus virginianus, Rangifer tarandus, Muntiacus muntjak, Muntiacus reevesi, Dama dama, Cervus hanglu yarkandensis, C. elaphus, and Cervus canadensis were downloaded from GenBank, while sequences for Cervus nippon were downloaded from the Figshare database [38]. These sequences were used together with the predicted protein sequences of R. alfredi to create a species tree. Asian water buffalo (Bubalus bubalis) was included to serve as an outgroup. The longest transcript per gene in each species protein dataset was identified and retained using primary_transcript.py from OrthoFinder v.2.5.5 [39]. Single-copy orthologs were identified using OrthoFinder v.2.5.5 (RRID:SCR_017118) [39]. The sequences were renamed with the corresponding species ID using Seqkit v2.7.0 [18], and each ortholog was aligned using MUSCLE v5.1.0 (RRID:SCR_011812) [40]. Aligned sequences were then concatenated using Seqkit v2.7.0 [18], and trimming was performed using Gblocks v0.91b (RRID:SCR_015945) [41] with default parameters. The maximum likelihood tree was generated using IQ-TREE v2.3.6 (RRID:SCR_017254) [42] with ModelFinder [43] for model selection based on Bayesian Information Criterion (BIC) and bootstrap set at 1000. The maximum likelihood (ML) tree based on single-copy orthologs was constructed using the command: iqtree -s MSA_cervid_sco_concat_sorted_trimmed.fasta -m MFP -B 1000. The resulting ML tree was then visualized using iTOL (RRID:SCR_018174) [44].
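The concatenation step can be pictured as building a supermatrix: the per-ortholog alignments are joined end to end for each species. The sketch below is a generic illustration of that idea in Python and is not the SeqKit-based procedure used in this study; the function name is hypothetical, and the gap-filling of absent species is a defensive assumption that should never trigger for true single-copy orthologs present in every species.

```python
def concatenate_alignments(alignments, species):
    """Concatenate per-gene alignments into one supermatrix.

    alignments: list of dicts mapping species ID -> aligned sequence
                (all sequences within one dict must have equal length).
    species:    ordered list of species IDs to include.
    """
    parts = {sp: [] for sp in species}
    for index, alignment in enumerate(alignments):
        widths = {len(seq) for seq in alignment.values()}
        if len(widths) != 1:
            raise ValueError(f"alignment {index} has sequences of unequal length")
        width = widths.pop()
        for sp in species:
            # Assumption: a species missing from a gene is padded with gaps.
            parts[sp].append(alignment.get(sp, "-" * width))
    return {sp: "".join(chunks) for sp, chunks in parts.items()}


# Toy example with two genes and two species.
genes = [
    {"Ralf": "MK-", "Cela": "MKL"},
    {"Ralf": "QQ", "Cela": "Q-"},
]
print(concatenate_alignments(genes, ["Ralf", "Cela"]))
```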

Mitochondrial genome assembly, annotation, and phylogenetics

The R. alfredi mitochondrial genome was also assembled using MITObim v1.9 (RRID:SCR_015056) [45]. The complete cytochrome oxidase I (COI) sequence from the existing Rusa alfredi complete mitogenome (NCBI Accession number JN632698.1) was used as the seed FASTA for the assembly. A random sampling of 20% of reads was performed using the following command: downsample.py -s 20 --interleave -r forward_read -r reverse_read | gzip > sampled_20.fastq.gz. Sampled reads were then used for the assembly by running the command: MITObim.pl -start 1 -end 100 -sample mysample -ref myref -readpool sampled_20.fastq.gz -quick seed.fasta --pair. The circular topology of the assembly was checked using the command: circules.py -f assembled_mtDNA.fasta.
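Random subsampling of read pairs, as done before the mitogenome assembly, can be sketched as follows. This is a generic illustration and not the code of downsample.py: each pair is kept independently with a fixed probability, so mates always stay together and the retained fraction is only approximately 20%. The function name and the seed are hypothetical choices.

```python
import random


def downsample_pairs(pairs, fraction=0.20, seed=42):
    """Yield each (forward, reverse) read pair with probability `fraction`.

    A fixed seed makes the subsample reproducible; input order is preserved.
    """
    rng = random.Random(seed)
    for pair in pairs:
        if rng.random() < fraction:
            yield pair


# Toy example with placeholder read names instead of FASTQ records.
toy_pairs = [(f"read{i}/1", f"read{i}/2") for i in range(10000)]
kept = list(downsample_pairs(toy_pairs))
print(len(kept) / len(toy_pairs))
```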

The assembled mitogenome was then annotated using MitoZ v3.6 [46] with the command: mitoz annotate --fastafiles Abraham_mtDNA_genome.fasta --outprefix annotation --thread_number 12 --clade Chordata. An ML tree was also constructed based on 13 concatenated coding sequences of mitochondrial genomes of different species of cervids. Concatenated coding sequences were aligned in MEGA11 (RRID:SCR_000667) [47] using MUSCLE [40]. After alignment, the ML tree was constructed using IQ-TREE v2.3.6 [42] with ModelFinder [43] for model selection based on the BIC and with 1,000 bootstrap replicates. Water buffalo (B. taurus) was selected as an outgroup. The phylogenetic tree was visualized in iTOL [44].

Results and discussion

Reference genomes play a crucial role in understanding genetic variation and the molecular underpinnings of traits across various organisms. They facilitate gene annotation, the identification of regulatory elements, and the elucidation of biological processes. Molecular investigations in cervid species have predominantly focused on systematic relationships using mitochondrial genomes [48], leaving gaps in understanding the adaptive potential and genetic basis of traits and the resolution of deeper nodes (above the family level) in population studies. Moreover, mitochondrial genomes alone may not provide a complete reconstruction of a species’ evolutionary history since they are maternally inherited. Furthermore, several species are underrepresented in genomic databases due to their threatened conservation status or lack of available data, hindering sample collection [49]. This study presents the draft genome assembly of Rusa alfredi (RusAlf_1.1), marking the first genome assembly for the genus Rusa. This contribution is pivotal for conducting integrative analyses essential for the conservation and management of R. alfredi amid human-driven threats, environmental threats, and emerging diseases.

Genome survey

The genome of R. alfredi (codename: Abraham) was estimated to be 2.37 Gb in length with a low level of heterozygosity (0.30%) based on k-mer analysis using GenomeScope2 [8] (Figure 2). Based on the analysis using the mapped reads, a total of 4,305,197 (0.17%) heterozygous sites were identified, confirming the low heterozygosity of the genome. The genome size was also similar to the one estimated by MaSuRCA (2.37 Gb) and close to the actual total length of the assembled contigs (2.51 Gb). The k-mer distribution showed a single peak, indicating high homozygosity (99.70%) of the sequenced genome.

Figure 2.

K-mer (21) distribution. GenomeScope2 was used to estimate the genome size and heterozygosity of the Rusa alfredi genome. len - estimated haploid genome length; aa - homozygosity; ab - heterozygosity; k-cov - mean heterozygous k-mer coverage, err - read error rate; dup - the average rate of read duplications; k: k-mer size used for the run; p - ploidy.

Captive populations have low heterozygosity compared to wild populations primarily due to factors like inbreeding and bottleneck effect [50], which limit genetic diversity in smaller, isolated groups. The variations between the two haplotypes in genomes with low heterozygosity often involve smaller-scale differences, making alignment easier during genome assembly and leading to accurate consensus sequences. In deer genomes, such as those of the sika deer [51] and the white-tailed deer [52], low levels of heterozygosity have been shown to simplify de novo assembly and improve alignment accuracy.

Genome assembly and quality assessment

The assembled draft genome of R. alfredi, RusAlf_1.1, has 171,678 total contigs with a total length of 2.5 Gb. The size of the assembled genome was found to be comparable to the genomes of other cervid species, such as Cervus hanglu yarkandensis with 2.6 Gb (CEY_v1, GenBank accession GCA_010411085.1) and Muntiacus muntjak with 2.57 Gb (UCB_Mmun_1.0, GenBank accession GCA_008782695.1). The assembled genome has low contiguity, with a contig N50 of 46 kb, which was expected considering the limitations of short paired-end reads (2 × 151 bp) in resolving the repeats in large genomes [53]. Long-read sequences are usually added to the assembly to achieve higher contiguity, which adds to the overall cost of genome assembly efforts. Genome assembly can be improved using reference genomes, provided the reference genome is closely related to the target species. The genus Cervus is one of the closest relatives of R. alfredi based on the phylogenetic tree of mitochondrial genomes of the tribe Cervini [54]. For the draft genome RusAlf_1.1, mCerEla1.1 (GenBank accession GCA_910594005.1) was used to correct misassembled contigs based on sequence homology and improve the assembly through homology-based scaffolding. The final assembly has a total of 57,916 scaffolds, a scaffold N50 of 75 Mb, and a scaffold L50 of 13 (Figure 3A). The same homology-based approach, with C. elaphus mCerEla1.1 as the reference for ordering contigs, was used for the chromosome-level assembly of the fallow deer (Dama dama) reference genome [55].

Figure 3.

(A) Assembly metrics and BUSCO scores of RusAlf_1.1 and (B) Repeat elements in the draft genome of RusAlf_1.1.

The quality of genome assemblies is generally assessed based on contiguity and completeness. It was highlighted that interpreting the quality of the assembly using metrics like N50 or L50 alone can be misleading, as it only measures the assembly contiguity and does not consider the assembly completeness and correctness [56]. In this study, despite the low level of contiguity, the draft genome of R. alfredi scored a high level of completeness with 95.5% complete BUSCO using cetartiodactyla_odb10 (n = 13,335). The assembled genome also scored a high Merqury QV of 47 (equivalent to about 99.99% base accuracy) and a completeness score of 96.76%. In addition, the k-mer spectrum plot shows a single high peak for 1-copy k-mer (red) and a very small area for 2-copy k-mer (blue), indicating a homozygous genome (Figure 4). The assembly quality was also checked by mapping the reads back to the final assembly using the QUAST pipeline. The read mapping results revealed that 99.38% of the reads were successfully mapped back to the assembly with a mean base coverage of 47×. This study showed that a high level of assembly completeness of the draft genome can still be achieved using only short paired-end reads. It is worth noting that the estimated genome size from the genome survey is smaller than the assembled genome size mainly due to differences in methodology. K-mer analysis underestimates size by excluding repetitive sequences and errors, while a whole genome assembly includes all data, including repetitive regions and possible duplications. This results in a larger assembled genome size compared to the survey estimate.
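The relationship between a Phred-scaled consensus quality value and base accuracy is worth making explicit. The sketch below applies the standard Phred definition, error rate = 10^(-QV/10), to the QV of 47 reported above; it is a worked conversion, not part of Merqury, and the function names are hypothetical.

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Convert a per-base error probability back to a Phred-scaled value."""
    return -10 * math.log10(error_rate)


error = qv_to_error_rate(47)
accuracy_percent = 100 * (1 - error)
# Roughly 2 errors per 100,000 bases, i.e. base accuracy just under 99.998%.
print(f"error rate = {error:.3e}, accuracy = {accuracy_percent:.4f}%")
```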

Figure 4.

Merqury k-mer spectrum plot of the assembled genome of Rusa alfredi against the Illumina short paired-end reads. Read-only (grey) represents k-mers that are only found in reads but not in the assembly. Colors represent k-mers found in reads and the assembly 1× (red), 2× (blue), 3× (green), 4× (purple), and >4× (yellow).

Comparison of the BUSCO results between contigs (from MaSuRCA) and scaffolds (MaSuRCA+RagTag correct and scaffold) of the assembly shows improvement in the completeness of the assembly with a complete score increasing from 74.2% to 95.5% (Table 1). RagTag statistics after scaffolding also showed high confidence scores (average grouping confidence: 99.78%; average location confidence: 99.65%). However, it should be noted that using reference genomes of different species for scaffolding could introduce errors, considering the structural variations even between genomes of two related species. As there are no current genetic maps and limited related genomic resources for Rusa alfredi, structural variations in the genome could be addressed and validated in future studies by incorporating long-read sequencing as well as Hi-C libraries. Nevertheless, the current draft genome of R. alfredi serves as a valuable foundational resource for the continued conservation of this species.

Table 1.

BUSCO summary results for contigs and scaffolds of the Rusa alfredi draft genome.

BUSCO (n = 13,335, cetartiodactyla_odb10) | Initial contigs (MaSuRCA v4.1.0) | Scaffolds (homology-based correction and scaffolding with RagTag v2.1.0)
Complete (single + duplicated) | 74.2% (9,901) | 95.5% (12,741)
Single-copy | 71.5% (9,538) | 92.9% (12,392)
Duplicated | 2.7% (363) | 2.6% (349)
Fragmented | 7.0% (935) | 1.3% (178)
Missing | 3.2% (416) for scaffolds; 18.8% (2,499) for contigs, listed here as: 18.8% (2,499) | 3.2% (416)
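The counts in Table 1 can be cross-checked with a few lines of Python. The sketch below only verifies internal consistency (single-copy plus duplicated equals complete, and all categories sum to the 13,335 BUSCO groups) and recomputes the complete percentage; it does not reproduce how BUSCO itself rounds each category. The dictionary layout and function name are hypothetical.

```python
N_BUSCO_GROUPS = 13335  # cetartiodactyla_odb10

TABLE_1_COUNTS = {
    "contigs": {"single": 9538, "duplicated": 363, "fragmented": 935, "missing": 2499},
    "scaffolds": {"single": 12392, "duplicated": 349, "fragmented": 178, "missing": 416},
}


def summarize_busco(counts, n_groups=N_BUSCO_GROUPS):
    """Return (complete count, complete percentage) after a consistency check."""
    complete = counts["single"] + counts["duplicated"]
    total = complete + counts["fragmented"] + counts["missing"]
    if total != n_groups:
        raise ValueError(f"categories sum to {total}, expected {n_groups}")
    return complete, round(100 * complete / n_groups, 1)


for stage, counts in TABLE_1_COUNTS.items():
    complete, percent = summarize_busco(counts)
    print(f"{stage}: {complete} complete ({percent}%)")
```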

Genome annotation report

Interspersed repeats make up 44.27% of the RusAlf_1.1 genome. Most repeats were classified as retroelements, comprising 40.16% of the genome, followed by DNA transposons (2.61% of the genome) and unclassified repeats (1.50% of the genome) (Figure 3B). The repetitive sequence analysis revealed similarities to several cervid species’ genomes in terms of genomic composition. For instance, in the Sika deer (Cervus nippon), repetitive sequences make up around 45.38% of its genome [57]. Among repetitive elements, long interspersed nuclear elements (LINEs), short interspersed nuclear elements (SINEs), and long terminal repeats (LTRs) are the most abundant. Similar patterns are observed in Tarim red deer (Cervus elaphus yarkandensis) [58], Siberian musk deer (Moschus moschiferus) [59], white-tailed deer (Odocoileus virginianus) [60], and reindeer (Rangifer tarandus) [61], where repetitive sequences account for 39.1% to 42.4% of their genomes. Simple sequence repeats (SSRs) make up about 0.76% of the RusAlf_1.1 genome. SSRs, or microsatellites, are highly polymorphic loci that can be used in conservation genetics to estimate genetic structure. Genetic diversity plays a crucial role in wildlife management and disease mitigation, as demonstrated by studies on wild pig populations in Texas and roe deer in Iberia, emphasizing the need to integrate genetic data into conservation strategies [62, 63]. The assembled genome would directly benefit the captive population at Silliman University by supporting ongoing genetic diversity studies.

Gene annotation of RusAlf_1.1 was initially performed through homology-based gene prediction. The addition of transcriptome data has been shown to improve the accuracy of gene prediction [64], especially for de novo gene prediction. However, obtaining transcriptome data for endangered species like R. alfredi is challenging due to their limited population size and the ethical and logistic constraints of sampling. Additionally, obtaining a sample for RNA-Seq in this study was not possible due to limited financial resources. Nevertheless, a total of 22,862 genes were predicted from the RusAlf_1.1 genome through homology, which is comparable to the 22,941 predicted genes in the red deer (Cervus elaphus) genome [65]. To capture genes missed by homology, additional gene prediction was performed using the BRAKER pipeline C, incorporating a protein database as external evidence. BRAKER initially predicted a total of 35,129 genes, of which 16,343 were verified using InterProScan with the Pfam database. Among these verified genes, 1,669 had CDS that were unique to BRAKER and did not overlap with the GeMoMa annotation. These genes with non-overlapping CDS were added to the GeMoMa annotation, resulting in a final gene count of 24,531 for R. alfredi. The predicted genes in RusAlf_1.1 were then used to study the phylogenetic relationship of R. alfredi with other cervids with sequenced genomes.

Phylogenetic inference

A phylogenetic tree was constructed based on single-copy orthologs of different species of deer (Figure 5). The resulting tree showed a monophyletic grouping of the four species of Cervus, namely C. elaphus (GenBank accession GCA_910594005.1), C. hanglu yarkandensis (CEY_v1, GenBank accession GCA_010411085.1), C. nippon (GenBank accession GCA_040085125.1), and C. canadensis (GenBank accession GCF_019320065.1). The topology is similar to that reported in a previous study [66], with the addition of RusAlf_1.1 from this study. The species tree revealed a close relationship between RusAlf_1.1 and the genus Cervus. This result supports a previous study based on complete mitochondrial genomes, suggesting that the genus Rusa is sister to Cervus [67]. However, the phylogenetic position of R. alfredi relative to other species of Rusa could not be evaluated due to the absence of genome data for other Rusa species. Continued efforts to assemble the genomes of Rusa species will be crucial for elucidating the evolutionary relationships between Cervus and Rusa.

Figure 5.

ML tree of different cervid species based on single-copy orthologs. The ML tree was constructed from the multiple sequence alignment (MSA) of 7,188 concatenated single-copy orthologs of nuclear genomes. The MSA reached a total of 3,992,528 amino acid sites after trimming. The ML tree was constructed using Q.mammal+F+I+R10 substitution model and bootstrap set at 1,000.

Mitochondrial genome assembly, annotation, and phylogenetics

The complete mitochondrial genome for RusAlf_1.1 was also assembled using short paired-end reads. The final assembly has a 16,356 bp total length. A total of 13 coding genes, 22 tRNA genes, and two rRNA genes were annotated in the assembled mitogenome. The assembly was uploaded in the GenBank with accession number PQ083075.

An ML tree of different species of cervids using concatenated coding sequences of mitogenomes (Figure 6) showed subdivisions between subfamilies of cervids: Capreolinae and Cervinae. The monophyletic grouping of R. alfredi Abraham (GenBank Accession number PQ083075.1) and the reference mitogenome for R. alfredi (GenBank Accession number JN632698.1) was also observed. Our ML tree result further supports the close relationship between R. alfredi and the genus Cervus.

Figure 6.

ML tree of different species of cervids based on concatenated coding sequences from complete mitochondrial genomes. The ML tree was constructed using the TIM2+F+I+R3 substitution model with 1,000 bootstrap replicates.

Mitogenome sequences have become valuable resources for elucidating phylogenetic relationships among different cervids. For instance, the proposed transfer of Rucervus eldii to the genus Panolia was due to mitogenomic evidence of its close relationship with Elaphurus davidianus and its separation from Rucervus duvaucelii [67, 68]. In this study, we found that Rusa forms an evolutionary grade with Cervus due to the position of the latter as a monophyletic clade nested within Rusa. Rusa alfredi was recovered as basal to the Rusa + Cervus clade, agreeing with a previous mitogenomic phylogeny [67]. The basal position of R. alfredi in the clade raises interesting questions about the evolutionary history of the cervids, particularly in island environments such as the Philippines. The patterns of speciation and diversification of cervids in insular southeast Asia require further study. It is recommended to also sequence the genome of R. marianna, another cervid endemic to the Philippines, as well as other allopatrically-distributed Rusa populations, to elucidate their evolutionary histories and taxonomic distinctiveness. Increased taxon sampling could also potentially serve to test the Pleistocene Aggregate Island Complex theory [69] by examining patterns of divergence, gene flow, and demographic history of deer populations that were potentially connected during periods of low sea levels but are currently separated in different islands. In addition, Pleistocene climate-driven changes in the availability of suitable habitats may have also caused disjunct distributions and diversification [70]. Paleoclimatic models can be incorporated to understand the physical and environmental factors that may have promoted diversification in Philippine cervids.

Previous studies showed evidence of hybridization between different species of Cervus [71, 72] and between species of Rusa [73]. Subsequent backcrossing of hybrids into a parental population could cause mitochondrial introgression, which could obscure or complicate phylogenetic reconstruction if it is based solely on mitochondrial genomes. Although hybridization between R. alfredi and R. marianna was previously observed [2, 3], it is unlikely to occur now, given the small current population size and the non-overlapping geographic distributions of the two species. Moreover, genetic evidence is still lacking for the previously reported hybridization between R. alfredi and R. marianna, and therefore for possible mitochondrial introgression in the current captive and wild populations of R. alfredi. Whole-genome sequences for other native species of Rusa would provide genomic resources for detecting hybrids, which would also help the management and monitoring of these species, especially for the reintroduction of captive populations into the wild.

The assembled genome of R. alfredi represents an advancement in the research and conservation efforts for this endangered endemic species. It not only reinforces previous taxonomic classifications of R. alfredi but also facilitates the evaluation of its evolutionary relationships with other species of Rusa and Cervus [3, 74]. This underscores the importance of obtaining additional genomic data from more Rusa and Cervus species. Considering the limitations of a draft assembly based on short-read sequencing, and the possibility of misassembly given the methods and resources used, the quality of the R. alfredi genome can be improved by adding RNA-Seq data, karyotyping to establish a clear chromosomal framework, integrating long-read sequencing to enhance contiguity and accuracy, and using Hi-C libraries to detect and resolve structural variations. These approaches will not only refine the genome assembly but also provide critical insights into structural differences between R. alfredi and Cervus species, ultimately contributing to more robust conservation strategies. Nevertheless, the initial availability of a genomic resource will support the development of targeted conservation strategies for the captive population. Incorporating samples from wild populations of R. alfredi will also allow us to identify genes that have evolved in captive settings, informing us about survival adaptations crucial for reintroduction efforts into the wild. This work enables further studies, such as microsatellite analysis, SNP discovery, RADseq, reference gene characterization, and whole-genome resequencing [75].

Acknowledgements

The authors would like to express their gratitude to Ozzy Boy S. Nicopior for his assistance in generating the distribution map used in this paper.

Funding Statement

This work was supported by the Philippine Genome Center Visayas, University of the Philippines Visayas.

Data availability

The genome assembly generated in this study has been deposited at NCBI GenBank under the accession JBCEYX000000000. All sequencing reads can be accessed through the NCBI SRA (BioProject number: PRJNA1102104). Files generated in this study (Illumina reads, code, the configuration file for assembly, the assembled genome, annotations, MSA, and phylogenetic tree files) are available in GigaDB [10].

Abbreviations

BIC, Bayesian Information Criterion; CDS, coding sequences; CENTROP, Center for Tropical Conservation Studies; COI, cytochrome oxidase I; DENR, Department of Environment and Natural Resources; IUCN, International Union for Conservation of Nature; LINEs, long interspersed nuclear elements; LTRs, long terminal repeats; ML, maximum likelihood; MSA, multiple sequence alignment; QV, quality value; SINEs, short interspersed nuclear elements; SSRs, simple sequence repeats; VCF, variant call format; VSD, Visayan Spotted Deer.

Declarations

Ethics approval and consent to participate

All study procedures and the use of experimental animals were conducted in accordance with Republic Act No. 8485, the Animal Welfare Act of 1998 of the Philippines. The tissue sampling carried out for this study was approved by the DENR Region VII (Gratuitous Permit No. 2022-17 Series of 2022). The authors declare that ethical approval was not required for this type of research.

Competing interests

The authors declare that there is no conflict of interest.

Authors’ contributions

MCM, VMEF, CDC, RSG, and NPA conceptualized and supervised the study. VMEF secured the funding for the conduct of the study. PMS facilitated the permit for sample collection and handled sample preparation before sequencing. MCJ performed the experiment and managed the project. AN conducted the assembly and bioinformatics analysis. MCJ and AN wrote the manuscript with contributions from all authors. All authors reviewed and approved the final manuscript.

References

  • 1. Ali NANG, Abdullah ML, Nor SAM et al. A review of the genus Rusa in the Indo-Malayan archipelago and conservation efforts. Saudi J. Biol. Sci., 2021; 28(1): 10–26. doi: 10.1016/J.SJBS.2020.08.024.
  • 2. Brook S. Rusa alfredi, Philippine spotted deer. In: The IUCN Red List of Threatened Species. 2016; e.T4273A22168782. doi: 10.2305/IUCN.UK.2016-2.RLTS.T4273A22168782.en.
  • 3. Oliver WLR, Cox CR, Dolar LL. The Philippine spotted deer conservation project. Oryx, 1991; 25(4): 199–205. doi: 10.1017/S0030605300034335.
  • 4. Hogg CJ, Ottewell K, Latch P et al. Threatened species initiative: empowering conservation action using genomic resources. Proc. Natl. Acad. Sci. USA, 2022; 119(4): e2115643118. doi: 10.1073/PNAS.2115643118.
  • 5. Andrews S. FastQC: a quality control tool for high throughput sequence data. https://www.bioinformatics.babraham.ac.uk/people.html#simon. Accessed: September 14, 2024.
  • 6. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 2014; 30(15): 2114–2120. doi: 10.1093/bioinformatics/btu170.
  • 7. Marçais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics, 2011; 27(6): 764–770. doi: 10.1093/BIOINFORMATICS/BTR011.
  • 8. Ranallo-Benavidez TR, Jaron KS, Schatz MC. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun., 2020; 11(1): 1432. doi: 10.1038/s41467-020-14998-3.
  • 9. Zimin AV, Marçais G, Puiu D et al. The MaSuRCA genome assembler. Bioinformatics, 2013; 29(21): 2669–2677. doi: 10.1093/BIOINFORMATICS/BTT476.
  • 10. Javier MCF, Noblezada AC, Sienes PMQ et al. Supporting data for “Draft genome of the endangered visayan spotted deer (Rusa alfredi), a Philippine endemic species”. GigaScience Database, 2025; doi: 10.5524/102662.
  • 11. Alonge M, Lebeigle L, Kirsche M et al. Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing. Genome Biol., 2022; 23(1): 258. doi: 10.1186/S13059-022-02823-7.
  • 12. GitHub. ablab/quast: Genome assembly evaluation tool. https://github.com/ablab/quast. Accessed: October 10, 2024.
  • 13. Manni M, Berkeley MR, Seppey M et al. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol., 2021; 38(10): 4647–4654. doi: 10.1093/MOLBEV/MSAB199.
  • 14. Challis R, Richards E, Rajan J et al. BlobToolKit – interactive quality assessment of genome assemblies. G3 Genes|Genomes|Genetics, 2020; 10(4): 1361–1374. doi: 10.1534/G3.119.400908.
  • 15. Rhie A, Walenz BP, Koren S et al. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol., 2020; 21(1): 245. doi: 10.1186/S13059-020-02134-9.
  • 16. Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 2009; 25(14): 1754–1760. doi: 10.1093/BIOINFORMATICS/BTP324.
  • 17. Danecek P, Bonfield JK, Liddle J et al. Twelve years of SAMtools and BCFtools. GigaScience, 2021; 10(2): giab008. doi: 10.1093/GIGASCIENCE/GIAB008.
  • 18. Shen W, Sipos B, Zhao L. SeqKit2: a Swiss army knife for sequence and alignment processing. iMeta, 2024; 3(3): e191. doi: 10.1002/IMT2.191.
  • 19. Flynn JM, Hubley R, Goubert C et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA, 2020; 117(17): 9451–9457. doi: 10.1073/PNAS.1921046117.
  • 20. GitHub. Dfam-consortium/FamDB: FamDB file format library and utilities. https://github.com/Dfam-consortium/FamDB. Accessed: October 08, 2024.
  • 21. Smit A, Hubley R, Green P. RepeatMasker Open-4.0. http://www.repeatmasker.org. Accessed: October 09, 2024.
  • 22. Keilwagen J, Hartung F, Grau J. GeMoMa: homology-based gene prediction utilizing intron position conservation and RNA-seq data. Methods Mol. Biol., 2019; 1962: 161–177. doi: 10.1007/978-1-4939-9173-0_9.
  • 23. Gabriel L, Hoff KJ, Brůna T et al. TSEBRA: transcript selector for BRAKER. BMC Bioinform., 2021; 22(1): 566. doi: 10.1186/s12859-021-04482-0.
  • 24. Stanke M, Schöffmann O, Morgenstern B et al. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinform., 2006; 7(1): 62. doi: 10.1186/1471-2105-7-62.
  • 25. Stanke M, Diekhans M, Baertsch R et al. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics, 2008; 24(5): 637–644. doi: 10.1093/BIOINFORMATICS/BTN013.
  • 26. Iwata H, Gotoh O. Benchmarking spliced alignment programs including Spaln2, an extended version of Spaln that incorporates additional species-specific features. Nucleic Acids Res., 2012; 40(20): e161. doi: 10.1093/NAR/GKS708.
  • 27. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat. Methods, 2014; 12(1): 59–60. doi: 10.1038/nmeth.3176.
  • 28. Gotoh O. A space-efficient and accurate method for mapping and aligning cDNA sequences onto genomic sequence. Nucleic Acids Res., 2008; 36(8): 2630–2638. doi: 10.1093/NAR/GKN105.
  • 29. Lomsadze A, Ter-Hovhannisyan V, Chernoff YO et al. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res., 2005; 33(20): 6494–6506. doi: 10.1093/NAR/GKI937.
  • 30. Brůna T, Lomsadze A, Borodovsky M. GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins. NAR Genom. Bioinform., 2020; 2(2): lqaa026. doi: 10.1093/NARGAB/LQAA026.
  • 31. Hoff KJ, Lomsadze A, Borodovsky M et al. Whole-genome annotation with BRAKER. Methods Mol. Biol., 2019; 1962: 65–95. doi: 10.1007/978-1-4939-9173-0_5.
  • 32. Brůna T, Hoff KJ, Lomsadze A et al. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom. Bioinform., 2021; 3(1): lqaa108. doi: 10.1093/NARGAB/LQAA108.
  • 33. Hoff KJ, Lange S, Lomsadze A et al. BRAKER1: unsupervised RNA-seq-based genome annotation with GeneMark-ET and AUGUSTUS. Bioinformatics, 2016; 32(5): 767–769. doi: 10.1093/BIOINFORMATICS/BTV661.
  • 34. Kuznetsov D, Tegenfeldt F, Manni M et al. OrthoDB v11: annotation of orthologs in the widest sampling of organismal diversity. Nucleic Acids Res., 2023; 51(D1): D445–D451. doi: 10.1093/NAR/GKAC998.
  • 35. Mistry J, Chuguransky S, Williams L et al. Pfam: the protein families database in 2021. Nucleic Acids Res., 2021; 49(D1): D412–D419. doi: 10.1093/NAR/GKAA913.
  • 36. Jones P, Binns D, Chang H-Y et al. InterProScan 5: genome-scale protein function classification. Bioinformatics, 2014; 30(9): 1236–1240. doi: 10.1093/BIOINFORMATICS/BTU031.
  • 37. Dainat J, Hereñú D, Murray KD et al. NBISweden/AGAT: AGAT-v1.4.1. Zenodo, 2024; doi: 10.5281/ZENODO.13799920.
  • 38. Wang Q, Han R, Xing H et al. A consensus genome of sika deer (Cervus nippon) and transcriptome analysis provided novel insights on the regulation mechanism of transcript factor in antler development. BMC Genom., 2024; 25(1): 617. doi: 10.1186/S12864-024-10522-9.
  • 39. Emms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol., 2019; 20(1): 238. doi: 10.1186/S13059-019-1832-Y.
  • 40. Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinform., 2004; 5(1): 113. doi: 10.1186/1471-2105-5-113.
  • 41. Castresana J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol. Biol. Evol., 2000; 17(4): 540–552. doi: 10.1093/OXFORDJOURNALS.MOLBEV.A026334.
  • 42. Nguyen LT, Schmidt HA, Von Haeseler A et al. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol., 2015; 32(1): 268–274. doi: 10.1093/MOLBEV/MSU300.
  • 43. Kalyaanamoorthy S, Minh BQ, Wong TKF et al. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods, 2017; 14(6): 587–589. doi: 10.1038/nmeth.4285.
  • 44. Letunic I, Bork P. Interactive tree of life (iTOL) v6: recent updates to the phylogenetic tree display and annotation tool. Nucleic Acids Res., 2024; 52(W1): W78–W82. doi: 10.1093/NAR/GKAE268.
  • 45. Hahn C, Bachmann L, Chevreux B. Reconstructing mitochondrial genomes directly from genomic next-generation sequencing reads—a baiting and iterative mapping approach. Nucleic Acids Res., 2013; 41(13): e129. doi: 10.1093/NAR/GKT371.
  • 46. Meng G, Li Y, Yang C et al. MitoZ: a toolkit for animal mitochondrial genome assembly, annotation and visualization. Nucleic Acids Res., 2019; 47(11): e63. doi: 10.1093/NAR/GKZ173.
  • 47. Tamura K, Stecher G, Kumar S. MEGA11: molecular evolutionary genetics analysis version 11. Mol. Biol. Evol., 2021; 38(7): 3022–3027. doi: 10.1093/MOLBEV/MSAB120.
  • 48. Hassanin A, Delsuc F, Ropiquet A et al. Pattern and timing of diversification of Cetartiodactyla (Mammalia, Laurasiatheria), as revealed by a comprehensive analysis of mitochondrial genomes. C. R. Biol., 2011; 335(1): 32–50. doi: 10.1016/j.crvi.2011.11.002.
  • 49. Heckeberg NS, Erpenbeck D, Wörheide G et al. Systematic relationships of five newly sequenced cervid species. PeerJ, 2016; 2016(4): e2307. doi: 10.7717/PEERJ.2307.
  • 50. Liang HM, Yang K-T, Cheng Y-T et al. Genetic diversity and population structure in captive populations of Formosan sambar deer (Rusa unicolor swinhoei). Animals, 2023; 13(19): 3106. doi: 10.3390/ANI13193106.
  • 51. Wang Q, Han R, Xing H et al. A consensus genome of sika deer (Cervus nippon) and transcriptome analysis provided novel insights on the regulation mechanism of transcript factor in antler development. BMC Genom., 2024; 25(1): 617. doi: 10.1186/S12864-024-10522-9.
  • 52. London EW, Roca AL, Novakofski JE et al. A de novo chromosome-level genome assembly of the white-tailed deer, Odocoileus virginianus. J. Heredity, 2022; 113(4): 479–489. doi: 10.1093/JHERED/ESAC022.
  • 53. Baptista RP, Reis-Cunha JL, DeBarry JD et al. Assembly of highly repetitive genomes using short reads: the genome of discrete typing unit III Trypanosoma cruzi strain 231. Microb. Genom., 2018; 4(4): e000156. doi: 10.1099/MGEN.0.000156.
  • 54. Ghazi MG, Sharma SP, Tuboi C et al. Population genetics and evolutionary history of the endangered Eld’s deer (Rucervus eldii) with implications for planning species recovery. Sci. Rep., 2021; 11(1): 2564. doi: 10.1038/S41598-021-82183-7.
  • 55. Barnard RK, Smith JA, Yuan N et al. An announcement of a new genome sequence available for Dama dama (fallow deer). Forensic Sci. Int.: Animals and Environments, 2023; 4: 100074. doi: 10.1016/j.fsiae.2023.100074.
  • 56. Porrelli S, Gerbault-Seureau M, Rozzi R et al. Draft genome of the lowland anoa (Bubalus depressicornis) and comparison with buffalo genome assemblies (Bovidae, Bubalina). G3 Genes|Genomes|Genetics, 2022; 12(11): jkac234. doi: 10.1093/g3journal/jkac234.
  • 57. Xing X, Ai C, Wang T et al. The first high-quality reference genome of sika deer provides insights for high-tannin adaptation. Genomics Proteomics Bioinform., 2023; 21(1): 203–215. doi: 10.1016/J.GPB.2022.05.008.
  • 58. Ba H, Cai Z, Gao H et al. Chromosome-level genome assembly of Tarim red deer, Cervus elaphus yarkandensis. Sci. Data, 2020; 7(1): 187. doi: 10.1038/s41597-020-0537-0.
  • 59. Yi L, Su R, Lin W et al. Whole-genome sequencing of wild Siberian musk deer (Moschus moschiferus) provides insights into its genetic features. BMC Genom., 2020; 21(1): 108. doi: 10.1186/S12864-020-6495-2.
  • 60. London EW, Roca AL, Novakofski JE et al. A de novo chromosome-level genome assembly of the white-tailed deer, Odocoileus virginianus. J. Heredity, 2022; 113(4): 479–489. doi: 10.1093/jhered/esac022.
  • 61. Li Z, Lin Z, Ba H et al. Draft genome of the reindeer (Rangifer tarandus). GigaScience, 2017; 6(12): gix102. doi: 10.1093/GIGASCIENCE/GIX102.
  • 62. Barros T, Ferreira E, Rocha RG et al. The multiple origins of roe deer populations in western Iberia and their relevance for conservation. Animals, 2020; 10(12): 2419. doi: 10.3390/ANI10122419.
  • 63. Delgado-Acevedo J, Zamorano A, Deyoung RW et al. Genetic population structure of wild pigs in southern Texas. Animals, 2021; 11(1): 168. doi: 10.3390/ani11010168.
  • 64. Prasad TSK, Mohanty AK, Kumar M et al. Integrating transcriptomic and proteomic data for accurate assembly and annotation of genomes. Genome Res., 2017; 27(1): 133–144. doi: 10.1101/GR.201368.115.
  • 65. Pemberton J, Johnston SE, Fletcher TJ et al. The genome sequence of the red deer, Cervus elaphus Linnaeus 1758 [version 1; peer review: 1 approved, 1 approved with reservations]. Wellcome Open Res., 2021; 6: 336. doi: 10.12688/wellcomeopenres.17493.1.
  • 66. Tang L, Dong S, Xing X. Comparative genomics reveal phylogenetic relationship and chromosomal evolutionary events of eight Cervidae species. Animals, 2024; 14(7): 1063. doi: 10.3390/ANI14071063.
  • 67. Mackiewicz P, Matosiuk M, Świsocka M et al. Phylogeny and evolution of the genus Cervus (Cervidae, Mammalia) as revealed by complete mitochondrial genomes. Sci. Rep., 2022; 12(1): 16381. doi: 10.1038/S41598-022-20763-X.
  • 68. Pitra C, Fickel J, Meijaard E et al. Evolution and phylogeny of Old World deer. Mol. Phylogenet. Evol., 2004; 33(3): 880–895. doi: 10.1016/J.YMPEV.2004.07.013.
  • 69. Heaney LR. Biogeography of mammals in SE Asia: estimates of rates of colonization, extinction and speciation. Biol. J. Linnean Soc., 1986; 28(1–2): 127–165. doi: 10.1111/J.1095-8312.1986.TB01752.X.
  • 70. Hosner PA, Sánchez-González LA, Townsend Peterson A et al. Climate-driven diversification and Pleistocene refugia in Philippine birds: evidence from phylogeographic structure and paleoenvironmental niche modeling. Evolution (NY), 2014; 68(9): 2658–2674. doi: 10.1111/EVO.12459.
  • 71. Smith SL, Carden RF, Coad B et al. A survey of the hybridisation status of Cervus deer species on the island of Ireland. Conserv. Genet., 2014; 15(4): 823–835. doi: 10.1007/S10592-014-0582-3.
  • 72. Queirós J, Gortázar C, Alves PC. Deciphering anthropogenic effects on the genetic background of the red deer in the Iberian Peninsula. Front. Ecol. Evol., 2020; 8: 515401. doi: 10.3389/FEVO.2020.00147.
  • 73. Hill E, Murphy N, Li-Williams S et al. Hybridisation rates, population structure, and dispersal of sambar deer (Cervus unicolor) and rusa deer (Cervus timorensis) in south-eastern Australia. Wildlife Res., 2023; 50(9): 669–687. doi: 10.1071/WR22129.
  • 74. Grubb P, Groves CP. Notes on the taxonomy of the deer (Mammalia, Cervidae) of the Philippines. Zool. Anz., 1983; 210(1/2): 119–144.
  • 75. Brandies P, Peel E, Hogg CJ et al. The value of reference genomes in the conservation of threatened species. Genes, 2019; 10(11): 846. doi: 10.3390/GENES10110846.
GigaByte. 2025 Feb 24;2025:gigabyte150.

Article Submission

Ma Carmel Javier

Assign Handling Editor

Editor: Scott Edmunds

Editor Assess MS

Editor: Hongfang Zhang

Curator Assess MS

Editor: Mary-Ann Tuli

Review MS

Editor: Endre Barta

Reviewer name and names of any other individuals who aided in the review Endre Barta
Do you understand and agree to our policy of having open and named reviews, and having your review included with the published papers. (If no, please inform the editor that you cannot review this manuscript.) Yes
Is the language of sufficient quality? Yes
Please add additional comments on language quality to clarify if needed
Are all data available and do they match the descriptions in the paper? No
Additional Comments The authors provided only the assembly in Fasta and GenBank format and the contigs (scaffolds?) in GenBank format. Neither the annotation nor the raw Illumina reads are available.
Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples (http://gigadb.org/site/guide) Yes
Additional Comments In the cases where the data is uploaded, the provided metadata is consistent.
Is the data acquisition clear, complete and methodologically sound? Yes
Additional Comments
Is there sufficient detail in the methods and data-processing steps to allow reproduction? No
Additional Comments The exact parameters used during the processing are completely missing. For example, it is unclear how the RagTag-based correcting and scaffolding were carried out.
Is there sufficient data validation and statistical analyses of data quality? Not my area of expertise
Additional Comments
Is the validation suitable for this type of data? No
Additional Comments Without having the raw Illumina reads and the exact command line parameters used, it is not possible to validate the provided results.
Is there sufficient information for others to reuse this dataset or integrate it with other data? Yes
Additional Comments
Any Additional Overall Comments to the Author Assembling the reference genomes of endangered species is a task of immense importance, with the potential to significantly advance our understanding and conservation of these species. This work provides an initial genome assembly based on Illumina short-read sequencing. The correction and scaffolding of the contigs were made with the RagTag program using the red deer PacBio-based chromosome-level assembly. The potential benefits of this work are vast, from gaining knowledge to initiating and furthering population studies to preserve the species. According to the annotation and the BUSCO analysis, the final assembly seems especially good, considering that it is short-read based. However, there are some concerns about the methodology and the provided data.
1. The Illumina short reads and the annotation data (GFFs, VCFs) are not available.
2. The methods used are not reproducible because the descriptions of the exact parameters are missing.
3. It seems that the authors did not use the ‘-r’ parameter during the scaffolding, which resulted in inserting 100 bp of Ns instead of gaps sized according to the red deer reference genome.
4. There is no K-mer based genome size estimation.
5. The chromosome number is not known. Is there any chromosomal rearrangement between the red deer and the Visayan Spotted Deer?
6. It is not justified why the protein- and mitochondria-based trees are drawn as cladograms and not as phylograms. This way, the actual distances between the different species cannot be seen.
7. Although the short reads were mapped back to the assembly, no variation data is provided.
8. Is it necessary to include this high number (46,104) of short (<1,000 bp) contigs in the assembly?
9. Although the red deer assembly was used for the correction and scaffolding, the annotation was compared to the mule deer.
Recommendation Major Revision

Review MS

Editor: Haimeng Li

Reviewer name and names of any other individuals who aided in the review Haimeng Li
Do you understand and agree to our policy of having open and named reviews, and having your review included with the published papers. (If no, please inform the editor that you cannot review this manuscript.) Yes
Is the language of sufficient quality? Yes
Please add additional comments on language quality to clarify if needed
Are all data available and do they match the descriptions in the paper? No
Additional Comments The genomic annotation file is not publicly available.
Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples (http://gigadb.org/site/guide) No
Additional Comments Genomic annotation information and protein sequence information were not found in the NCBI database.
Is the data acquisition clear, complete and methodologically sound? Yes
Additional Comments
Is there sufficient detail in the methods and data-processing steps to allow reproduction? No
Additional Comments
Is there sufficient data validation and statistical analyses of data quality? Yes
Additional Comments
Is the validation suitable for this type of data? Yes
Additional Comments
Is there sufficient information for others to reuse this dataset or integrate it with other data? No
Additional Comments
Any Additional Overall Comments to the Author The manuscript, 'Draft Genome of the Endangered Visayan Spotted Deer (Rusa alfredi), a Philippine Endemic Species,' contributes to the field of conservation genomics. The study presents the first draft genome assembly of the Visayan Spotted Deer, utilizing Illumina short-read sequencing technology to generate valuable genomic resources for this endangered species. Here are some questions and comments.
Q1. Why was gene annotation conducted using only homology-based annotation? It is recommended that the annotation approach includes de novo, RNA-based, and homology-based methods. Combining these approaches would provide a more comprehensive gene set, particularly for species with limited genomic resources. Please revise the method section to include these additional annotation strategies. The authors have stated that due to sampling limitations, RNA-based experiments could not be conducted. RNA extraction might be performed using the tissue samples that were previously collected for genome assembly. (Lines 167-172)
Q2. Before proceeding with genome assembly, it is essential to conduct a genome survey. This initial step provides crucial information about the genome's size, complexity, and composition, which is vital for planning the assembly strategy and selecting appropriate sequencing technologies and bioinformatics tools. The survey should include estimates of genome size, GC content, repetitive elements, and ploidy level. Additionally, the result could be used to assess the completeness of the assembly. Please include a section on the genome survey in the Method section.
Q3. To enhance the quality and contiguity of the assembly, utilizing another species as a reference genome for scaffolding might introduce errors due to discrepancies in karyotype. It is essential to ascertain whether there is a definitive karyotype study that verifies the consistency of the karyotype between the Visayan Spotted Deer and the reference species, indicating the absence of chromosomal fission or fusion events. This information is crucial for the reliability of the scaffolding process. (Lines 236-238)
Q4. Although the length of scaffold N50 is long, the high number of scaffolds and contigs suggests fragmentation. Have you addressed redundancy in the assembly? (Line 238)
Q5. Have you used software like Merqury to detect assembly errors and assess the completeness of the assembly? This is useful for evaluating the quality of the genome sequence and identifying potential issues that may need to be addressed.
Q6. Are the species divergent, which might explain the low number of orthologous genes? Is this an annotation issue or does it reflect true biological divergence? Further investigation into the annotation process and comparative genomic analyses may be warranted to understand the extent of divergence and the implications for the study. (Lines 313-317)
Q7. Please standardize the format of numbers throughout the manuscript to maintain consistency in the number of significant figures. (Lines 224, 225, 227, 239, 245)
Recommendation Major Revision

Editor Decision

Editor: Hongfang Zhang

Major Revision

Ma Carmel Javier

Assess Revision

Editor: Hongfang Zhang

Re-Review MS

Editor: Endre Barta

Indicate in the comments box below whether you are happy with the changes made or if the manuscript is unacceptable.
Comments on revised manuscript
I thank the authors for their efforts to address the concerns raised. I broadly agree with the answers, but three further details need clarification:
1. Dividing the total raw read bases by the resulting genome size yields a coverage of about 62x. The authors mapped the raw reads back to the resulting reference genome sequence, which gave 47x coverage. However, both the GenomeScope and Merqury k-mer analyses showed 22x coverage. What is the reason for this discrepancy?
2. The k-mer analysis does indeed, and a bit strangely, show what appears to be a haploid genome. However, the 0.302% heterozygosity measured by GenomeScope is not remarkably low. To have an accurate picture of this, it would be important to count the number of heterozygous sites based on the raw reads mapped back at 47x coverage.
3. Although we do not know the exact chromosome number, aligning the assembly to the red deer reference could be interesting. It would show how many scaffolds map to more than one red deer chromosome. Of course, this could be due either to chromosome rearrangement or to incorrect scaffolding or assembly of the contigs.
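The coverage figures compared in point 1 are computed on different bases, and a short worked example may help when reading them side by side. This is a sketch under stated assumptions: the total base count is back-calculated from the quoted 62x and 2.52 Gb, and the read length (150 bp) and k-mer size (21) are hypothetical values, not settings reported in this review.

```python
def base_coverage(total_bases, genome_size):
    """Average per-base sequencing depth."""
    return total_bases / genome_size


def kmer_coverage(base_cov, read_length, k):
    """Expected k-mer depth for a given base depth.

    A read of length L contributes L - k + 1 k-mers, so k-mer depth
    is lower than base depth by the factor (L - k + 1) / L.
    """
    return base_cov * (read_length - k + 1) / read_length


# Illustrative input: total bases chosen so that depth over 2.52 Gb is 62x.
raw_depth = base_coverage(156.24e9, 2.52e9)

# Assumed read length and k; neither is stated in the review text.
mapped_as_kmer_depth = kmer_coverage(47, read_length=150, k=21)

print(f"raw base depth: {raw_depth:.1f}x")
print(f"47x base depth expressed as 21-mer depth: {mapped_as_kmer_depth:.1f}x")
```

Under these assumed values the read-length correction alone does not bring 47x down to 22x. K-mer profiling tools may also report the depth of the per-haplotype (heterozygous) peak rather than the diploid peak, which would roughly halve the figure again; whether either effect accounts for the numbers the reviewer quotes is not established here.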
GigaByte.

Re-Review MS

Editor: Haimeng Li

Indicate in the comments box below whether you are happy with the changes made or if the manuscript is unacceptable.
Comments on revised manuscript
Q1: Why is the estimated genome size from the genome survey much smaller than the assembled genome size?
Q2: In the method section, I did not see a description of the de novo method for gene structure annotation.
Q3: I am concerned about using a reference genome with unclear karyotype relationships for scaffolding.
Q4: Are there other published comparative genomic studies on deer that have identified such a small number of homologous genes?
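For context on Q1, a k-mer genome survey typically estimates genome size by dividing the number of counted k-mers by the depth of the main k-mer peak. The sketch below is a naive version of that idea, not the model that GenomeScope fits; the function name, the depth cut-offs, and the toy histogram are all illustrative assumptions.

```python
def estimate_genome_size(histogram, min_depth=3, max_depth=None):
    """Naive k-mer genome size estimate.

    histogram maps k-mer depth -> number of distinct k-mers at that depth.
    Depths below min_depth are treated as sequencing errors and ignored;
    depths above max_depth (if given) are also ignored.
    """
    window = {
        depth: count
        for depth, count in histogram.items()
        if depth >= min_depth and (max_depth is None or depth <= max_depth)
    }
    if not window:
        raise ValueError("no k-mers fall inside the depth window")
    # Depth with the most distinct k-mers is taken as the main peak.
    peak_depth = max(window, key=window.get)
    total_kmers = sum(depth * count for depth, count in window.items())
    return total_kmers / peak_depth


# Invented histogram: an error spike at depth 1, a main peak at depth 10,
# and one high-copy repeat k-mer at depth 1000.
toy_histogram = {1: 500, 9: 10, 10: 80, 11: 10, 1000: 1}

print(estimate_genome_size(toy_histogram))                  # repeats counted
print(estimate_genome_size(toy_histogram, max_depth=100))   # repeats excluded
```

In this toy case, excluding high-depth k-mers halves the estimate, which illustrates one way a survey-based size can come out smaller than an assembly; it is not offered as the explanation for this manuscript.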
GigaByte.

Editor Decision

Editor: Hongfang Zhang
GigaByte. 2025 Feb 24;2025:gigabyte150.

Minor Revision

Ma Carmel Javier
GigaByte.

Assess Revision

Editor: Hongfang Zhang
GigaByte.

Final Data Preparation

Editor: Yannan Fan
GigaByte.

Editor Decision

Editor: Hongfang Zhang
GigaByte.

Accept

Editor: Scott Edmunds

Editor’s Assessment The Visayan spotted deer (Rusa alfredi) is a small, endangered, primarily nocturnal species of deer found in the rainforests of the Visayan Islands in the Philippines. The present study reports the first draft genome assembly for the species, addressing a critical gap in genomic data for this IUCN-redlisted cervid. Using Illumina sequencing, the resulting genome assembly spans 2.52 Gb in size with a BUSCO completeness score of 95.5% and encompasses 24,531 annotated genes. Phylogenetic analysis suggests a close evolutionary relationship between R. alfredi and Cervus species, indicating that the genus Rusa is sister to Cervus. Peer review teased out more benchmarking results and the annotation files, demonstrating that this genomic resource is useful and usable for advancing population genetics and evolutionary studies, thereby informing conservation strategies and enhancing breeding programs for the critically threatened species. Providing whole genome sequences for other native species of Rusa could further provide genomic resources for detecting hybrids, which will also help the management and monitoring of these species, especially for the reintroduction of captive populations in the wild.
GigaByte.

Export to Production

Editor: Scott Edmunds

Associated Data


    Data Availability Statement

    The genome assembly generated in this study has been deposited at NCBI GenBank under the accession JBCEYX000000000. All sequencing reads can be accessed through the NCBI SRA (BioProject number: PRJNA1102104). Files generated in this study (Illumina reads, code, configuration file for assembly, assembled genome, annotations, MSA, and phylogenetic tree files) are available in GigaDB [10].


    Articles from GigaByte are provided here courtesy of Gigascience Press
