Draft genome of the lowland anoa (Bubalus depressicornis) and comparison with buffalo genome assemblies (Bovidae, Bubalina)

Stefano Porrelli; Michèle Gerbault-Seureau; Roberto Rozzi; Rayan Chikhi; Manon Curaudeau; Anne Ropiquet; Alexandre Hassanin

doi:10.1093/g3journal/jkac234

G3 (Bethesda). 2022 Sep 16;12(11):jkac234. doi: 10.1093/g3journal/jkac234

Editor: D-J de Koning

PMCID: PMC9635665 PMID: 36111873

Abstract

Genomic data for wild species of the genus Bubalus (Asian buffaloes) are still lacking, whereas several whole genomes are currently available for domestic water buffaloes. To address this, we sequenced the genome of a wild endangered dwarf buffalo, the lowland anoa (Bubalus depressicornis), produced a draft genome assembly, and compared it to published buffalo genomes. The lowland anoa genome assembly was 2.56 Gbp long and contained 103,135 contigs, the longest contig being 337.39 kbp long. The N50 was 38.73 kbp and the L50 was 19,832 contigs; mean coverage was 44× and GC content was 41.74%. Two strategies were adopted to evaluate genome completeness: (1) determination of genomic features with de novo and homology-based predictions using annotations of the chromosome-level genome assembly of the river buffalo and (2) benchmarking against universal single-copy orthologs (BUSCO). Homology-based predictions identified 94.51% complete and 3.64% partial genomic features. Homology-based gene prediction identified 32,393 genes, representing 97.14% of the reference’s annotated genes, whilst the BUSCO search against the mammalian orthologs database identified 71.1% complete, 11.7% fragmented, and 17.2% missing orthologs, indicating a good level of completeness for downstream analyses. Repeat analyses indicated that 42.12% of the lowland anoa genome consists of repetitive regions. The genome assembly of the lowland anoa is expected to contribute to comparative genome analyses among bovid species.

Keywords: Bovidae, Bubalus depressicornis, lowland anoa, genome assembly, de novo assembly

Introduction

The lowland anoa, Bubalus depressicornis (Smith 1827), is a wild dwarf buffalo endemic to Sulawesi and Buton Islands, where it can be found in sympatry with the mountain anoa, Bubalus quarlesi (Ouwens 1910). Both anoa species are currently classified as endangered with declining populations due to hunting and habitat loss (Burton et al. 2016). Because of their singular appearance, they were initially described in their own genus Anoa (Ouwens 1910). However, Anoa was not regarded as a valid genus in more recent classifications, in which both anoa species were ascribed to the genus Bubalus, together with the wild water buffalo, Bubalus arnee (Kerr 1792), and the tamaraw, Bubalus mindorensis (Heude 1888; Groves 1969; IUCN 2022). Molecular studies based on mitochondrial sequences have supported a sister-group relationship between B. depressicornis and B. quarlesi (Schreiber et al. 1999; Priyono et al. 2020). In addition, the mitogenome of the lowland anoa was found to be equally distant from those of the 2 types of domestic water buffalo, the river buffalo from the Indian subcontinent and Mediterranean countries and the swamp buffalo from China and Southeast Asia (Hassanin et al. 2012). Since the same phylogenetic pattern was recovered from the analyses of 2 nuclear datasets, one based on 30 autosomal genes and the other based on 2 genes of the Y chromosome, Curaudeau et al. (2021) concluded that domestic buffaloes comprise 2 species: Bubalus bubalis (Linnaeus 1758) for the river buffalo and Bubalus kerabau (Fitzinger 1860) for the swamp buffalo, which diverged during the Pleistocene at around 0.84 Mya. As discussed in Curaudeau et al. (2021), the 2 domestic species can easily be distinguished based on coat and horn characteristics (Castelló 2016), and they have different karyotypes: B. bubalis has 2n = 50 chromosomes with a fundamental number (FN) equal to 58, whereas B. kerabau has 2n = 48 chromosomes and FN = 56 (Nguyen et al. 2008).

With rapid progress and cost reduction in sequencing technologies, many whole genomes of domestic bovid species have been sequenced. Whole-genome sequencing has allowed the identification of variants involved in domestication and genetic improvement for several livestock species such as cattle and buffaloes (Zimin et al. 2009; Canavez et al. 2012; Li et al. 2020; Rosen et al. 2020). Chromosome-level genome assemblies include those of the domestic cow, Bos taurus (Zimin et al. 2009), the domestic river buffalo, B. bubalis (Deng et al. 2016), the swamp buffalo, B. kerabau [reported as Bubalus carabanensis in Luo et al. (2020) but see Curaudeau et al. (2021) for further taxonomic information], the domestic yak, Bos grunniens (Zhang et al. 2021), and the zebu cattle, Bos indicus (Canavez et al. 2012). Whereas a total of 8 chromosome- and scaffold-level genome assemblies are publicly available for domestic buffaloes, there are currently no genome data available for wild species of the genus Bubalus. To fill this gap, a biopsy of a living lowland anoa was used for next-generation sequencing, and a draft genome was assembled de novo for comparison to other buffalo genome assemblies available in international databases such as NCBI (National Center for Biotechnology Information) and BIG_GWH (Beijing Institute of Genomics Genome Warehouse database).

Materials and methods

DNA extraction, library preparation, and genome sequencing

A living adult male lowland anoa, named Yannick, was sampled at the Ménagerie du Jardin des Plantes of the Muséum national d’Histoire naturelle (MNHN, Paris, France; Fig. 1). A skin biopsy was performed in 2006 by a veterinary surgeon following protocols approved by the MNHN and in line with ethical guidelines. The same biopsy was previously used to determine its karyotype (2n = 48; FN = 58; Nguyen et al. 2008). DNA was extracted using the DNeasy Blood and Tissue Kit (Qiagen, Hilden, Germany) following the manufacturer’s protocol. DNA quantification was performed with a Qubit 2.0 Fluorometer with the Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Waltham, MA, USA). Library preparation and sequencing were conducted at the Institut du Cerveau et de la Moelle épinière. The sample was sequenced on a NextSeq 500 Illumina system generating 2 × 151 bp reads using the NextSeq 500 High Output Kit v2 with 300 cycles and aiming for an insert size of 350 bp.

Fig. 1. — Lowland anoa (*Bubalus depressicornis*) housed at the *Ménagerie du Jardin des Plantes* (© Alexandre Hassanin—MNHN).

De novo assembly

Data quality was assessed with FastQC v.0.11.5 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and results were collated with MultiQC v1.12 (Ewels et al. 2016). Raw reads were quality-trimmed, and adapter sequences and contaminants were removed, with Trimmomatic v.0.36 (Bolger et al. 2014) using the following parameters: “ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:33 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36.” The quality of the trimmed reads was reassessed with FastQC. A de novo assembly was performed with MaSuRCA v.3.3.1 (Zimin et al. 2013, 2017) using recommended parameters for mammalian genomes and paired-end Illumina-only data, as indicated in Zimin et al. (2017). The mean and standard deviation of the insert size were estimated with an “estimate-insert-size” script (https://gist.github.com/rchikhi/7281991). Paired-end reads were error corrected using QuorUM (Marçais et al. 2015) and assembled into super-reads using a k-mer size of 99, as selected by the MaSuRCA assembler. The super-reads were then assembled into contigs using the CABOG assembler, part of the MaSuRCA pipeline (Zimin et al. 2017), followed by gap closing with the paired-end information (Zimin et al. 2013).
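As an illustration of the trimming step, the following minimal Python sketch builds a Trimmomatic paired-end command line from the parameters quoted above. The jar path, the input file names, and the output prefix are hypothetical placeholders that do not come from the study; only the trimming steps are copied from the text, and the argument order follows Trimmomatic's documented paired-end usage (two inputs, then paired and unpaired outputs for each mate).

```python
from typing import List

# Trimming steps exactly as quoted in the text.
TRIM_STEPS = [
    "ILLUMINACLIP:TruSeq3-PE.fa:2:30:10",
    "LEADING:33",
    "TRAILING:3",
    "SLIDINGWINDOW:4:15",
    "MINLEN:36",
]


def trimmomatic_pe_command(jar: str, r1: str, r2: str, prefix: str) -> List[str]:
    """Build a Trimmomatic paired-end (PE) command as an argument list.

    All file names are hypothetical; the list can be passed to
    subprocess.run() once the paths point at real files.
    """
    outputs = [
        f"{prefix}_1P.fastq.gz",  # mate 1, pair survived
        f"{prefix}_1U.fastq.gz",  # mate 1, partner dropped
        f"{prefix}_2P.fastq.gz",  # mate 2, pair survived
        f"{prefix}_2U.fastq.gz",  # mate 2, partner dropped
    ]
    return ["java", "-jar", jar, "PE", r1, r2, *outputs, *TRIM_STEPS]


cmd = trimmomatic_pe_command(
    "trimmomatic-0.36.jar",   # hypothetical jar location
    "anoa_R1.fastq.gz",       # hypothetical input, mate 1
    "anoa_R2.fastq.gz",       # hypothetical input, mate 2
    "anoa_trimmed",           # hypothetical output prefix
)
print(" ".join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting issues if it is later executed with `subprocess.run(cmd, check=True)`.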

Assembly quality assessment

Genome assemblies publicly available for the genera Bubalus and Syncerus were retrieved from NCBI and BIG_GWH for quality comparison and assessment. The dataset included 2 assemblies at the chromosome level for the river buffalo (B. bubalis) with a coverage of 69× and 572×, 4 scaffold-level draft assemblies of river buffalo with coverage ranging between 70× and 119×, one chromosome-level assembly of swamp buffalo (B. kerabau) with a mean coverage of 65×, and one scaffold-level draft assembly of the African buffalo (Syncerus caffer) with 162× coverage. The 8 retrieved assemblies were sequenced and assembled with different methods, summarized in Table 1.

Table 1.

Information regarding genome assemblies available for buffalo species.

Species/assembly name	Breed	Geographic location	ID	Assembly accession no	Sequencing technology	Assembly method	Coverage	Assembly level
Bubalus bubalis NDDB_SH_1 (RefSeq)	Murrah	India	NDDB_SH_1	GCF_019923935.1	PacBio Sequel; 10X and BioNano Optical Map	Falcon+Scaff10X+BioNano v. 2019-02-25	572×	Chromosome
Bubalus bubalis Jaffrabadi_v3.0	Jaffrabadi	India	AAUIN_1	GCA_000180995.3	454; Illumina NextSeq 500	MaSuRCA v. 2.3.2b	100×	Scaffold
Bubalus bubalis UOA_WB_1	Mediterranean	Italy	UOA_WB_1	GCA_003121395.1	PacBio	Falcon-Unzip v. 1.8.7	69×	Chromosome
Bubalus bubalis Bubbub1.0	Bangladesh	Bangladesh	Bubbub1.0	GCA_004794615.1	Illumina HiSeq 2000	Soapdenovo v. 2.04	119×	Scaffold
Bubalus bubalis ASM299383v1	Egyptian	Egypt	EGYBUF_1.0	GCA_002993835.1	SOLiD	Velvet v. 1.1; Bowtie2 v. 2.1.0; SHRiMP v. 2.2.3	70×	Scaffold
Bubalus bubalis UMD_CASPUR_WB_2.0	Mediterranean	United States	UMD_CASPUR_WB_2.0	GCA_000471725.1	Illumina GAIIx; Illumina HiSeq; 454	MaSuRCA v. 1.8.3	70×	Scaffold
Bubalus depressicornis* MNHNYannick_LA_1	—	Indonesia	MNHNYannick_LA_1	Assembled MaSuRCA	Illumina NextSeq 500	MaSuRCA v. 3.3.1	44×	Scaffold
Bubalus kerabau CUSA_SWP	Fuzhong	China	CUSA_SWP	GWHAAJZ00000000	PacBio 57.8	Wtdbg 1.2.8	65×	Chromosome
Syncerus caffer ASM640878v2	African Buffalo	South Africa	ABF221	GCA_006408785.2	Illumina HiSeq	Platanus v. 1.2.4	162×	Scaffold

* This study.

The quality of the lowland anoa genome assembly was assessed with QUAST-LG v.5.0.1 (Mikheenko et al. 2018) using the river buffalo NDDB_SH_1 genome assembly (Deng et al. 2016) as a reference. The default parameters for mammalian genomes were used to compare all assemblies in QUAST-LG: “MODE: large, threads: 50, eukaryotic: true, minimum contig length: 3,000, minimum alignment length: 500, ambiguity: 1, threshold for extensive misassembly size: 7,000.” All analyzed assemblies were aligned to the river buffalo NDDB_SH_1 assembly and results were plotted with Circos v. 0.69.8 (Krzywinski et al. 2009) and Jupiter consistency plots (Chu 2018).

We adopted 2 different strategies to evaluate genome completeness. Firstly, genomic features were predicted with a homology-based method by aligning the lowland anoa genome to the annotated river buffalo reference genome (NDDB_SH_1 and its annotations retrieved from NCBI), and with a de novo gene prediction method using GlimmerHMM v3.0.4 (Majoros et al. 2004). Secondly, we employed benchmarking against universal single-copy orthologs (BUSCO v5.2.2; Manni et al. 2021) using the mammalia_odb10 dataset (2021 February 19, number of genomes: 24, number of BUSCOs: 9,226) from OrthoDB (Kriventseva et al. 2019), and compared the results to those of other buffalo genome assemblies already deposited on NCBI and BIG_GWH (Table 1).

Repeats and gene annotation

Repetitive regions in the lowland anoa genome were identified, annotated, and masked with RepeatMasker v.4.1.2-p1 (Tarailo-Graovac and Chen 2009). Firstly, a de novo repeat library was constructed from the genome assembly with RepeatModeler v.2.0.2a. RepeatMasker was then used with default parameters to produce a homology-based repeat library and mask the genome’s repetitive regions. The scripts “calcDivergenceFromAlign.pl” and “createRepeatLandscape.pl” were used to calculate the Kimura divergence values and to plot the resulting repeat landscape. The repeat landscape of Bos taurus was retrieved from the RepeatMasker database for visual comparison.
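To make the notion of Kimura divergence concrete, the sketch below computes the textbook Kimura 2-parameter distance, K = -1/2 ln[(1 - 2p - q) sqrt(1 - 2q)], where p and q are the proportions of transitions and transversions between two aligned sequences. It is only an illustration of the formula: it is not the code of “calcDivergenceFromAlign.pl”, whose handling of alignments may differ, and the function name and toy sequences are assumptions made for this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Positions containing gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguous base: not comparable
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are saturated with changes.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy example: a repeat copy with one transition (A->G) in 10 aligned bases.
consensus = "ACGTACGTAC"
copy = "GCGTACGTAC"
print(round(kimura_2p(consensus, copy), 4))
```

In a repeat landscape, each repeat copy is binned by its divergence from the family consensus, so older expansions appear at higher divergence values.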

Results and discussion

Whole-genome sequencing and data QC

Whole-genome sequencing generated 991,437,058 paired-end reads with a length of 151 bp. Quality trimming removed 46,616,722 low-quality, adapter-contaminated, and PCR-duplicated reads, representing approximately 4.7% of the total reads. A total of 944,820,336 clean paired-end reads were retained, covering the lowland anoa genome with an estimated 56× depth based on a genome size of 2.56 Gbp. The estimation of insert size using the in-house script returned a mean of 377 bp and a standard deviation of 83 bp.
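The read accounting and the depth estimate can be re-derived from the reported counts. The short Python check below is a sketch under one simplifying assumption: every clean read is counted at its full length of 151 bp, although trimming shortens some reads, so the result is an upper bound on the raw depth. The genome size is taken as the total assembly length reported in Table 2.

```python
# Counts reported in the text.
raw_reads = 991_437_058
removed_reads = 46_616_722
read_length_bp = 151

# Assumption: genome size approximated by the total assembly length (Table 2).
assembly_length_bp = 2_565_510_706

clean_reads = raw_reads - removed_reads
removed_pct = 100 * removed_reads / raw_reads
# Depth = total sequenced bases / genome size.
depth = clean_reads * read_length_bp / assembly_length_bp

print(f"clean reads:   {clean_reads:,}")
print(f"removed reads: {removed_pct:.2f}%")
print(f"raw depth:     {depth:.1f}x")
```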

De novo assembly quality metrics

The final lowland anoa genome assembly generated here contained 103,135 contigs, the largest being 337.39 kbp long, with an N50 of 38.73 kbp and an L50 of 19,832 contigs (Table 2). The total length was 2.56 Gbp with a mean coverage of 44×, and GC content was 41.74%, in agreement with other published assemblies (between 41.60% and 41.92%, Table 3). When aligned to the NDDB_SH_1 genome assembly, the genome fraction of the anoa assembly was 95.41%, a value comparable to other buffalo genome assemblies (Fig. 2), with a total alignment length of 2,515,453,843 bp. A total of 886 contigs could not be aligned to the river buffalo genome assembly, whilst 8,085 contigs were only partially aligned, resulting in a total unaligned length of 45,224,171 bp, which reflects the discrepancy between the total length of the lowland anoa genome and the total aligned length to the reference river buffalo genome assembly. Partially aligned and unaligned contigs could have resulted from structural variations between the lowland anoa and the reference river buffalo assembly, such as large indels (insertions/deletions), as well as from repetitive regions and/or alternative haplotypes causing assembly errors. Short-read technology makes it difficult to characterize genomic regions such as telomeres, centromeres, and repetitive or highly heterochromatic regions (Johnson et al. 2005; Low et al. 2019; Weissensteiner and Suh 2019), which are notoriously difficult to assemble and could be better resolved with long-read sequencing.

Table 2.

Draft assembly statistics of the lowland anoa genome.

Contig statistics	value
Total length	2,565,510,706
Number of contigs	103,135
Largest contig	337,395
GC (%)	41.74
N50	38,737
L50	19,832

Table 3.

Comparison of assembly quality metrics of the lowland anoa (Bubalus depressicornis) and other buffalo assemblies.

Name/assembly name (NCBI)	ID	Genome fraction %	Total aligned length	Largest alignment	Scaffolds count	N50	L50	GC%
Bubalus bubalis NDDB_SH_1 (RefSeq)	NDDB_SH_1	—	—	—	26	116,997,125	9	41.75
Bubalus bubalis Jaffrabadi_v3.0	AAUIN_1	83.189	2,299,810,356	834,863	75,621	104,127	9,942	41.78
Bubalus bubalis UOA_WB_1	UOA_WB_1	98.851	2,605,694,501	34,949,624	509	117,219,835	9	41.81
Bubalus bubalis Bubbub1.0	Bubbub1.0	86.537	2,309,804,413	9,328,338	14,905	7,025,746	116	41.6
Bubalus bubalis ASM299383v1	EGYBUF_1.0	36.01	974,053,149	2,013,276	6,313	3,666,815	234	41.92
Bubalus bubalis UMD_CASPUR_WB_2.0	UMD_CASPUR_WB_2.0	93.634	2,473,056,510	7,952,377	5,714	1,545,294	508	41.73
Bubalus depressicornis MNHNYannick_LA_1	MNHNYannick_LA_1	95.415	2,515,453,834	337,395	103,135	38,737	19,832	41.74
Bubalus kerabau CUSA_SWP	CUSA_SWP	97.086	2,557,653,758	23,566,932	1,534	117,253,548	8	41.83
Syncerus caffer ASM640878v2	ABF221	73.046	1,942,672,810	4,692,267	13,167	2,448,414	351	41.72

Fig. 2. — Cumulative length of aligned contigs of the lowland anoa (red line) against the river buffalo NDDB_SH_1 reference genome assembly (dashed line) and compared to other buffalo genome assemblies available on NCBI.

The lowland anoa genome assembly has a modest N50 compared to other buffalo genome assemblies (Table 3), indicating lower contiguity, which is expected given the short reads produced by Illumina sequencing technology (read length = 151 bp). In addition, repeat analysis revealed that 42.12% of the lowland anoa genome is composed of repetitive regions. This, coupled with low sequence coverage and with sequencing and assembly errors, causes breaks in the assembly contiguity (Gnerre et al. 2011; Low et al. 2019). This is apparent even in high-quality chromosome-level genome assemblies that use multiple sequencing libraries and multiple sequencing technologies, such as the previous human genome assembly GRCh38, which contained hundreds of gaps (International Human Genome Sequencing Consortium 2004). In addition, the chromosome-level genome assemblies retrieved from NCBI (NDDB_SH_1, UOA_WB_1) were sequenced using multiple insert size libraries and sequencing technologies and were intensively verified with multiple methods such as optical mapping, Hi-C, and radiation hybrid (RH) mapping (Deng et al. 2016; Low et al. 2019).

Moreover, quality metrics of publicly available assemblies are usually limited to N50 and L50 values: N50 is the length of the shortest contig among the longest contigs that together cover 50% of the total assembly size, and L50 is the number of those contigs (Bradnam et al. 2013). Such metrics are often used to compare and evaluate the performance of the ever-growing number of assembly and annotation methods and software (Manchanda et al. 2020). However, we show here that reporting N50 and L50 metrics exclusively can be misleading, as they only provide a standard measure of assembly contiguity whilst omitting information such as gene content and completeness, as well as assembly correctness. Furthermore, N50 values can be artificially raised by deliberately excluding short contigs from analyses and by the presence of undetermined nucleotides (Ns) linking the scaffolded contigs (Gurevich et al. 2013). Therefore, to assess the quality of the lowland anoa genome assembly, we generated conventional N50 and L50 metrics and also determined genome completeness in terms of gene content and genome correctness by comparing our assembly to a chromosome-level genome assembly of the river buffalo (B. bubalis). In addition, a swamp buffalo (B. kerabau, CUSA_SWP) and a more distantly related African buffalo species (S. caffer, ABF221) were also included in our comparison.
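The two contiguity metrics, and the way N50 can be inflated by discarding short contigs, can be illustrated with a minimal Python sketch. The function name and the toy contig lengths are assumptions made for this example and are unrelated to the assemblies compared here.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths.

    Contigs are sorted from longest to shortest and accumulated until
    they cover at least half of the total length: N50 is the length of
    the last contig added, L50 is how many contigs were needed.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise AssertionError("unreachable: cumulative length always reaches half")


# Toy assembly (arbitrary units).
contigs = [80, 70, 50, 40, 30, 20, 10, 5, 5, 5]
print(n50_l50(contigs))

# Dropping contigs shorter than 30 raises N50 without improving the assembly.
filtered = [c for c in contigs if c >= 30]
print(n50_l50(filtered))
```

In this toy case, removing the short contigs changes N50 from 50 to 70 although no sequence was joined, which is why contiguity metrics are best read alongside completeness and correctness metrics.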

Regardless of the modest N50 value, the lowland anoa genome assembly is in good agreement with the NDDB_SH_1 assembly, with 95.91% of contigs correctly mapped to the 25 reference chromosomes of the river buffalo and fewer misassembled blocks compared to other draft assemblies (Fig. 3). The genome assembly of the Egyptian river buffalo (EGYBUF_1.0) had an abnormally high number of misassembled blocks with respect to the reference genome, followed by the genome assembly of a female Italian river buffalo (UOA_WB_1). To investigate this, misassemblies and structural variation metrics were computed in QUAST-LG (Table 4). The Egyptian river buffalo assembly (EGYBUF_1.0) showed the highest rate of mismatches among the Bubalus assemblies (1,895 per 100 kbp) and the second highest number of Ns, after the Jaffrabadi river buffalo (AAUIN_1). The genome assembly of the African buffalo (S. caffer, ABF221) showed a larger number of mismatches (Table 4), but this can be explained by the higher sequence divergence between Syncerus and Bubalus, as the 2 genera separated in the Late Miocene (Hassanin et al. 2012). Misassemblies and structural variation metrics could not explain the misassembled blocks of the UOA_WB_1 assembly observed in the Circos plot of Fig. 3. However, some of these misassembled blocks could be due to unplaced contigs. To investigate this, the UOA_WB_1 assembly was aligned to the NDDB_SH_1 reference to generate Jupiter consistency plots. When using the largest 26 contigs of the UOA_WB_1 assembly to cover 100% of the reference river buffalo genome, an almost perfect level of synteny was observed (Fig. 4a). Although this result was expected for genomes of the same species, it also indicates a good level of assembly quality in terms of correctness. However, when including all 509 contigs of the UOA_WB_1 assembly, several misassembled regions were observed (Fig. 4b). Three nonexclusive hypotheses can be advanced to interpret this result: possible genomic rearrangements, genome assembly errors, and repetitive regions. Whether the results of the consistency plots are due to the factors mentioned above or to other factors, such as contamination, remains speculative. Nevertheless, the results of the quality metric comparison conducted here further indicate that N50 and L50 metrics alone are unreliable for assessing assembly quality. Instead, contiguity metrics should be supplemented with genome completeness and correctness metrics.

Fig. 3. — Circos plot of scaffolds mapped to the NDDB_SH_1 reference genome assembly (*Bubalus bubalis*). Outer circle represents reference sequence with GC% heatmap (0% = white, 69% = black). Inner circles represent assembly tracks, with heatmap representing correct contigs (green) and misassembled blocks (red).

Table 4.

QUAST-LG statistics of all buffalo assemblies with respect to the river buffalo NDDB_SH_1 reference.

	B. depressicornis MNHNYannick_LA_1	B. bubalis AAUIN_1	B. bubalis Bubbub1.0	B. bubalis EGYBUF_1.0	B. bubalis UMD_CASPUR_WB_2.0	B. bubalis UOA_WB_1	B. kerabau CUSA_SWP	S. caffer ABF221
Misassemblies	4,949	19,238	3,561	131	4,040	1,724	2,111	6,565
Relocations	1,447	13,540	2,761	85	1,434	1,051	1,199	3,397
Translocations	3,203	4,714	757	10	2,569	647	896	3,032
Inversions	299	984	43	36	37	26	16	136
Misassembled contigs	4,550	15,988	1,049	45	1,943	255	533	1,727
Misassembled contigs length	159,179,266	1,334,096,556	2,506,642,146	55,459,162	1,891,377,139	2,639,940,877	2,594,120,526	2,486,555,687
Local misassemblies	7,014	73,267	241,261	6,933	7,100	4,870	9,940	435,454
Possible TEs	164	874	886	10	544	136	158	654
Unaligned mis. contigs	287	2,378	548	2,522	63	104	381	1,324
Unaligned contigs	886 + 8,085 partial	2,555 + 57,865 partial	297 + 7,280 partial	2,806 + 3,472 partial	182 + 3,290 partial	1 + 416 partial	140 + 1,110 partial	900 + 7,314 partial
Unaligned length	45,224,171	596,227,806	299,544,303	1,673,093,194	82,826,374	49,291,638	51,316,520	779,611,955
Genome fraction (%)	95.415	83.189	86.537	36.01	93.634	98.851	97.086	73.046
Duplication ratio	1.007	1.425	1.076	1.36	1.034	1.005	1.013	1.045
Mismatches	16,233,421	19,654,061	23,375,163	17,890,296	10,863,130	10,118,782	15,844,866	114,608,168
Indels	1,578,224	746,243	705,955	6,440,610	1,136,878	1,400,310	1,534,735	2,128,964
Indels length	12,654,316	56,163,406	24,209,936	35,356,432	24,745,254	23,411,739	33,123,824	18,236,722
Mismatches per 100 kbp	649	901	1,030	1,895	442	390	622	5,983
Indels per 100 kbp	63	34	31	682	46	54	60	111
Indels (≤5 bp)	1,297,998	598,354	515,830	5,758,980	893,802	1,227,309	1,269,689	1,641,754
Indels (> 5 bp)	280,226	147,889	190,125	681,630	243,076	173,001	265,046	487,210
N's	493,027	850,098,824	138,209,713	328,128,682	73,946,361	373,500	22,116,406	59,283,755
N's per 100 kbp	19.22	22,942	5,040.03	11,097	2,820.18	14.06	840.50	2,131.26

Fig. 4. — Jupiter consistency plot showing alignment between the river buffalo genome assemblies UOA_WB_1 and NDDB_SH_1. The left of the plots shows the numbered NDDB_SH_1 chromosomes. The right of the plots shows (a) the 26 longest contigs of the UOA_WB_1 assembly needed to cover 100% of the reference genome and (b) all the 509 contigs of the UOA_WB_1 assembly. Colored bands represent synteny between the genomes. Lines represent genomic rearrangements, break points in the scaffolds or assembly errors. The absence of lines connecting the UOA_WB_1 blocks to the NDDB_SH_1 chromosomes indicates contigs that could not be aligned to the reference.

Genomic features, gene prediction, and annotation

Homology-based and de novo gene predictions performed on the lowland anoa genome assembly were in agreement with each other and indicated a good level of genome completeness. Results were comparable to other published genome assemblies (Tables 5 and 6), and represented an improvement over the Bangladeshi river buffalo (Bubbub1.0), the Egyptian river buffalo (EGYBUF_1.0), and the Mediterranean river buffalo (UMD_CASPUR_WB_2.0) assemblies.

Table 5.

Gene features (CDS and mRNA) predicted with GlimmerHMM.

Name/assembly name (NCBI)	ID	Predicted gene features (unique)	Predicted gene features (≥0 bp)	Predicted gene features (≥300 bp)	Predicted gene features (≥1,500 bp)	Predicted gene features (≥3,000 bp)
Bubalus bubalis Jaffrabadi_v3.0	AAUIN_1	1,065,654	1,087,174 + 1,214 part	719,235 + 911 part	129,801 + 19 part	24,579 + 7 part
Bubalus bubalis UOA_WB_1	UOA_WB_1	1,055,791	1,059,972 + 21 part	762,464 + 17 part	154,594 + 0 part	29,659 + 0 part
Bubalus bubalis Bubbub1.0	Bubbub1.0	948,732	958,663 + 101 part	655,839 + 73 part	136,045 + 4 part	27,867 + 1 part
Bubalus bubalis ASM299383v1	EGYBUF_1.0	826,048	826,155 + 69 part	530,835 + 37 part	96,365 + 0 part	16,243 + 0 part
Bubalus bubalis UMD_CASPUR_WB_2.0	UMD_CASPUR_WB_2.0	963,177	964,473 + 138 part	669,508 + 117 part	134,780 + 5 part	26,448 + 2 part
Bubalus depressicornis MNHNYannick_LA_1	MNHNYannick_LA_1	1,027,469	1,023,163 + 5,278 part	702,282 + 4,582 part	131,966 + 204 part	24,994 + 37 part
Bubalus kerabau CUSA_SWP	CUSA_SWP	1,042,862	1,046,662 + 87 part	752,170 + 70 part	151,809 + 10 part	29,488 + 6 part
Syncerus caffer ASM640878v2	ABF221	1,061,091	1,064,542 + 229 part	750,719 + 171 part	150,033 + 10 part	29,460 + 1 part

Table 6.

Genes predicted with homology-based prediction method.

Name/assembly name (NCBI)	ID	Genes	Partial genes	Total	% of reference's annotated genes (n = 33,348)
Bubalus bubalis Jaffrabadi_v3.0	AAUIN_1	10,804	20,895	31,699	95.05
Bubalus bubalis UOA_WB_1	UOA_WB_1	30,810	1,955	32,765	98.25
Bubalus bubalis Bubbub1.0	Bubbub1.0	11,039	20,983	32,022	96.02
Bubalus bubalis ASM299383v1	EGYBUF_1.0	1,345	23,770	25,115	75.31
Bubalus bubalis UMD_CASPUR_WB_2.0	UMD_CASPUR_WB_2.0	18,656	13,271	31,927	95.74
Bubalus depressicornis MNHNYannick_LA_1	MNHNYannick_LA_1	19,148	13,245	32,393	97.14
Bubalus kerabau CUSA_SWP	CUSA_SWP	28,349	3,419	31,768	95.26
Syncerus caffer ASM640878v2	ABF221	8,763	21,575	30,338	90.97

Interestingly, these 3 assemblies showed higher contiguity (N50) than the draft assembly of the lowland anoa, further indicating the unreliability of using exclusively N50 and L50 metrics when assessing genome assembly quality.

Out of the 1,921,249 genomic feature annotations of the reference assembly NDDB_SH_1, homology prediction identified 1,815,794 (94.51%) complete and 69,929 (3.64%) partial features in the lowland anoa genome assembly, which is comparable to other published assemblies (Fig. 5), indicating a good level of genome completeness. GlimmerHMM de novo predicted 1,027,469 unique genomic features (mRNA and coding sequences, CDS), which is an improvement over some of the water buffalo assemblies used for quality comparison (Table 5). Homology-based gene prediction identified 32,393 genes in the lowland anoa genome assembly, representing 97.14% of the genes annotated in NDDB_SH_1 (n = 33,348). Of these, 59.11% (19,148) were complete and 40.89% (13,245) were partial, probably reflecting the level of fragmentation of the lowland anoa genome assembly. Nevertheless, the total number of genes predicted still represents an improvement over some of the compared assemblies (Table 6).
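The percentages in Table 6 follow directly from the gene counts. The Python sketch below recomputes them for three of the assemblies as a worked example; the counts are copied from Table 6, while the function and variable names are illustrative choices made here.

```python
from typing import Dict, Tuple

REFERENCE_GENES = 33_348  # genes annotated in NDDB_SH_1 (Table 6)

# (complete genes, partial genes) copied from Table 6.
TABLE_6: Dict[str, Tuple[int, int]] = {
    "MNHNYannick_LA_1": (19_148, 13_245),
    "UOA_WB_1": (30_810, 1_955),
    "EGYBUF_1.0": (1_345, 23_770),
}


def gene_summary(complete: int, partial: int) -> Dict[str, float]:
    """Summarize homology-based gene prediction for one assembly."""
    total = complete + partial
    return {
        "total": total,
        # Share of the reference annotation recovered (complete or partial).
        "pct_of_reference": round(100 * total / REFERENCE_GENES, 2),
        # Share of the recovered genes that are complete.
        "pct_complete": round(100 * complete / total, 2),
    }


for assembly_id, (complete, partial) in TABLE_6.items():
    print(assembly_id, gene_summary(complete, partial))

anoa = gene_summary(*TABLE_6["MNHNYannick_LA_1"])
```

Separating the two ratios shows why a fragmented assembly can recover most reference genes while still having a low share of complete genes.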

Fig. 5. — Complete genomic features identified in the lowland anoa assembly and compared to other assemblies using the river buffalo (*Bubalus bubalis*) NDDB_SH_1 reference sequence and annotations.

When predicting mammalian orthologs with BUSCO, the lowland anoa genome assembly contained 6,556 (71.1%) complete BUSCOs, of which 6,412 (69.5%) were single copy and 144 (1.6%) were duplicated. The number of fragmented BUSCOs was 1,076 (11.7%), whilst 1,594 (17.2%) were missing. The BUSCO results indicate an acceptable level of genome completeness (>70%, Simão et al. 2015) for downstream analyses for the anoa genome assembly, and a slight improvement over the Egyptian river buffalo assembly (EGYBUF_1.0, Fig. 6).
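The BUSCO categories partition the 9,226 orthologs of the mammalia_odb10 dataset, so the reported counts can be cross-checked with a few lines of Python. This is a consistency check on the published numbers, not a re-run of BUSCO; the variable names are illustrative.

```python
N_BUSCOS = 9_226  # size of the mammalia_odb10 dataset

# Counts reported for the lowland anoa assembly.
counts = {
    "single_copy": 6_412,
    "duplicated": 144,
    "fragmented": 1_076,
    "missing": 1_594,
}

complete = counts["single_copy"] + counts["duplicated"]

# The four categories must add up to the dataset size.
total = sum(counts.values())

percentages = {name: 100 * n / N_BUSCOS for name, n in counts.items()}
percentages["complete"] = 100 * complete / N_BUSCOS

for name in ("complete", "single_copy", "duplicated", "fragmented", "missing"):
    print(f"{name:12s} {percentages[name]:.2f}%")
```

Printing two decimals shows that the missing fraction computed directly from the counts rounds to 17.3% rather than the reported 17.2%; a difference in the last digit of this kind is expected when one category is reported so that the rounded values sum to 100%.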

Fig. 6. — BUSCO results of the genome assembly of the lowland anoa (*Bubalus depressicornis*) compared to other publicly available buffalo genome assemblies.

Mammalian genomes contain large families of repeats (Goodier and Kazazian 2008), such as long interspersed nuclear elements (LINEs), short interspersed nuclear elements (SINEs), and long-terminal repeats (LTRs). RepeatMasker revealed that 42.12% of the lowland anoa genome is composed of repetitive regions (Table 7), which is comparable to data previously published for genome assemblies of river buffalo and other bovids (Deng et al. 2016; Low et al. 2019; Mintoo et al. 2019; El-Khishin et al. 2020). Results also agree with the repetitive content in the cattle genome (Fig. 7b). Both lowland anoa and cattle genomes showed 2 waves of repeat expansion in their repeat landscape (Fig. 7, a and b), suggesting a shared inheritance of such repeats. Among these families, LINEs were the most abundant in the lowland anoa, covering 30.04% of the genome, followed by LTRs (3.10%) and SINEs (1.03%) (Table 7).

Table 7.

Repeat sequence composition of the lowland anoa genome.

Family	Copy number of elements	Length occupied (bp)	% Genome
SINEs	296,064	26,945,915	1.03
LINEs	2,864,468	786,815,034	30.04
LINE1	1,203,360	282,366,346	10.78
LINE2	101,415	13,911,301	0.53
RTE/Bov-B	1,461,651	481,114,012	18.37
LTR elements	362,123	81,208,077	3.10
DNA transposon	255,003	38,433,935	1.47
Small RNA	139,586	14,174,190	0.54
Satellites	269	52,169	0.00
Simple repeats	500,363	20,187,327	0.77
Low complexity	81,685	3,956,146	0.15
Unclassified	611,789	100,086,577	3.82
Total			42.12

Fig. 7. — Interspersed repeat landscape of (a) the lowland anoa genome assembled in this study and (b) *Bos taurus*.

Conclusion

To date, whole-genome sequencing has allowed the identification of variants involved in domestication and genetic improvement for several livestock species (Zimin et al. 2009; Canavez et al. 2012; Li et al. 2020; Rosen et al. 2020). However, the lack of wild buffalo genomes hinders further analyses addressing functional and evolutionary aspects of this group, as well as possible conservation efforts. The draft genome assembly of the lowland anoa reported here is expected to help fill this gap in data availability, as it is the first draft genome assembly for wild Asian buffaloes. Furthermore, we showed that short-read Illumina sequencing data can still provide a cost-effective way of sequencing mammalian genomes to an adequate level of completeness for downstream comparative analyses.

Acknowledgments

The authors thank the people of the Ménagerie du Jardin des Plantes who helped to collect the biopsy of the lowland anoa used in this study: Norin Chai, Gerard Dousseau, Christelle Hano, Abderrahmane Latreche, Claire Rejaud, Roland Simon, and Rudy Wedlarski. They would like to thank Huw Jones for the proofreading of the manuscript.

Funding

RR was supported by sDiv, Synthesis Centre of the German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, funded by the German Research Foundation (DFG–FZT 118, 202548816), and the German Research Foundation (DFG Research grant RO 5835/2-1).

Conflicts of interest

None declared.

Contributor Information

Stefano Porrelli, Department of Natural Sciences, Faculty of Science and Technology, Middlesex University, London NW4 4BT, UK.

Michèle Gerbault-Seureau, Institut Systématique Evolution Biodiversité (ISYEB), Sorbonne Université, MNHN, CNRS, EPHE, UA, 75005 Paris, France.

Roberto Rozzi, Museum für Naturkunde, Leibniz-Institut für Evolutions- und Biodiversitätsforschung, 10115 Berlin, Germany; German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, 04103 Leipzig, Germany.

Rayan Chikhi, Institut Pasteur, Université Paris Cité, Sequence Bioinformatics, 75015 Paris, France.

Manon Curaudeau, Institut Systématique Evolution Biodiversité (ISYEB), Sorbonne Université, MNHN, CNRS, EPHE, UA, 75005 Paris, France.

Anne Ropiquet, Department of Natural Sciences, Faculty of Science and Technology, Middlesex University, London NW4 4BT, UK.

Alexandre Hassanin, Institut Systématique Evolution Biodiversité (ISYEB), Sorbonne Université, MNHN, CNRS, EPHE, UA, 75005 Paris, France.

Data Availability

The raw data and genome assembly of the lowland anoa are available on NCBI under BioProject PRJNA849775, with the assembly under BioSample accession SAMN29133250. The raw reads are available on the NCBI Sequence Read Archive (SRA) under accession SRR21016826.

Literature cited

Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–2120. doi: 10.1093/bioinformatics/btu170.
Bradnam KR, Fass JN, Alexandrov A, Baranay P, Bechner M, Birol I, Boisvert S, Chapman JA, Chapuis G, Chikhi R, et al. Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. Gigascience. 2013;2(1):10–31. doi: 10.1186/2047-217X-2-10.
Burton J, Wheeler P, Mustari A. Bubalus depressicornis. The IUCN Red List of Threatened Species™. 2016. doi: 10.2305/IUCN.UK.2016-2.RLTS.T3126A46364222.
Canavez FC, Luche DD, Stothard P, Leite KRM, Sousa-Canavez JM, Plastow G, Meidanis J, Souza MA, Feijao P, Moore SS, et al. Genome sequence and assembly of Bos indicus. J Hered. 2012;103(3):342–348. doi: 10.1093/jhered/esr153.
Castelló JR. Bovids of the World: Antelopes, Gazelles, Cattle, Goats, Sheep, and Relatives. Princeton (NJ): Princeton University Press; 2016.
Chu J. Jupiter plot: a Circos-Based tool to Visualize Genome Assembly Consistency (Version 1.0). GitHub; 2018. [accessed 2022 Feb 22]. https://github.com/JustinChu/JupiterPlot.
Curaudeau M, Rozzi R, Hassanin A. The genome of the lowland anoa (Bubalus depressicornis) illuminates the origin of river and swamp buffalo. Mol Phylogenet Evol. 2021;161(March):107170. doi: 10.1016/j.ympev.2021.107170.
Deng T, Pang C, Lu X, Zhu P, Duan A, Tan Z, Huang J, Li H, Chen M, Liang X. De novo transcriptome assembly of the Chinese swamp buffalo by RNA sequencing and SSR marker discovery. PLoS One. 2016;11(1):e0147132. doi: 10.1371/journal.pone.0147132.
El-Khishin DA, Ageez A, Saad ME, Ibrahim A, Shokrof M, Hassan LR, Abouelhoda MI. Sequencing and assembly of the Egyptian buffalo genome. PLoS One. 2020;15(8):e0237087. doi: 10.1371/journal.pone.0237087.
Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32(19):3047–3048. doi: 10.1093/bioinformatics/btw354.
Fitzinger LJ. Der Sunda-Büffel (Bubalus kerabau). In: Wissenschaftlich-populäre Naturgeschichte der Säugethiere in ihren sämmtlichen Hauptformen, V. Kaiserlich-Königlichen Hof- und Staatsdruckerei. Wien. p. 329. 1860.
Gnerre S, MacCallum I, Przybylski D, Ribeiro FJ, Burton JN, Walker BJ, Sharpe T, Hall G, Shea TP, Sykes S, et al. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci U S A. 2011;108(4):1513–1518. doi: 10.1073/pnas.1017351108.
Goodier JL, Kazazian HH. Retrotransposons revisited: the restraint and rehabilitation of parasites. Cell. 2008;135(1):23–35. doi: 10.1016/j.cell.2008.09.022.
Groves CP. Systematics of the anoa (Mammalia, Bovidae). Beaufortia. 1969;17:1–12.
Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29(8):1072–1075. doi: 10.1093/bioinformatics/btt086.
Hassanin A, Delsuc F, Ropiquet A, Hammer C, Jansen Van Vuuren B, Matthee C, Ruiz-Garcia M, Catzeflis F, Areskoug V, Nguyen TT, et al. Pattern and timing of diversification of Cetartiodactyla (Mammalia, Laurasiatheria), as revealed by a comprehensive analysis of mitochondrial genomes. C R Biol. 2012;335(1):32–50. doi: 10.1016/j.crvi.2011.11.002.
Heude PM. Note sur le petit buffle sauvage de l’île de Mindoro (Philippines). Mémoires Concern l’histoire Nat L’Empire Chinois. 1888;2(4):50.
International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature. 2004;431(7011):931–945.
IUCN. Red List Threat Species. The IUCN Red List of Threatened Species; 2022. [accessed 2022 Feb 15]. https://www.iucnredlist.org/
Johnson JM, Edwards S, Shoemaker D, Schadt EE. Dark matter in the genome: evidence of widespread transcription detected by microarray tiling experiments. Trends Genet. 2005;21(2):93–102. doi: 10.1016/j.tig.2004.12.009.
Kerr R. Arnee Bos arnee. In: Strahan A, Cadell T, Creech W, editors. The Animal Kingdom or Zoological System of the Celebrated Sir Charles Linnaeus. Class I. Mammalia. Edinburgh and London. 1792. p. 336.
Kriventseva EV, Kuznetsov D, Tegenfeldt F, Manni M, Dias R, Simão FA, Zdobnov EM. OrthoDB v10: sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. Nucleic Acids Res. 2019;47(D1):D807–D811. doi: 10.1093/nar/gky1053.
Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19(9):1639–1645. doi: 10.1101/gr.092759.109.
Li X, Yang J, Shen M, Xie XL, Liu GJ, Xu YX, Lv FH, Yang H, Yang YL, Liu CB, et al. Whole-genome resequencing of wild and domestic sheep identifies genes associated with morphological and agronomic traits. Nat Commun. 2020;11(1):1–16. doi: 10.1038/s41467-020-16485-1.
Linnaeus. Bubalus bubalis. GBIF Secr; 1758. [accessed 2022 Mar 14]. https://www.gbif.org/species/7422937.
Low WY, Tearle R, Bickhart DM, Rosen BD, Kingan SB, Swale T, Thibaud-Nissen F, Murphy TD, Young R, Lefevre L, et al. Chromosome-level assembly of the water buffalo genome surpasses human and goat genomes in sequence contiguity. Nat Commun. 2019;10(1):260. doi: 10.1038/s41467-018-08260-0.
Luo X, Zhou Y, Zhang B, Zhang Y, Wang X, Feng T, Li Z, Cui K, Wang Z, Luo C, et al. Understanding divergent domestication traits from the whole-genome sequencing of swamp- and river-buffalo populations. Natl Sci Rev. 2020;7(3):686–701. doi: 10.1093/nsr/nwaa024.
Majoros WH, Pertea M, Salzberg SL. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics. 2004;20(16):2878–2879. doi: 10.1093/bioinformatics/bth315.
Manchanda N, Portwood JL, Woodhouse MR, Seetharam AS, Lawrence-Dill CJ, Andorf CM, Hufford MB. GenomeQC: a quality assessment tool for genome assemblies and gene structure annotations. BMC Genomics. 2020;21(1):1–9. doi: 10.1186/s12864-020-6568-2.
Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 2021;38(10):4647–4654. doi: 10.1093/molbev/msab199.
Marçais G, Yorke JA, Zimin A. QuorUM: an error corrector for Illumina reads. PLoS One. 2015;10(6):e0130821. doi: 10.1371/journal.pone.0130821.
Mikheenko A, Prjibelski A, Saveliev V, Antipov D, Gurevich A. Versatile genome assembly evaluation with QUAST-LG. Bioinformatics. 2018;34(13):i142–i150. doi: 10.1093/bioinformatics/bty266.
Mintoo AA, Zhang H, Chen C, Moniruzzaman M, Deng T, Anam M, Emdadul Huque QM, Guang X, Wang P, Zhong Z, et al. Draft genome of the river water buffalo. Ecol Evol. 2019;9(6):3378–3388. doi: 10.1002/ece3.4965.
Nguyen TT, Aniskin VM, Gerbault-Seureau M, Planton H, Renard JP, Nguyen BX, Hassanin A, Volobouev VT. Phylogenetic position of the saola (Pseudoryx nghetinhensis) inferred from cytogenetic analysis of eleven species of Bovidae. Cytogenet Genome Res. 2008;122(1):41–54. doi: 10.1159/000151315.
Ouwens PA. Contribution à la connaissance des mammifères de Célèbes. Bull Dépt Agric Indes Néerl. 1910;38(Zool. 6):1–7.
Priyono DS, Solihin DD, Farajallah A, Purwantara B. The first complete mitochondrial genome sequence of the endangered mountain anoa (Bubalus quarlesi) (Artiodactyla: Bovidae) and phylogenetic analysis. J Asia-Pacific Biodivers. 2020;13(2):123–133. doi: 10.1016/j.japb.2020.01.006.
Rosen BD, Bickhart DM, Schnabel RD, Koren S, Elsik CG, Tseng E, Rowan TN, Low WY, Zimin A, Couldrey C, et al. De novo assembly of the cattle reference genome with single-molecule sequencing. Gigascience. 2020;9(3):1–9. doi: 10.1093/gigascience/giaa021.
Schreiber A, Seibold I, Nötzold G, Wink M. Cytochrome b gene haplotypes characterize chromosomal lineages of anoa, the Sulawesi dwarf buffalo (Bovidae: Bubalus sp.). J Hered. 1999;90(1):165–176. doi: 10.1093/jhered/90.1.165.
Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–3212. doi: 10.1093/bioinformatics/btv351.
Smith CH. The seventh order of the Mammalia. The Ruminantia. In: Griffith E, Smith CH, Pidgeon E, editors. The Animal Kingdom Arranged in Conformity with Its Organization, by the Baron Cuvier, Member of the Institute of France, with Additional Descriptions of All the Species Hitherto Named, and of Many Not before Noticed. London: Whittaker G.B; 1827. p. 293.
Tarailo-Graovac M, Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinforma. 2009;25:1–14. doi: 10.1002/0471250953.bi0410s25.
Weissensteiner MH, Suh A. Repetitive DNA: the dark matter of avian genomics. In: Kraus R, editor. Avian Genomics in Ecology and Evolution. Cham: Springer; 2019. p. 93–150. doi: 10.1007/978-3-030-16477-5_5.
Zhang S, Liu W, Liu X, Du X, Zhang K, Zhang Y, Song Y, Zi Y, Qiu Q, Lenstra JA, et al. Structural variants selected during yak domestication inferred from long-read whole-genome sequencing. Mol Biol Evol. 2021;38(9):3676–3680. doi: 10.1093/molbev/msab134.
Zimin AV, Delcher AL, Florea L, Kelley DR, Schatz MC, Puiu D, Hanrahan F, Pertea G, Van Tassell CP, Sonstegard TS, et al. A whole-genome assembly of the domestic cow, Bos taurus. Genome Biol. 2009;10(4):R42. doi: 10.1186/gb-2009-10-4-r42.
Zimin AV, Marçais G, Puiu D, Roberts M, Salzberg SL, Yorke JA. The MaSuRCA genome assembler. Bioinformatics. 2013;29(21):2669–2677. doi: 10.1093/bioinformatics/btt476.
Zimin AV, Puiu D, Luo MC, Zhu T, Koren S, Marçais G, Yorke JA, Dvořák J, Salzberg SL. Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm. Genome Res. 2017;27(5):787–792. doi: 10.1101/gr.213405.116.
