GigaByte. 2024 Jul 18;2024:1–13. doi: 10.46471/gigabyte.130

Chromosomal-level genome assembly and single-nucleotide polymorphism sites of black-faced spoonbill Platalea minor

Hong Kong Biodiversity Genomics Consortium
PMCID: PMC11273517  PMID: 39071178

Abstract

Platalea minor, or black-faced spoonbill (Threskiornithidae), is a wading bird confined to coastal areas in East Asia. Due to habitat destruction, it was classified as globally endangered by the International Union for Conservation of Nature. However, the lack of genomic resources for this species hinders the understanding of its biology and diversity, and the development of conservation measures. Here, we report the first chromosomal-level genome assembly of P. minor using a combination of PacBio SMRT and Omni-C scaffolding technologies. The assembled genome (1.24 Gb) contains 95.33% of the sequences anchored to 31 pseudomolecules. The genome assembly has high sequence continuity with scaffold length N50 = 53 Mb. We predicted 18,780 protein-coding genes and measured high BUSCO score completeness (97.3%). Finally, we revealed 6,046,878 bi-allelic single nucleotide polymorphisms, accounting for ∼0.5% of the genome. This resource offers new opportunities for studying the black-faced spoonbill and developing conservation measures for this species.

Introduction

The black-faced spoonbill Platalea minor (Threskiornithidae) (NCBI:txid259913, Figure 1A) is confined to coastal areas in East Asia, including Hong Kong, Macau, Taiwan, Vietnam, North Korea, South Korea, and Japan. The natural habitats of P. minor have been disturbed by human activities and industrialization, leading to a decline in the bird population over the last century [1, 2]. With an estimated population of just over 6,000 individuals worldwide, the black-faced spoonbill has been categorised as a globally endangered species by the International Union for Conservation of Nature (IUCN). A quarter of the worldwide population of P. minor can be found in Hong Kong, where it is protected under the Wild Animals Protection Ordinance Cap 200. Genetic methods, including studies on genetic diversity and population structure, have been used to support the conservation of this high-value species [3, 4]. Nevertheless, a reference genome for this species has been missing.

Figure 1.

(A) Picture of Platalea minor; (B) Statistics of the genome assembly generated in this study; (C) Hi-C contact map of the assembly visualised using Juicebox (v1.11.08); (D) Repetitive elements distribution.

Methods

Sample collection

Tissue samples of 14 P. minor individuals were collected from the north and northwestern parts of the New Territories, Hong Kong, between February 2015 and February 2020, with help from Kadoorie Farm and Botanic Garden. These samples were stored in 95% ethanol. Details of the sample collection are listed in Table 1.

Table 1.

Summary of sequencing data.

Sample | No. of reads | No. of bases | Coverage (×) | Accession number | KFBG reference number | Provenance details | Date specimen acquired at KFBG | Specimen type
Sequencing data for reference genome
PacBio HiFi | 2,707,085 | 25,352,990,722 | 20 | SAMN35152374 | K14352 | Mai Po | 15/02/2020 | Tissue in 95% ethanol
Omni-C | 518,587,164 | 77,788,074,600 | 63 | SAMN40731791 | B0181-2004 | Lok Ma Chau | 26/01/2004 | Tissue sample to be taken from carcass
Population resequencing data
BFS1 | 39,968,142 | 5,995,204,753 | 4.8 | SAMN35319659 | K7366 | Mai Po fishpond | 23/02/2015 | Tissue in 95% ethanol
BFS2 | 41,230,936 | 6,184,622,630 | 5.0 | SAMN35319660 | K7388 | Lok Ma Chau fishpond | 13/04/2015 | Portion of toe in 95% ethanol
BFS3 | 40,776,316 | 6,116,429,712 | 4.9 | SAMN35319661 | K8627 | Ma Cho Lung fishpond | 11/02/2016 | Tissue in 95% ethanol
BFS4 | 39,063,262 | 5,859,473,834 | 4.7 | SAMN35319662 | K11131 | Mai Po Gei Wai | 26/02/2018 | Tissue in 95% ethanol
BFS5 | 44,638,096 | 6,695,693,640 | 5.4 | SAMN35319663 | K11194 | Lut Chau, Nam San Wai | 10/04/2018 | Tissue in 95% ethanol
BFS6 | 41,890,130 | 6,283,502,065 | 5.1 | SAMN35319664 | K12542 | Lok Ma Chau pond | 18/12/2018 | Blood in 95% ethanol
BFS7 | 41,319,808 | 6,197,954,720 | 5.0 | SAMN35319665 | K12611 | Lok Ma Chau pond | 17/01/2019 | Tissue in 95% ethanol
BFS8 | 42,068,480 | 6,310,254,770 | 5.1 | SAMN35319666 | K12706 | Lok Ma Chau | 03/03/2019 | Blood in 95% ethanol
BFS9 | 42,539,460 | 6,380,900,668 | 5.1 | SAMN35319667 | K14133 | Lok Ma Chau | 26/11/2019 | Tissue in 95% ethanol
BFS10 | 40,359,264 | 6,053,871,746 | 4.9 | SAMN35319668 | K14208 | Lok Ma Chau | 25/12/2019 | Tissue in 95% ethanol
BFS11 | 41,496,386 | 6,224,440,405 | 5.0 | SAMN35319669 | K14366 | Mai Po | 20/02/2020 | Portion of feather in 95% ethanol
BFS12 | 39,361,374 | 5,904,189,803 | 4.8 | SAMN35319670 | K14401 | Tin Shui Wai Wetland Park | 12/03/2020 | Tissue in 95% ethanol
BFS14 | 41,319,546 | 6,197,915,065 | 5.0 | SAMN35319671 | K14324 | Tin Shui Wai Wetland Park | 03/02/2020 | Portion of feather in 95% ethanol

Isolation of high molecular weight genomic DNA

High molecular weight (HMW) genomic DNA was extracted from a single individual, labelled “BFS13”. The tissue sample was first ground into a powder with liquid nitrogen and then processed with the Qiagen MagAttract HMW kit (Qiagen Cat. No. 67563), following the manufacturer’s protocol. The final DNA sample was eluted with 120 μL of elution buffer (PacBio Ref. No. 101-633-500) and subjected to quality checks using the NanoDrop™ One/OneC Microvolume UV–Vis Spectrophotometer, Qubit® Fluorometer, and overnight pulse-field gel electrophoresis.

DNA shearing, PacBio library preparation, and sequencing

Approximately 4.4 μg of HMW DNA was sheared through six centrifugation steps in a g-tube (Covaris Part No. 520079) at 2,000 × g for 2 min. The sheared DNA was transferred to a 2 mL DNA LoBind® Tube (Eppendorf Cat. No. 022431048) and stored at 4 °C. Overnight pulse-field gel electrophoresis was used to assess the fragment size distribution of the sheared DNA. Next, a SMRTbell library was constructed using the SMRTbell® prep kit 3.0 (PacBio Ref. No. 102-141-700), following the manufacturer’s instructions. Briefly, the sheared DNA was processed with DNA repair, followed by polishing and tailing with an A-overhang at both ends of each DNA strand. T-overhang SMRTbell adapters were then ligated to the polished ends to form SMRTbell templates, which were purified with SMRTbell® cleanup beads (PacBio Ref. No. 102-158-300). The quantity and fragment size of the SMRTbell library were inspected with the Qubit® Fluorometer and overnight pulse-field gel electrophoresis, respectively. A nuclease treatment was conducted to remove any non-SMRTbell structures, and a subsequent size-selection step with 35% AMPure PB beads was used to remove short fragments. The final preparation of the library was performed using the Sequel® II binding kit 3.2 (PacBio Ref. No. 102-194-100). In brief, Sequel II primer 3.2 and Sequel II DNA polymerase 2.2 were added to anneal and bind to the SMRTbell templates, respectively. An internal control provided by the kit was also added. Finally, the library was loaded on the PacBio Sequel IIe System at an on-plate concentration of 90 pM with the diffusion loading mode. The sequencing was run in 30-h movies, with 120 min pre-extension. In total, one SMRT cell was used to output high-fidelity (HiFi) reads, and the sequencing data details are listed in Table 1.

Omni-C library preparation and sequencing

An Omni-C library was constructed using the Dovetail® Omni-C® Library Preparation Kit (Dovetail Cat. No. 21005), following the manufacturer’s protocol. A total of 80 mg of tissue was ground into a powder with liquid nitrogen, transferred to 1 mL 1× PBS, and then subjected to crosslinking with formaldehyde and digestion with the endonuclease DNase I. An aliquot of 2.5 μL lysate was used to assess the lysate quantity and fragment size distribution using the Qubit® Fluorometer and TapeStation D5000 HS ScreenTape, respectively. Then, end polishing, bridge ligation, and proximity ligation were carried out on the crosslinked DNA fragments. Next, crosslink reversal was performed, followed by DNA purification and size selection with SPRIselect™ Beads (Beckman Coulter Product No. B23317). The library preparation was continued with end repair and adapter ligation using the Dovetail™ Library Module for Illumina (Dovetail Cat. No. 21004), followed by DNA purification with SPRIselect™ Beads. The DNA fragments were then captured with Streptavidin Beads, and Universal and Index PCR Primers from the Dovetail™ Primer Set for Illumina (Dovetail Cat. No. 25005) were added to amplify the DNA library. A final size selection was carried out using SPRIselect™ Beads to retain DNA fragments ranging between 350 bp and 1000 bp. The quantity and fragment size distribution of the library were inspected with the Qubit® Fluorometer and the TapeStation D5000 HS ScreenTape, respectively. The final library was sequenced on an Illumina HiSeq-PE150 platform at Novogene. The details of the sequencing data are listed in Table 1.

Genome assembly and gene model prediction

De novo genome assembly was performed using Hifiasm (RRID:SCR_021069) [5]. Haplotypic duplications were identified and removed using purge_dups (RRID:SCR_021173) based on the depth of HiFi reads [6]. Proximity ligation data from the Omni-C library were used to scaffold the genome assembly with YaHS (RRID:SCR_022965) [7]. Transposable elements (TEs) were annotated using the automated Earl Grey TE annotation pipeline (version 1.2) as previously described [8]. Genome annotation was performed using Braker (v3.0.8) (RRID:SCR_018964) [9] with default parameters. Briefly, the genome was soft-masked using redmask (v0.0.2) [10]. A total of 2,468,534 Aves reference protein sequences were downloaded from NCBI as protein references. A blood RNA-Seq dataset (SRR6650848) [11] was also downloaded from NCBI and aligned to the soft-masked genome using hisat2 (RRID:SCR_015530) [12] to generate the bam file. The protein and bam files were used as input to Braker for genome annotation.

Platalea minor resequencing and single nucleotide polymorphism analysis

Genomic DNA from 13 P. minor individuals was isolated using the PureLink™ Genomic DNA Mini Kit (Invitrogen Cat no. K182002), following the manufacturer’s instructions. The quality of DNA samples was assessed with the NanoDrop™ One/OneC Microvolume UV–Vis Spectrophotometer and 1% gel electrophoresis. Next, the samples were sent to Novogene for sequencing on an Illumina HiSeq-PE150 platform at approximately 6× coverage. The sequenced raw reads were then trimmed by Trimmomatic (v0.39, RRID:SCR_011848) [13] and cleaned with Kraken 2 (RRID:SCR_005484) [14]. The cleaned reads were aligned to large scaffolds (>500 kb, n = 234), accounting for 97.1% of the P. minor reference genome, with BWA-MEM (RRID:SCR_022192) [15] using the parameters “-t 30 -M -R”. Variant calling was performed using the “HaplotypeCaller” and “GenotypeGVCFs” commands from the Genome Analysis Toolkit (GATK, RRID:SCR_001876, v4.1.2.0) [16]. Hard filtering was employed to remove single nucleotide polymorphisms (SNPs) meeting any of the following criteria: quality by depth <2.0, Fisher strand bias >60.0, mapping quality <40.0, mapping quality rank sum test <−12.5, and read position rank sum test <−8.0. The remaining SNPs were further filtered to keep only bi-allelic sites (“--min-alleles 2 --max-alleles 2”) with no missing data (“--max-missing 1”), a minimum summed site-depth (sumDP) of 20 and a maximum sumDP of 130, which removed sites below one-third and above three-fold of the average sumDP, respectively, using the “--site-depth” and “--positions” options in VCFtools (v0.1.16, RRID:SCR_001235) [17]. The heterozygosity and inbreeding coefficient were estimated using VCFtools (v0.1.16) [17]. Details of the resequencing data are listed in Table 1.
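The filtering logic described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which used GATK and VCFtools. The thresholds are taken from the text, while the annotation keys (QD, FS, MQ, MQRankSum, ReadPosRankSum), the function names, the inclusive sumDP bounds, and the handling of missing annotations are assumptions made for illustration.

```python
from typing import Callable, Dict, Mapping

# A site is removed when ANY of these tests is true (thresholds from the text).
# The keys are the usual GATK INFO annotation names; using them here is an assumption.
HARD_FILTERS: Dict[str, Callable[[float], bool]] = {
    "QD": lambda v: v < 2.0,               # quality by depth
    "FS": lambda v: v > 60.0,              # Fisher strand bias
    "MQ": lambda v: v < 40.0,              # mapping quality
    "MQRankSum": lambda v: v < -12.5,      # mapping quality rank sum test
    "ReadPosRankSum": lambda v: v < -8.0,  # read position rank sum test
}

MIN_SUM_DP = 20   # minimum summed site depth across all samples
MAX_SUM_DP = 130  # maximum summed site depth across all samples


def fails_hard_filter(info: Mapping[str, float]) -> bool:
    """Return True if any available annotation violates its threshold.

    Annotations absent from `info` are skipped (an assumption of this sketch).
    """
    return any(key in info and test(info[key]) for key, test in HARD_FILTERS.items())


def keep_site(info: Mapping[str, float], n_alleles: int, n_missing: int, sum_dp: int) -> bool:
    """Apply the hard filter, then the bi-allelic, no-missing-data and depth rules."""
    if fails_hard_filter(info):
        return False
    if n_alleles != 2:   # bi-allelic sites only
        return False
    if n_missing != 0:   # every individual must be genotyped
        return False
    return MIN_SUM_DP <= sum_dp <= MAX_SUM_DP  # inclusive bounds assumed


example = {"QD": 20.0, "FS": 1.2, "MQ": 60.0, "MQRankSum": 0.1, "ReadPosRankSum": 0.3}
print(keep_site(example, n_alleles=2, n_missing=0, sum_dp=65))
```

Written this way, the two filtering stages stay separate: the annotation-based hard filter first, then the site-level criteria.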

Data validation and quality control

During DNA extraction and PacBio library preparation, the samples were subjected to quality control with NanoDrop™ One/OneC Microvolume UV–Vis Spectrophotometer, Qubit® Fluorometer, and overnight pulse-field gel electrophoresis. The Omni-C library was inspected by Qubit® Fluorometer and TapeStation D5000 HS ScreenTape.

Regarding the genome assembly, the Hifiasm output was searched against the NT database using BLAST, and the resulting output was used as input for Blobtools (v1.1.1, RRID:SCR_017618) [18]. Scaffolds identified as possible contaminants were removed from the assembly manually (Figure 2). A statistical k-mer-based approach was applied to estimate the heterozygosity of the assembled genome. The repeat content and the corresponding sizes were analysed with a k-mer size of 21 using Jellyfish (RRID:SCR_005491) [19] and GenomeScope (RRID:SCR_017014) [20] (Figure 3; Table 2). BUSCO (v5.5.0) [21] was used to assess the completeness of the genome assembly and gene annotation with the metazoan (metazoa_odb10) and avian (aves_odb10) datasets. Hi-C contact maps were generated using Juicer tools (version 1.22.01, RRID:SCR_017226) [22], following the Omni-C manual [23].

Figure 2.

Genome assembly quality control and contaminant/cobiont detection. The upper panel shows the BlobPlot of the assembly. Each circle represents a scaffold with its size scaled according to its scaffold length, while the colour of the circle indicates the taxonomic assignment from BLAST similarity search results. The lower panel reveals the ReadCovPlot of the assembly, illustrating the proportion of unmapped and mapped sequences in the BLAST similarity search results on the left. The latter is further dissected according to the rank of phylum on the right.

Figure 3.

The GenomeScope profile with kmer 21.

Table 2.

Summary of the GenomeScope statistics (k = 21).

Property min max
Homozygous (aa) 99.34% 99.37%
Heterozygous (ab) 0.63% 0.66%
Genome Haploid Length (bp) 1,141,324,739 1,144,280,536
Genome Repeat Length (bp) 112,880,770 113,173,108
Genome Unique Length (bp) 1,028,443,969 1,031,107,427
Model Fit 93.18% 99.52%
Read Error Rate 0.41% 0.41%

Omni-C reads and PacBio HiFi reads were used to measure the assembly completeness and the consensus quality (QV) using Merqury (v1.3, RRID:SCR_022964) [24] with a k-mer size of 20, resulting in a k-mer completeness of 95.0738% for the Omni-C data and a QV score of 59.746 for the HiFi reads, corresponding to an accuracy of approximately 99.9999%.
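The relationship between a QV score and base-level accuracy follows the standard Phred scale, QV = −10 log10(error rate). The short Python sketch below converts the reported QV into an error rate and accuracy. The function names are illustrative and are not part of Merqury.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled consensus quality value into a per-base error rate."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Inverse conversion: per-base error rate to Phred-scaled QV."""
    return -10 * math.log10(error_rate)


qv = 59.746  # QV reported for the assembly
error = qv_to_error_rate(qv)
print(f"error rate = {error:.3e}, accuracy = {100 * (1 - error):.5f}%")
```

On this scale, QV 50 corresponds to one error per 100,000 bases and QV 60 to one error per 1,000,000 bases, so a QV just below 60 implies roughly one error per million bases.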

The black-faced spoonbill genome assembly was also compared to five other avian genomes with chromosome-level assemblies and genome annotations: Gallus gallus (Ggal: GCF_016699485.2), Cuculus canorus (Ccan: GCF_017976375.1), Mycteria americana (Mame: GCA_035582795.1), Taeniopygia guttata (Tgut: GCF_003957565.2), and Theristicus caerulescens (Tcae: GCA_020745775.1), which were downloaded from NCBI [25] or UCSC [26] [27–29]. Macrosynteny analysis was performed using MCScan (RRID:SCR_017650) with default parameters [30]. It is worth noting that some parts of the largest scaffold in P. minor mapped to several T. caerulescens chromosomes, while genomes in other birds show relatively high syntenic conservation [31], which may warrant further investigation (Figure 4).

Figure 4.

Macrosynteny between P. minor (Pmin), Theristicus caerulescens (Tcae), Mycteria americana (Mame), Taeniopygia guttata (Tgut), Cuculus canorus (Ccan), and Gallus gallus (Ggal).

Results and discussion

Genome assembly of P. minor

A total of 25.35 Gb of HiFi data was generated, with an average HiFi read length of 9,365 bp and 20× coverage (Table 1). After scaffolding with 77.79 Gb of Omni-C sequencing data, the assembled genome size was 1.24 Gb in 468 scaffolds, with a scaffold N50 of 53 Mb and L50 of 8 (Tables 1, 3 and 4; Figures 1B and C). The genome size is comparable to those of other bird species in the family Threskiornithidae, which have genome sizes of around 1.0–1.3 Gb according to the data available in NCBI GenBank, such as Theristicus caerulescens (1.20 Gb, GCA_020745775.1), Nipponia nippon (1.31 Gb, GCA_035839065.1), and Mesembrinibis cayennensis (1.19 Gb, GCA_013399675.1). The genome completeness estimated by BUSCO (RRID:SCR_015008) was 97.3% (aves_odb10) (Table 3; Figure 1B). The GC content was 42.98%. A total of 14,673 gene models were generated with 18,780 predicted protein-coding genes, having a mean protein length of 516 amino acids and a complete protein BUSCO value of 78.3% (Table 3).
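As a quick consistency check, the coverage and mean read length quoted above can be recomputed from the counts in Table 1. The sketch below assumes that coverage is the number of sequenced bases divided by the assembled genome length from Table 3. The text does not state the denominator explicitly, so this is an assumption, and the helper names are illustrative.

```python
ASSEMBLY_LENGTH_BP = 1_239_504_613  # total assembly length (Table 3)


def fold_coverage(n_bases: int, genome_length: int = ASSEMBLY_LENGTH_BP) -> float:
    """Sequenced bases divided by genome length (assumed definition of coverage)."""
    return n_bases / genome_length


def mean_read_length(n_bases: int, n_reads: int) -> float:
    """Average read length in bp."""
    return n_bases / n_reads


hifi_bases, hifi_reads = 25_352_990_722, 2_707_085  # PacBio HiFi row of Table 1
omnic_bases = 77_788_074_600                        # Omni-C row of Table 1
bfs1_bases = 5_995_204_753                          # BFS1 resequencing row of Table 1

print(f"HiFi coverage:    {fold_coverage(hifi_bases):.1f}x")
print(f"HiFi mean length: {mean_read_length(hifi_bases, hifi_reads):.0f} bp")
print(f"Omni-C coverage:  {fold_coverage(omnic_bases):.1f}x")
print(f"BFS1 coverage:    {fold_coverage(bfs1_bases):.1f}x")
```

Under this assumption, the recomputed values round to the figures listed in Table 1.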

Table 3.

Genome statistics.

Platalea minor
Total length (bp) 1,239,504,613
Number of scaffolds 468
Mean length (bp) 2,648,514
Longest scaffold length (bp) 108,170,464
Shortest scaffold length (bp) 1,000
N_count 0.02%
Gaps 998
N50 53,081,851
N50n 8
N70 31,851,659
N70n 15
N90 14,707,644
N90n 25
BUSCO (Geno, metazoa_odb10) C:93.7%[S:93.4%,D:0.3%],F:1.6%,M:4.7%,n:954
BUSCO (Geno, aves_odb10) C:97.3%[S:97.0%,D:0.3%],F:0.5%,M:2.2%,n:8338
Protein total length (amino acids) 9,684,019
Protein number 18,780
Protein mean length (amino acids) 516
BUSCO (Prot, metazoa_odb10) C:88.4%[S:72.5%,D:15.9%],F:1.7%,M:9.9%,n:954
BUSCO (Prot, aves_odb10) C:78.3%[S:59.9%,D:18.4%],F:1.9%,M:19.8%,n:8338

Table 4.

Scaffold information with a length larger than 1 Mb.

Scaffold ID Scaffold length (bp) Cumulative % of the whole genome
scaffold_1 108,170,464 8.73%
scaffold_2 94,402,276 16.34%
scaffold_3 92,629,600 23.81%
scaffold_4 79,211,724 30.20%
scaffold_5 73,048,200 36.10%
scaffold_6 72,537,701 41.95%
scaffold_7 69,784,815 47.58%
scaffold_8 53,081,851 51.86%
scaffold_9 44,005,081 55.41%
scaffold_10 36,492,200 58.35%
scaffold_11 35,351,409 61.20%
scaffold_12 34,190,505 63.96%
scaffold_13 33,557,343 66.67%
scaffold_14 33,263,400 69.35%
scaffold_15 31,851,659 71.92%
scaffold_16 31,385,420 74.45%
scaffold_17 30,450,346 76.91%
scaffold_18 30,280,111 79.35%
scaffold_19 26,443,200 81.49%
scaffold_20 25,001,000 83.50%
scaffold_21 23,104,878 85.37%
scaffold_22 18,730,967 86.88%
scaffold_23 17,395,620 88.28%
scaffold_24 15,031,567 89.49%
scaffold_25 14,707,644 90.68%
scaffold_26 14,565,521 91.86%
scaffold_27 13,184,709 92.92%
scaffold_28 9,343,000 93.67%
scaffold_29 7,143,856 94.25%
scaffold_30 6,869,713 94.80%
scaffold_31 6,476,723 95.33%
scaffold_32 4,846,626 95.72%
scaffold_33 4,355,004 96.07%
scaffold_34 2,327,616 96.26%
scaffold_35 2,194,594 96.43%
scaffold_36 1,422,971 96.55%
scaffold_37 1,263,285 96.65%
scaffold_38 1,176,292 96.74%
scaffold_39 1,115,873 96.83%
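The N50, N70 and N90 statistics in Table 3 can be reproduced from the scaffold lengths listed above. The Python sketch below uses the standard definition: Nx is the length of the scaffold at which the cumulative length, counted from the longest scaffold down, first reaches x% of the total assembly length, and Lx (labelled N50n, N70n and N90n in Table 3) is the number of scaffolds needed to get there. Only the 25 longest scaffolds are included, which is enough to reach 90%. The function name is illustrative.

```python
from typing import List, Tuple

TOTAL_LENGTH_BP = 1_239_504_613  # total assembly length (Table 3)

# The 25 longest scaffolds from Table 4, in descending order.
SCAFFOLD_LENGTHS: List[int] = [
    108_170_464, 94_402_276, 92_629_600, 79_211_724, 73_048_200,
    72_537_701, 69_784_815, 53_081_851, 44_005_081, 36_492_200,
    35_351_409, 34_190_505, 33_557_343, 33_263_400, 31_851_659,
    31_385_420, 30_450_346, 30_280_111, 26_443_200, 25_001_000,
    23_104_878, 18_730_967, 17_395_620, 15_031_567, 14_707_644,
]


def nx_lx(lengths: List[int], total: int, x: int) -> Tuple[int, int]:
    """Return (Nx, Lx) for a percentage x, using integer arithmetic throughout."""
    cumulative = 0
    for rank, length in enumerate(sorted(lengths, reverse=True), start=1):
        cumulative += length
        if cumulative * 100 >= total * x:
            return length, rank
    raise ValueError("the supplied scaffolds do not reach the requested fraction")


for x in (50, 70, 90):
    n, l = nx_lx(SCAFFOLD_LENGTHS, TOTAL_LENGTH_BP, x)
    print(f"N{x} = {n:,} bp, L{x} = {l}")
```

Passing the total length separately lets the calculation work on a partial scaffold list, since the shorter scaffolds never affect the result once the threshold has been reached.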

Repeat content

Repeats accounted for 11.94% of the genome, a low repeat content similar to that of other avian genomes [32]; unclassified elements made up 2.49% of the genome. Of the classified transposable elements, long interspersed nuclear elements (LINE) were the most abundant (5.10%), followed by long terminal repeats (LTR) (1.62%). In contrast, DNA, short interspersed nuclear elements (SINE), Penelope, and rolling circle elements were only present in low proportions (DNA: 0.63%, SINE: 0.09%, Penelope: 0.06%, rolling circle: 0.02%). A complete catalogue of the repeat content of the genome can be found in Table 5 and Figure 1D.

Table 5.

Summary of the repetitive elements analysis.

Classification | Coverage length (bp) | Count | Proportion (%) | No. of distinct classifications
DNA | 7,812,155 | 25,610 | 0.63 | 3,780
LINE | 63,189,657 | 102,036 | 5.10 | 5,652
LTR | 20,129,006 | 29,922 | 1.62 | 3,928
Other (Simple Repeat, Microsatellite, RNA) | 23,864,680 | 3,875 | 1.93 | 1,118
Penelope | 690,955 | 1,966 | 0.06 | 646
Rolling Circle | 287,526 | 744 | 0.02 | 404
SINE | 1,165,920 | 4,910 | 0.09 | 885
Unclassified | 30,908,397 | 60,141 | 2.49 | 5,695
SUM | 148,048,296 | 229,204 | 11.94 | 22,108

Single nucleotide polymorphism sites

A total of 6,046,878 bi-allelic SNPs were called from 13 P. minor individuals, accounting for ∼0.5% of the genome. The mean individual heterozygosity was 0.142%. The lowest individual heterozygosity (0.077%) was close to values reported for other endangered bird species, such as Pelecanus crispus (0.060%) and Nestor notabilis (0.091%) [33]. The heterozygosity levels of five individuals (0.108% to 0.116%) were comparable to previous reports on spoonbills: black-faced spoonbill (0.101%–0.116%, mean 0.109%, n = 11) and royal spoonbill (0.098%–0.109%, mean 0.105%, n = 9) [4]. The remaining heterozygosity levels observed in this study were below the mean (0.221%) and median (0.213%) of heterozygosity reported from 40 avian species [33]. Signals of inbreeding were observed among the samples, with the inbreeding coefficient (FIS) ranging from 0.325 to 0.718 (Table 6), providing additional evidence of a recent genetic bottleneck in the black-faced spoonbill population [4]. High levels of FIS have also been observed in other bird populations suffering from past bottlenecks [34]. These results highlight the need for continuous efforts in monitoring P. minor.

Table 6.

Number of SNPs, statistics of heterozygosity and inbreeding coefficient of 13 Platalea minor individuals.

Sample ID | No. of sites with observed heterozygosity | HO (%) | FIS
BFS1 | 1,015,586 | 0.168 | 0.382
BFS2 | 673,637 | 0.111 | 0.590
BFS3 | 654,457 | 0.108 | 0.602
BFS4 | 1,013,752 | 0.168 | 0.383
BFS5 | 700,877 | 0.116 | 0.573
BFS6 | 1,076,231 | 0.178 | 0.345
BFS7 | 1,080,566 | 0.179 | 0.342
BFS8 | 1,055,650 | 0.175 | 0.357
BFS9 | 463,725 | 0.077 | 0.718
BFS10 | 997,870 | 0.165 | 0.393
BFS11 | 700,069 | 0.116 | 0.574
BFS12 | 652,122 | 0.108 | 0.603
BFS14 | 1,109,354 | 0.183 | 0.325
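The relationship between the columns of Table 6 can be explored with a short Python sketch. It rests on two assumptions that are inferred from the numbers rather than stated in the text: that the tabulated heterozygosity equals the number of heterozygous sites divided by the 6,046,878 SNPs retained after filtering, and that the inbreeding coefficient follows the usual method-of-moments form F = 1 − HO/HE. The implied expected heterozygosity is therefore a back-calculation, not a value reported by the authors. Only four samples are shown for brevity.

```python
N_SNPS = 6_046_878  # bi-allelic SNPs retained after filtering (from the text)

# (heterozygous sites, tabulated HO, tabulated FIS) for a subset of Table 6
TABLE6 = {
    "BFS1": (1_015_586, 0.168, 0.382),
    "BFS5": (700_877, 0.116, 0.573),
    "BFS9": (463_725, 0.077, 0.718),
    "BFS14": (1_109_354, 0.183, 0.325),
}


def observed_heterozygosity(n_het_sites: int, n_sites: int = N_SNPS) -> float:
    """Heterozygous sites divided by all genotyped SNP sites (assumed definition)."""
    return n_het_sites / n_sites


def implied_expected_heterozygosity(h_obs: float, f_is: float) -> float:
    """Solve F = 1 - HO / HE for HE (method-of-moments form, assumed)."""
    return h_obs / (1.0 - f_is)


for sample, (n_het, h_table, f_table) in TABLE6.items():
    h_obs = observed_heterozygosity(n_het)
    h_exp = implied_expected_heterozygosity(h_obs, f_table)
    print(f"{sample}: HO = {h_obs:.3f} (table {h_table}), implied HE = {h_exp:.3f}")
```

Under these assumptions, the implied expected heterozygosity is nearly identical across individuals, which is what would be expected when every individual is genotyped at the same set of sites.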

Conclusion and reuse potential

This study presents the first chromosomal-level genome assembly and single-nucleotide polymorphism sites of black-faced spoonbill Platalea minor. These are useful and valuable resources for future population genomic studies aimed at better understanding spoonbill species numbers and conservation.

Funding Statement

This work was funded and supported by the Hong Kong Research Grant Council Collaborative Research Fund (C4015-20EF), CUHK Strategic Seed Funding for Collaborative Research Scheme (3133356) and CUHK Group Research Scheme (3110154).

Contributor Information

Hong Kong Biodiversity Genomics Consortium:

Jerome H. L. Hui, Ting Fung Chan, Leo Lai Chan, Siu Gin Cheung, Chi Chiu Cheang, James Kar-Hei Fang, Juan Diego Gaitan-Espitia, Stanley Chun Kwan Lau, Yik Hei Sung, Chris Kong Chu Wong, Kevin Yuk-Lap Yip, Yingying Wei, Wai Lok So, Wenyan Nong, Sean Tsz Sum Law, Paul Crow, Aiko Leong, Liz Rose-Jeffreys, and Ho Yin Yip

Data availability

The final assembly has been deposited at NCBI under the accession number JBBPFK000000000. The raw reads generated in this study, including Omni-C (SAMN40731791) and PacBio HiFi (SAMN35152374) data, have been deposited in the NCBI database under the BioProject accession number PRJNA973839. The genome, genomic and repeat annotation files have been deposited and are publicly available in Figshare [35].

Abbreviations

HiFi, high-fidelity; HMW, high molecular weight; LINE, long interspersed nuclear element; LTR, long terminal repeat; QV, consensus quality; SINE, short interspersed nuclear element; SNP, single nucleotide polymorphism; sumDP, summed site-depth; TE, transposable element.

Declarations

Ethics approval and consent to participate

The authors declare that ethical approval was not required for this type of research.

Competing interests

The authors declare that they do not have competing interests.

Authors’ contributions

JHLH, TFC, LLC, SGC, CCC, JKHF, JDG, SCKL, YHS, CKCW, KYLY and YW conceived and supervised the study; WLS carried out DNA extraction, library preparation and sequencing; WN performed genome assembly and gene model prediction; STSL carried out the SNP calling and inbreeding coefficient (FIS) calculations; PC, AL, LRJ and HYY collected and maintained the samples. All authors approved the final version of the manuscript.

References

1. Takano S, Henmi Y. The influence of constructing a Shinkansen bridge on Black-faced Spoonbills Platalea minor wintering in Kyushu, Japan. Ornithol. Sci., 2012; 11: 21–28. doi: 10.2326/osj.11.21.
2. Guo-An W, Fu-Min L, Zuo-Hua Y et al. Nesting and disturbance of the Black-faced Spoonbill in Liaoning Province, China. Waterbirds, 2005; 28: 420–425. doi: 10.1675/1524-4695(2005)28[420:NADOTB]2.0.CO;2.
3. Lee M-Y, Kwon I-K, Lee K et al. Genetic diversity and population structure of the Black-faced Spoonbill (Platalea minor) among its breeding sites in South Korea: implication for conservation. Biochem. Syst. Ecol., 2017; 71: 106–113. doi: 10.1016/j.bse.2017.01.014.
4. Li S-H, Liu Y, Yeh C-F et al. Not out of the woods yet: signatures of the prolonged negative genetic consequences of a population bottleneck in a rapidly re-expanding wader, the black-faced spoonbill Platalea minor. Mol. Ecol., 2022; 31: 529–545. doi: 10.1111/mec.16260.
5. Cheng H, Concepcion GT, Feng X et al. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods, 2021; 18: 170–175. doi: 10.1038/s41592-020-01056-5.
6. Guan D, McCarthy SA, Wood J et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics, 2020; 36: 2896–2898. doi: 10.1093/bioinformatics/btaa025.
7. Zhou C, McCarthy SA, Durbin R. YaHS: yet another Hi-C scaffolding tool. Bioinformatics, 2023; 39: btac808. doi: 10.1093/bioinformatics/btac808.
8. Baril T, Galbraith J, Hayward A. Earl Grey: a fully automated user-friendly transposable element annotation and analysis pipeline. Mol. Biol. Evol., 2024; 41(4): msae068. doi: 10.1093/molbev/msae068.
9. Hoff KJ, Lomsadze A, Borodovsky M et al. Whole-genome annotation with BRAKER. In: Gene Prediction: Methods and Protocols. Springer, 2019; pp. 65–95. doi: 10.1007/978-1-4939-9173-0_5.
10. Girgis HZ. Red: an intelligent, rapid, accurate tool for detecting repeats de-novo on the genomic scale. BMC Bioinform., 2015; 16: 227. doi: 10.1186/s12859-015-0654-5.
11. Cho YS, Jun JH, Kim JA et al. Raptor genomes reveal evolutionary signatures of predatory and nocturnal lifestyles. Genome Biol., 2019; 20: 181. doi: 10.1186/s13059-019-1793-1.
12. Kim D, Paggi JM, Park C et al. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol., 2019; 37: 907–915. doi: 10.1038/s41587-019-0201-4.
13. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 2014; 30: 2114–2120. doi: 10.1093/bioinformatics/btu170.
14. Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol., 2019; 20: 257. doi: 10.1186/s13059-019-1891-0.
15. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv, 2013; doi: 10.48550/arXiv.1303.3997.
16. DePristo MA, Banks E, Poplin R et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet., 2011; 43: 491–498. doi: 10.1038/ng.806.
17. Danecek P, Auton A, Abecasis G et al. The variant call format and VCFtools. Bioinformatics, 2011; 27: 2156–2158. doi: 10.1093/bioinformatics/btr330.
18. Laetsch DR, Blaxter ML. BlobTools: interrogation of genome assemblies. F1000Research, 2017; 6: 1287. doi: 10.12688/f1000research.12232.1.
19. Marçais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics, 2011; 27: 764–770. doi: 10.1093/bioinformatics/btr011.
20. Ranallo-Benavidez TR, Jaron KS, Schatz MC. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun., 2020; 11: 1432. doi: 10.1038/s41467-020-14998-3.
21. Manni M, Berkeley MR, Seppey M et al. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol., 2021; 38: 4647–4654. doi: 10.1093/molbev/msab199.
22. Durand NC, Shamim MS, Machol I et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst., 2016; 3: 95–98. doi: 10.1016/j.cels.2016.07.002.
23. Omni-C manual. https://omni-c.readthedocs.io/en/latest/contact_map.html.
24. Rhie A, Walenz BP, Koren S et al. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol., 2020; 21: 245. doi: 10.1186/s13059-020-02134-9.
25. NCBI Repository. https://www.ncbi.nlm.nih.gov/.
26. UCSC Database. https://hgdownload.soe.ucsc.edu/.
27. Flamio R Jr, Ramstad KM. Chromosome-level genome of the wood stork (Mycteria americana) provides insight into avian chromosome evolution. J. Heredity, 2024; 115: 230–239. doi: 10.1093/jhered/esad077.
28. Formenti G, Rhie A, Balacco J et al. Complete vertebrate mitogenomes reveal widespread repeats and gene duplications. Genome Biol., 2021; 22: 120. doi: 10.1186/s13059-021-02336-9.
29. Rhie A, McCarthy SA, Fedrigo O et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature, 2021; 592: 737–746. doi: 10.1038/s41586-021-03451-0.
30. Tang H, Bowers JE, Wang X et al. Synteny and collinearity in plant genomes. Science, 2008; 320: 486–488. doi: 10.1126/science.1153917.
31. Xu L, Ren Y, Wu J et al. Evolution and expression patterns of the neo-sex chromosomes of the crested ibis. Nat. Commun., 2024; 15: 1670. doi: 10.1038/s41467-024-46052-x.
32. Zhang G, Li C, Li Q et al. Comparative genomics reveals insights into avian genome evolution and adaptation. Science, 2014; 346: 1311–1320. doi: 10.1126/science.1251385.
33. Li S, Li B, Cheng C et al. Genomic signatures of near-extinction and rebirth of the crested ibis and other endangered bird species. Genome Biol., 2014; 15: 557. doi: 10.1186/s13059-014-0557-1.
34. Martin CA, Sheppard EC, Illera JC et al. Runs of homozygosity reveal past bottlenecks and contemporary inbreeding across diverging populations of an island-colonizing bird. Mol. Ecol., 2023; 32: 1972–1989. doi: 10.1111/mec.16865.
35. Hong Kong Biodiversity Genomics Consortium. Chromosomal-level genome assembly and single-nucleotide polymorphism sites of black-faced spoonbill Platalea minor. Figshare. [Dataset]. 2024; doi: 10.6084/m9.figshare.25532389.v1.

Article Submission

Jerome Hui

Assign Handling Editor

Editor: Scott Edmunds

Editor Assess MS

Editor: Hongfang Zhang

Curator Assess MS

Editor: Mary-Ann Tuli

Review MS

Editor: Richard Flamio

Reviewer name and names of any other individual's who aided in reviewer Richard Flamio Jr.
Do you understand and agree to our policy of having open and named reviews, and having your review included with the published papers. (If no, please inform the editor that you cannot review this manuscript.) Yes
Is the language of sufficient quality? No
Please add additional comments on language quality to clarify if needed There are some grammatical errors and spelling mistakes throughout the text.
Are all data available and do they match the descriptions in the paper? Yes
Additional Comments
Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples (http://gigadb.org/site/guide). Yes
Additional Comments
Is the data acquisition clear, complete and methodologically sound? Yes
Additional Comments
Is there sufficient detail in the methods and data-processing steps to allow reproduction? Yes
Additional Comments The authors did a phenomenal job at detailing the methods and data-processing steps.
Is there sufficient data validation and statistical analyses of data quality? Yes
Additional Comments
Is the validation suitable for this type of data? Yes
Additional Comments
Is there sufficient information for others to reuse this dataset or integrate it with other data? Yes
Additional Comments
Any Additional Overall Comments to the Author Very nice job on the paper. The methods are sound and the statistics regarding the genome assembly are thorough. My only two comments are: 1) I think the paper could be improved by the correction of grammatical errors, and 2) I am interested in a discussion about the number of chromosomes expected for this species (or an estimate) based on related species, and whether the authors believe all of the chromosomes were identified. For example, is the karyotype known, or can the researchers make any inferences about the number of microchromosomes in the assembly? Please see a recent paper I wrote on microchromosomes in the wood stork assembly (https://doi.org/10.1093/jhered/esad077) for some ideas on defining the chromosome architecture of the spoonbill and/or comparing this architecture to related species.
Recommendation Major Revision

Review MS

Editor: Phred M Benham

Reviewer name and names of any other individuals who aided in review: Phred Benham
Do you understand and agree to our policy of having open and named reviews, and having your review included with the published papers. (If no, please inform the editor that you cannot review this manuscript.) Yes
Is the language of sufficient quality? Yes
Please add additional comments on language quality to clarify if needed Generally yes, the language is sufficiently clear. However, a number of places could be refined and extra words removed.
Are all data available and do they match the descriptions in the paper? No
Additional Comments Additional data are available on Figshare. I do not see any of the tables that are cited in the manuscript and have legends. Am I missing something? Also, there is no legend for the GenomeScope profile in Figure 3. The assembly appears to be on GenBank as a scaffold-level assembly; can you list this accession information in the data availability section in addition to the project number?
Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples (http://gigadb.org/site/guide). Yes
Additional Comments
Is the data acquisition clear, complete and methodologically sound? Yes
Additional Comments
Is there sufficient detail in the methods and data-processing steps to allow reproduction? Yes
Additional Comments
Is there sufficient data validation and statistical analyses of data quality? No
Additional Comments Overall fine, but some additional analyses would aid the paper. Comparison of the spoonbill genome to other close relatives using a synteny plot would be helpful. It would also be useful to put heterozygosity and inbreeding coefficients into context by comparing to results from other species.
Is the validation suitable for this type of data? Yes
Additional Comments
Is there sufficient information for others to reuse this dataset or integrate it with other data? Yes
Additional Comments
Any Additional Overall Comments to the Author Hui et al. report a chromosome-level genome for the black-faced spoonbill, an endangered species of coastal wetlands in East Asia. This genome will serve as an important resource for understanding the biology of, and conserving, this species. Generally, the methods are sound and appropriate for the generation of genomic sequence.
Major comments: This is a highly contiguous genome, in line with metrics for Vertebrate Genomes Project genomes and those of other consortia. The authors argue that they have assembled 31 pseudomolecules, or chromosomes. It would be nice to see a plot showing synteny between these 31 chromosomes and a closely related species with a chromosome-level assembly (e.g. Theristicus caerulescens; GCA_020745775.1). The tables appear to be missing from the submitted manuscript.
Minor comments:
Line 49: delete 'its'.
Line 49-51: This sentence is a little awkward, please revise.
Line 64: delete 'the'.
Line 67: replace 'with' with 'the spoonbill as a'.
Line 68: delete 'Interestingly'.
Line 70: can you be more specific about what kind of genetic methods had previously been performed?
Line 79: can you provide any additional details on the necessary permits and/or institutional approval?
Line 78: what kind of tissue? Or were these blood samples?
Line 110: do you mean movies?
Line 143: replace 'data' with 'dataset'.
Line 163: it may be worth applying some additional filters in vcftools, e.g. minor allele frequency, minimum depth, maximum depth. What level of missing data was allowed?
Line 171: delete 'resulted in'.
Line 172: do you mean scaffold L50 was 8?
Line 191-195: some context would be useful here. How does this level of heterozygosity and inbreeding compare to other waterbirds?
Line 217: why did you use the Metazoan database and not the Aves_odb10 database for BUSCO?
Figure 1b: 'Number' refers to what, scaffolds? Be consistent with capitalization for Mb. It seems like the order of scaffold N50 and L50 was reversed.
Figure 3 is missing a legend.
Recommendation Major Revision

Editor Decision

Editor: Hongfang Zhang

Major Revision

Jerome Hui

Assess Revision

Editor: Hongfang Zhang

Re-Review MS

Editor: Richard Flamio

Indicate in the comments box below whether you are happy with the changes made or if the manuscript is unacceptable.
Comments on revised manuscript The authors incorporated the revisions nicely and have produced a quality manuscript. Well done.
Minor revisions:
Line 46: A comma is needed after (Threskiornithidae).
Line 47: “The” should not be capitalized.
Line 48: This should read “as a globally endangered species.”
Line 49: “However, the lack of genomic resources for the species hinders the understanding of its biology…”
Line 56: Consider changing “also revealed” to “identified” to avoid repetition from the previous sentence.
Line 65: Insert “the” before “bird’s.”
Lines 69-70: Move “locally” higher in the sentence – “and it is protected locally…”
Line 72: Replace “as of to date” with “prior to this study”.
Lines 78-79: Pluralize “part.”
Line 86: Replace “proceeded” with “processed.”
Line 133: “…are listed in Table 1.”
Line 158: “accounted”
Line 159: “Variant calling was performed using…”
Line 161: “Hard filtering was employed…”
Lines 200-201: “The heterozygosity levels… from five individuals were comparable to previous reports on spoonbills – black-faced spoonbill … and royal spoonbill … (Li et al. 2022).”
Line 202: New sentence. “The remaining heterozygosity levels observed…”
Line 206: “…genetic bottleneck in the black-faced spoonbill…”
Lines 208-209: “These results highlight the need…”
Lines 213-214: “…which are useful and precious resources for future population genomic studies aimed at better understanding spoonbill species numbers and conservation.”
Line 226: Missing a period after “heterozygosity.”
For references, consider adding DOIs. Some citations have them, but most would benefit from this addition.

Re-Review MS

Editor: Phred M Benham

Indicate in the comments box below whether you are happy with the changes made or if the manuscript is unacceptable.
Comments on revised manuscript I previously reviewed this manuscript and overall the authors have done a nice job addressing all of my comments. I appreciate that the authors include the MCscan analysis that I suggested. However, the alignment of the P. minor assembly and annotations to other genomes suggests rampant mis-assembly or translocations. Birds have fairly high synteny and I would expect Pmin to look more similar to the comparison between T. caerulescens and M. americana in the MCscan plot. For instance, parts of the largest scaffold in the Pmin assembly map to multiple different chromosomes in the Tcae assembly. Similarly, the Z in Tcae maps to 11 different scaffolds in the Pmin assembly and there does not appear to be a single large scaffold in the Pmin assembly that corresponds to the Z chromosome. The genome seems to be otherwise of strong quality, so I urge the authors to double-check their MCscan synteny analysis. If this pattern remains, can you please add some comments about it to the end of the Data Validation and Quality Control section? I think other readers will also be surprised at the low levels of synteny apparent between the spoonbill and ibis assemblies.

Editor Decision

Editor: Hongfang Zhang

Minor Revision

Jerome Hui

Assess Revision

Editor: Hongfang Zhang

Final Data Preparation

Editor: Christopher Hunter

Editor Decision

Editor: Hongfang Zhang

Accept

Editor: Scott Edmunds

Editor’s Assessment This work is part of a series of papers from the Hong Kong Biodiversity Genomics Consortium sequencing the rich biodiversity of species in Hong Kong (see https://doi.org/10.46471/GIGABYTE_SERIES_0006). This example assembles the genome of the black-faced spoonbill (Platalea minor), an emblematic wading bird from East Asia that is classified as globally endangered by the IUCN. This Data Release reports a 1.24 Gb chromosomal-level genome assembly produced using a combination of PacBio SMRT and Omni-C scaffolding technologies. BUSCO and Merqury validation were carried out, gene models were created, and peer reviewers also requested MCscan synteny analysis. The validation showed that the genome assembly had high sequence continuity, with scaffold length N50 = 53 Mb. Presenting data from 14 individuals, this will hopefully be a useful and valuable resource for future population genomic studies aimed at better understanding spoonbill species numbers and conservation.

Export to Production

Editor: Scott Edmunds

Associated Data

    Data Availability Statement

    The final assembly has been deposited at NCBI under the accession number JBBPFK000000000. The raw reads generated in this study, including Omni-C (SAMN40731791) and PacBio HiFi (SAMN35152374) data, have been deposited in the NCBI database under the BioProject accession number PRJNA973839. The genome, genomic and repeat annotation files have been deposited and are publicly available in Figshare [35].


    Articles from GigaByte are provided here courtesy of Gigascience Press