Scientific Data. 2024 Oct 31;11:1187. doi: 10.1038/s41597-024-04032-2

Chromosome-scale genome assembly of the mangrove climber species Dalbergia candenatensis

Miaomiao Shi 1,2,#, Yu Zhang 1,2,#, Huiwen Huang 1,2, Shiran Gu 1,2, Xiangping Wang 1,2, Shijin Li 1,2, Zhongtao Zhao 1,2, Tieyao Tu 1,2
PMCID: PMC11528007  PMID: 39482322

Abstract

Mangrove forests consist of trees, climbers and herbs that grow exclusively in intertidal environments. They are among the most extreme and vulnerable ecosystems on our planet and have long been of great interest to biologists and ecologists. Here, we assembled the chromosome-scale genome of a mangrove climber, Dalbergia candenatensis, for the first time. The assembled genome is approximately 474.55 Mb in size, with a scaffold N50 of 48.1 Mb, a complete BUSCO score of 98.4%, and a high LTR Assembly Index (LAI) value of 21. The genome contains 283.46 Mb (59.74%) of repetitive sequences, and 29,554 protein-coding genes were predicted, of which 87.54% were functionally annotated against five databases. The high-quality genome assembly and annotation presented here provide a valuable genomic resource that will expedite genomic and evolutionary studies of mangrove plants and help elucidate the molecular mechanisms underlying their salt and waterlogging tolerance.

Subject terms: Genome, Plant stress responses

Background & Summary

Mangrove forests, characterized by fluctuating salinity, hypoxia, and intense ultraviolet light in intertidal environments [1], represent one of the most extreme and vulnerable ecosystems. Despite these challenging conditions, mangroves have evolved a range of distinct morphological and physiological traits, such as vivipary, salt secretion, and aerial roots, to adapt to the harsh coastal conditions [1–3]. Mangrove forests can mitigate the effects of flooding and typhoons, maintain tropical and subtropical marine biodiversity, and sequester carbon [4], thereby offering significant ecological benefits and economic value. However, mangrove forests face escalating pressures from global climate change and anthropogenic activities, such as exploitation and deforestation, which have reduced their area by more than 20% over the past 40 years [4,5] and have been followed by losses of species richness and functional diversity [6]. This underscores the urgent need for effective conservation, restoration, and management practices to protect mangrove ecosystems. To conserve and sustainably manage these ecosystems, it is essential to gain a deep understanding of the evolutionary patterns and genomic architecture of the diverse flora and fauna that inhabit them. The rapid development of sequencing technologies has enabled numerous studies to generate high-quality whole-genome assemblies of mangrove plants [2,3,7], which help uncover their adaptation mechanisms in the intertidal zone and thereby promote the breeding of coastal shelterbelts. However, previous studies on mangroves have focused primarily on tree species, with a notable lack of research on shrubs and climbers.

Dalbergia L.f. is a genus of the family Leguminosae (Fabaceae), the third-largest family of angiosperms, and encompasses approximately 250 species of trees, shrubs and lianas worldwide [8–10]. These species are predominantly distributed in the pantropical regions of Asia, America, and Africa [11]. Many Dalbergia species are economically significant because of the superior quality of their heartwood, which is characterized by exceptional durability, captivating color, and unique fragrance [12]. Examples include the rosewoods D. oliveri Gamble ex Prain, D. cochinchinensis Pierre, and D. odorifera T. C. Chen, which are widely recognized as “Hongmu” in China [10]. Additionally, some Dalbergia species are of ecological importance because they fix atmospheric nitrogen with aeschynomenoid-type root nodules [13] and contribute to ecological restoration in vulnerable ecosystems [14]. High-quality genomic resources provide opportunities to investigate the functional genes associated with key traits and disease resistance, as well as to elucidate the molecular mechanisms underlying environmental adaptation [12,15]. To date, genomic data have been published for five species of the genus Dalbergia, namely D. cochinchinensis, D. cultrata [12], D. odorifera [16], D. oliveri [15], and D. sissoo [17], all of which are of economic significance owing to their valuable heartwood.

Dalbergia candenatensis (Dennst.) Prain (2n = 2x = 20) is a woody climber found predominantly on the tropical coasts of China and neighboring southeast Asian countries, extending south to northern Australia [9]. It is the only species of the genus Dalbergia that grows exclusively on the landward side of mangrove forests near the high tide line, which categorizes it as a semi-mangrove [14]. This species can withstand tidal saline soil and survive under submerged conditions (Fig. 1). As a woody climber, D. candenatensis typically climbs on other mangrove plants, and the high tenacity of its stem may enhance the resilience of mangrove forests against wind and waves. The thick, coriaceous or nearly woody pod allows the fruit to float in seawater for long periods and the seeds to disperse over great distances. Furthermore, the heartwood and leaves of D. candenatensis have been reported to contain high concentrations of isoflavonoids, flavonoids, tannins, and phenolic compounds [18–20], which potentially bolster stress resistance and promote growth in the challenging intertidal environment. These morphological and physiological attributes indicate that D. candenatensis is well adapted to its unique habitat, making it an ideal candidate species for the ecological restoration of mangrove ecosystems. However, the lack of an assembled genome has significantly hindered a deeper understanding of the adaptive mechanisms of D. candenatensis and of its potential application in mangrove forest restoration.

Fig. 1.

The habitat and morphology of Dalbergia candenatensis. (a) The intertidal habitat. The red arrow shows D. candenatensis. (b) Branches bearing flowers. (c) Fruits of D. candenatensis. (d) Woody stem of D. candenatensis.

Here, we generated a high-quality chromosome-scale genome of the mangrove species D. candenatensis (Leguminosae) by combining PacBio high-fidelity (HiFi) long-read sequencing, Illumina short-read sequencing and Hi-C data. The assembled genome had a total size of 474.55 Mb, with a scaffold N50 of 48.1 Mb (Table 1). A total of 471.70 Mb (99.4%) of the sequences were successfully anchored and oriented onto ten pseudo-chromosomes of D. candenatensis. The genome contained 29,554 genes and 283.46 Mb (59.74%) of repetitive sequences. The high-quality reference genome of D. candenatensis provides a valuable resource that will accelerate genomic and evolutionary studies within the genus Dalbergia, facilitate exploration of the molecular mechanisms involved in the salt and waterlogging tolerance of mangrove plants, and lay a foundation for its use in the ecological restoration of mangrove ecosystems.

Table 1.

Statistics of Dalbergia candenatensis genome assembly and annotation.

Assembly feature
Estimated genome size (Mb) 521.35
Assembly size (Mb) 474.55
Scaffold N50 (Mb) 48.1
Contig N50 (Mb) 44.1
Anchor ratio (%) 99.4
GC content (%) 35.37
BUSCO (%) 98.4
LAI 21
Genome annotation
Number of protein-coding genes 29554
Average gene length (bp) 3331.22
Average CDS length (bp) 1127.55
Average exon length (bp) 225.53
Functional annotation
Nr 25831 (87.40%)
eggNOG 24943 (84.40%)
KEGG 10647 (36.03%)
GO 20739 (79.78%)
UniProt 25116 (84.98%)
Total 25871 (87.54%)

Methods

Plant materials and sequencing

In June 2021, young, healthy leaves were collected from a single individual of D. candenatensis in Bamen Bay, Hainan, China (110°47′48.43″ E, 19°36′15.28″ N) for Illumina, PacBio SMRT and Hi-C sequencing. The voucher specimen was deposited in the South China Botanical Garden (accession number: Zsc545). Genomic DNA was extracted using the cetyltrimethylammonium bromide (CTAB) method [21]. DNA quality was assessed with a NanoDrop spectrophotometer (Thermo Fisher Scientific, USA), with DNA considered pure at an OD260/280 ratio between 1.8 and 2.0 and an OD260/230 ratio between 2.0 and 2.2. DNA was then quantified using a Qubit 4.0 fluorometer (Invitrogen, USA). For Illumina sequencing, libraries with an insert size of 350 bp were prepared for paired-end sequencing on the Illumina NovaSeq 6000 platform. Approximately 34.28 Gb of short-read data was obtained and used for the genome survey (Table 2).

Table 2.

DNA sequencing statistics.

Read type | Raw bases (bp) | Number of reads | Depth (×)
HiFi reads | 39,772,719,434 | 2,444,282 | 83.91
Illumina reads | 34,277,051,700 | 228,513,678 | 72.23
Hi-C reads | 71,176,276,500 | 474,508,510 | 149.99
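
The base totals and depths listed for the two short-read libraries in Table 2 follow from simple arithmetic. The sketch below is illustrative only: it assumes 150 bp reads for both libraries (PE150 is stated for Hi-C; for the Illumina survey library the read length is an assumption that happens to reproduce the tabulated base count) and uses the 474.55 Mb assembly size as the depth denominator, which is also an assumption about how the depths were derived.

```python
# Assumptions (not stated explicitly in the text): 150 bp reads for both
# libraries, and depth = total bases / 474.55 Mb assembly size.
READ_LENGTH_BP = 150
ASSEMBLY_SIZE_BP = 474.55e6

# Read counts taken from Table 2.
read_counts = {
    "Illumina": 228_513_678,
    "Hi-C": 474_508_510,
}


def total_bases(n_reads, read_length=READ_LENGTH_BP):
    """Total sequenced bases for n_reads reads of a fixed length."""
    return n_reads * read_length


def depth(bases, genome_size=ASSEMBLY_SIZE_BP):
    """Average sequencing depth: sequenced bases divided by genome size."""
    return bases / genome_size


for name, n_reads in read_counts.items():
    bases = total_bases(n_reads)
    print(f"{name}: {bases:,} bp, {depth(bases):.2f}x")
```

Under these assumptions the computed values match the Illumina and Hi-C rows of Table 2.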

For PacBio SMRT sequencing, high-quality DNA samples with bands larger than 30 kb were randomly sheared into 15–18 kb fragments, and the libraries obtained by enriching and purifying large fragments were sequenced on the PacBio Sequel II/Sequel IIe platform. A total of 39.77 Gb of HiFi reads (~83.91× coverage) with a read N50 of 17,211 bp was obtained for de novo assembly.

The Hi-C library was constructed with the following steps: fixation of cells with paraformaldehyde to preserve their conformation; cross-linking of DNA in the lysed fixed cells; generation of sticky ends by digesting the cross-linked DNA with restriction enzymes; repair and biotin labeling of the DNA ends; ligation of the DNA fragments with DNA ligase; reversal of cross-links and purification of DNA through protease digestion; and random fragmentation into 300–500 bp fragments. The library was then sequenced on the Illumina platform in PE150 mode, generating 71.18 Gb of reads (~149.99× coverage). Clean reads were obtained by trimming adapters from the raw reads and filtering out low-quality reads.

To aid gene prediction, three tissues (leaves, stems and roots) of D. candenatensis were collected. Total RNA was extracted with the HiPure Universal RNA Mini Kit (Magen, Guangzhou, China). Libraries were prepared and sequenced on the Illumina NovaSeq 6000 platform. A total of 8.25, 6.35 and 6.77 Gb of raw data were generated for the leaf, stem and root samples, respectively (Table 3).

Table 3.

RNA sequencing statistics.

Sample | Sequencing platform | Raw reads | Raw bases (bp) | Clean reads | Clean bases (bp)
RNA leaf | Illumina | 27,502,891 | 8,250,867,300 | 27,011,558 | 8,103,467,400
RNA stem | Illumina | 21,168,213 | 6,350,463,900 | 20,545,630 | 6,163,689,000
RNA root | Illumina | 22,555,024 | 6,766,507,200 | 21,943,263 | 6,582,978,900

Genome survey

The Illumina short reads were filtered to remove adapters, duplicates and low-quality reads using fastp v0.20.0 [22] with default parameters. K-mer analysis was then performed to estimate the genome size, heterozygosity and repeat content of D. candenatensis. The 17-bp k-mers in the quality-filtered Illumina short reads were counted using Jellyfish v2.2.7 [23] (Fig. 2a). Based on the k-mer counts, GenomeScope v2.0 [24] estimated a genome size of ~521.35 Mb, a heterozygosity of 0.09% and a repeat content of 51.56%.
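
The idea behind a k-mer based genome survey can be sketched in a few lines. The example below is a simplified illustration, not the model used by Jellyfish or GenomeScope: it counts k-mers naively, builds a multiplicity histogram, and estimates genome size as the total number of k-mers divided by the depth of the main peak. The function names, the `min_depth` error cutoff and the toy histogram are all hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer as written (no reverse-complement merging)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_histogram(counts):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth):
    """Rough estimate: total k-mers / depth of the main histogram peak.

    Multiplicities below min_depth are treated as sequencing errors and
    ignored (a simplifying assumption).
    """
    kept = {d: n for d, n in histogram.items() if d >= min_depth}
    peak_depth = max(kept, key=kept.get)
    total_kmers = sum(d * n for d, n in kept.items())
    return total_kmers / peak_depth


# Toy histogram with made-up numbers (not data from this study).
toy_hist = {1: 500, 2: 100, 9: 50, 10: 200, 11: 50, 20: 10}
print(estimate_genome_size(toy_hist, min_depth=3))  # 320.0
```

Dedicated tools additionally model heterozygous and repeat peaks, which is how heterozygosity and repeat content are obtained alongside genome size.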

Fig. 2.

K-mer frequency distribution curve (a) and genome-wide interaction heatmap of the Dalbergia candenatensis genome based on Hi-C data (b).

De novo genome assembly

The PacBio HiFi reads were assembled de novo using hifiasm v0.16.1-r375 [25] with default parameters. The Illumina short reads were aligned to the draft assembly with BWA v0.7.17 [26]. Duplicate reads were then marked with sambamba v1.0 [27], and high-quality reads were retained using samtools. The assembly was polished with Pilon [28] for two rounds using the parameter --fix all. To further improve the quality and integrity of the genome, the assembled contigs were scaffolded to near-chromosome level with the ALLHiC algorithm [29] using the Hi-C data, and then manually corrected according to the strength of chromosomal interactions using Juicebox v2.13.07 [30]. Finally, a chromosome-level genome assembly was obtained.

The total length of the D. candenatensis genome assembly was 474.55 Mb, which is smaller than the genome size estimated by k-mer analysis (Table 1). The contig and scaffold N50 values of the assembly were 44.1 and 48.1 Mb, respectively. A total of 471.70 Mb (99.4%) of the sequences were successfully anchored to ten pseudochromosomes (Table 4). The Hi-C interaction map exhibited a pronounced intrachromosomal interaction signal along the diagonal (Fig. 2b).

Table 4.

Summary of the ten pseudochromosomes.

ID No. of contigs Length (bp) GC content (%)
Chr01 1 64026954 34.86
Chr02 2 57993263 34.71
Chr03 1 57167204 35.27
Chr04 1 51200502 35.15
Chr05 2 48107494 35.19
Chr06 1 48084155 35.74
Chr07 1 44149702 35.64
Chr08 2 43323290 36.03
Chr09 1 40293102 35.73
Chr10 1 40220275 34.95
unplaced 39 2983011 51.46
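
As a worked check, the scaffold N50 and the anchor ratio can be recomputed from the lengths in Table 4. This is a minimal sketch: the 39 unplaced contigs are only available as a lump sum, so they enter through the total length alone, which is safe here because their combined length is smaller than any single pseudochromosome. The contig N50 cannot be recomputed this way because individual contig lengths are not listed.

```python
# Pseudochromosome lengths (bp) from Table 4.
chromosome_lengths = {
    "Chr01": 64_026_954, "Chr02": 57_993_263, "Chr03": 57_167_204,
    "Chr04": 51_200_502, "Chr05": 48_107_494, "Chr06": 48_084_155,
    "Chr07": 44_149_702, "Chr08": 43_323_290, "Chr09": 40_293_102,
    "Chr10": 40_220_275,
}
UNPLACED_TOTAL_BP = 2_983_011  # 39 contigs; individual lengths not given


def n50(lengths, total=None):
    """Length L such that sequences of length >= L cover >= half of total."""
    lengths = list(lengths)
    if total is None:
        total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths do not cover half of the total")


anchored_bp = sum(chromosome_lengths.values())
total_bp = anchored_bp + UNPLACED_TOTAL_BP

scaffold_n50 = n50(chromosome_lengths.values(), total=total_bp)
anchor_ratio = 100 * anchored_bp / total_bp

print(f"Scaffold N50: {scaffold_n50:,} bp")
print(f"Anchor ratio: {anchor_ratio:.1f}%")
```

The N50 falls on Chr05 (48,107,494 bp), in line with the reported scaffold N50 of 48.1 Mb, and the anchor ratio rounds to 99.4%.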

Identification of repetitive elements

A high proportion of repetitive sequences in a genome can strongly affect the accuracy of gene prediction, so repetitive sequences need to be identified and masked before gene structure prediction. RepeatModeler v2.0.3 [31] was first run to identify repetitive sequences de novo, and the resulting sequences were used as a custom library for annotating repeats with RepeatMasker. Non-redundant repeats were then extracted from the Repbase [32] and Dfam [33] databases and added to the custom library. RepeatMasker v4.1.2 [34] was used to predict repetitive sequences based on homology searches, and the DeepTE pipeline [35] was employed to classify them. In total, approximately 283.46 Mb of the D. candenatensis genome was identified as repetitive elements, accounting for 59.74% of the genome, of which 275.33 Mb (58.02% of the genome) was annotated as transposable elements (TEs); LTR retrotransposons (33.14%) were the most abundant TE class (Table 5).

Table 5.

Summary of the repetitive sequences in Dalbergia candenatensis genome assembly.

Repeat type Number of elements Length (bp) Percentage of sequence
Retrotransposons 283979 168837505 33.93%
 SINEs 2709 373980 0.08%
 LINEs 25780 3582728 0.72%
  CRE/SLACS 41 1752 0.00%
  L2/CR1/Rex 3657 141421 0.03%
  R1/LOA/Jockey 1789 250739 0.05%
  R2/R4/NeSL 883 36528 0.01%
  RTE/Bov-B 6303 1204789 0.24%
  L1/CIN4 8505 1914749 0.38%
 LTR elements 255490 164880797 33.14%
  BEL/Pao 1336 76910 0.02%
  Ty1/Copia 77181 32608319 6.55%
  Gypsy/DIRS1 157524 127678771 25.66%
   Retroviral 9956 389277 0.08%
DNA transposons 374095 78110883 15.70%
  hobo-Activator 130231 30335359 6.10%
  Tc1-IS630-Pogo 57647 11834351 2.38%
  En-Spm 35736 11288382 2.27%
  MULE-MuDR 25539 6257480 1.26%
  PiggyBac 344 16178 0.00%
  Tourist/Harbinger 58767 8976768 1.80%
  Other 1173 44235 0.01%
Rolling-circles 9547 1821628 0.37%
unclassified 167619 41720191 8.39%
Total interspersed repeats 288668579 58.02%
Small RNA 9 568 0.00%
Satellites 6946 839701 0.17%
Simple repeats 101763 5946573 1.20%
Low complexity 33510 1721665 0.35%
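
The percentage column of Table 5 can be reproduced from the length column. The sketch below assumes that the denominator is the sum of all sequence lengths in Table 4 (497,548,952 bp); this is an assumption chosen because it reproduces the tabulated values, not something the text states.

```python
# Assumption: denominator = sum of all sequence lengths listed in Table 4.
GENOME_BP = 497_548_952

# Lengths (bp) of selected repeat categories from Table 5.
repeat_bp = {
    "Retrotransposons": 168_837_505,
    "LTR elements": 164_880_797,
    "DNA transposons": 78_110_883,
    "Total interspersed repeats": 288_668_579,
}


def percent_of_genome(length_bp, genome_bp=GENOME_BP):
    """Share of the genome occupied by a repeat category, in percent."""
    return 100 * length_bp / genome_bp


for name, bp in repeat_bp.items():
    print(f"{name}: {percent_of_genome(bp):.2f}%")
```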

Gene prediction and functional annotation

Protein-coding genes in the repeat-masked genome of D. candenatensis were identified by combining ab initio, homology-based and RNA-seq-based predictions. For ab initio prediction, we employed Augustus v3.4.0 [36], GeneID v1.4 [37], SNAP v2006-07-28 [38], GlimmerHMM v3.0.4 [39] and GeneMark-ES v4.71_lic [40] to predict de novo gene models. For homology-based prediction, GeMoMa v1.9 [41] was used to align homologous genes from Arabidopsis thaliana [42], Oryza sativa [43] and D. odorifera [16]. For RNA-seq-based prediction, adapters, duplicates and low-quality reads were removed from the RNA-seq data of leaves, stems and roots using fastp with default parameters, the filtered reads were assembled with Trinity v2.13.2 [44], and gene models were then predicted with PASA v2.4.0 [45]. The results of the three methods were subsequently integrated using EVidenceModeler (EVM) v2.1.0 [46] to generate a final non-redundant gene model set. In total, 29,554 protein-coding genes were identified in the repeat-masked genome of D. candenatensis (Table 6). The average lengths of genes, coding sequences and exons were 3,331.22 bp, 1,127.55 bp and 225.53 bp, respectively.

Table 6.

Summary of predicted protein-coding genes in Dalbergia candenatensis genome assembly.

Method | Gene set | Gene number | Average gene length (bp) | Average number of exons | Average CDS length (bp) | Average exon length (bp) | Average intron length (bp)
Ab initio | Augustus | 35556 | 3138.22 | 4.75 | 1037.06 | 218.34 | 558.80
Ab initio | GeneID | 163926 | 183.00 | 1.00 | 183.00 | 183.00 | 360.21
Ab initio | SNAP | 34384 | 1163.89 | 2.83 | 598.75 | 211.63 | 421.52
Ab initio | GlimmerHMM | 29606 | 1989.91 | 3.53 | 804.04 | 227.72 | 413.82
Ab initio | GeneMark-ES | 28740 | 4085.25 | 5.66 | 1141.64 | 201.87 | 408.39
Homology-based | Arabidopsis thaliana | 21312 | 4001.91 | 5.82 | 1298.78 | 223.03 | 560.08
Homology-based | Oryza sativa | 19162 | 4148.68 | 6.01 | 1330.40 | 221.21 | 429.28
Homology-based | Dalbergia odorifera | 23052 | 3581.94 | 5.17 | 1231.38 | 238.30 | 560.32
Transcriptome-based | PASA | 179262 | 3092.54 | 1.78 | 757.27 | 424.50 | 614.52
Integrated | EVM | 29554 | 3331.22 | 5.00 | 1127.55 | 225.53 | 557.68
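
Summary statistics of the kind reported in Table 6 can be derived from exon coordinates. The following sketch is illustrative only: the input format (a dictionary of 1-based, inclusive exon intervals per gene), the function name and the toy genes are hypothetical, and untranslated regions are ignored for simplicity.

```python
from statistics import mean


def gene_structure_stats(genes):
    """Summarize gene models given as gene_id -> list of (start, end) exons.

    Coordinates are 1-based and inclusive; introns are the gaps between
    consecutive exons of the same gene.
    """
    gene_lengths, exon_counts, exon_lengths, intron_lengths = [], [], [], []
    for exons in genes.values():
        exons = sorted(exons)
        gene_lengths.append(exons[-1][1] - exons[0][0] + 1)
        exon_counts.append(len(exons))
        exon_lengths.extend(end - start + 1 for start, end in exons)
        intron_lengths.extend(
            nxt[0] - cur[1] - 1 for cur, nxt in zip(exons, exons[1:])
        )
    return {
        "mean_gene_length": mean(gene_lengths),
        "mean_exons_per_gene": mean(exon_counts),
        "mean_exon_length": mean(exon_lengths),
        "mean_intron_length": mean(intron_lengths) if intron_lengths else 0,
    }


# Toy gene models (made-up coordinates, not from the annotation).
toy_genes = {
    "g1": [(1, 100), (201, 300), (401, 500)],
    "g2": [(1001, 1200)],
}
stats = gene_structure_stats(toy_genes)
print(stats)
```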

Functional annotation of the protein-coding genes was performed by comparing them with public databases using Diamond v2.0.9.147 [51]: the non-redundant protein database (NCBI-NR, https://www.ncbi.nlm.nih.gov/), EggNOG [47] (http://eggnog5.embl.de/), Gene Ontology [48] (GO, http://geneontology.org/), Kyoto Encyclopedia of Genes and Genomes [49] (KEGG, https://www.kegg.jp/) and UniProt [50] (https://www.uniprot.org/). In total, 25,871 (87.54%) protein-coding genes of the D. candenatensis genome were successfully annotated in at least one functional database (Table 1).
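
The "Total" row of the functional annotation summary is naturally read as the union of genes with a hit in at least one database. A minimal sketch of that bookkeeping is shown below; the function name and the toy gene sets are hypothetical, and only the final line uses figures from the text.

```python
def annotation_summary(hits_by_db, n_genes):
    """Per-database and overall annotation counts and percentages.

    hits_by_db maps a database name to the set of gene IDs with a hit.
    The 'Total' entry counts genes annotated in at least one database.
    """
    summary = {
        db: (len(ids), 100 * len(ids) / n_genes)
        for db, ids in hits_by_db.items()
    }
    annotated = set().union(*hits_by_db.values())
    summary["Total"] = (len(annotated), 100 * len(annotated) / n_genes)
    return summary


# Toy example with made-up gene IDs.
toy_hits = {
    "Nr": {"g1", "g2", "g3"},
    "KEGG": {"g2"},
    "GO": {"g2", "g4"},
}
toy_summary = annotation_summary(toy_hits, n_genes=5)
print(toy_summary["Total"])  # (4, 80.0)

# Reported overall figure: 25,871 of 29,554 genes.
print(f"{100 * 25_871 / 29_554:.2f}%")  # 87.54%
```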

Genome-wide synteny analysis

We identified syntenic blocks within the D. candenatensis genome, as well as between this genome and other published Dalbergia genomes, using the Python version of MCScan implemented in JCVI v1.2.7 [52] with default parameters. Intra-genomic syntenic blocks were visualized using TBtools [53] (Fig. 3), while inter-genomic syntenic blocks were visualized using JCVI with the parameter --minspan=30 (Fig. 4).
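
A minimum-span filter keeps only syntenic blocks that extend over enough genes to be plotted meaningfully. The sketch below illustrates one plausible reading of such a filter and is not JCVI's implementation: blocks are lists of anchor pairs given as gene-order ranks, and a block is kept when it spans at least `min_span` genes in both genomes. All names and the toy blocks are hypothetical.

```python
def block_span(block):
    """Smaller of the two gene-order spans covered by a syntenic block.

    block is a list of (query_rank, subject_rank) anchor pairs, where a
    rank is the position of a gene along its chromosome.
    """
    query = [q for q, _ in block]
    subject = [s for _, s in block]
    return min(max(query) - min(query) + 1, max(subject) - min(subject) + 1)


def filter_blocks(blocks, min_span):
    """Keep blocks whose span in both genomes is at least min_span genes."""
    return [block for block in blocks if block_span(block) >= min_span]


# Toy blocks with made-up ranks.
toy_blocks = [
    [(1, 101), (15, 118), (40, 140)],  # spans 40 genes in both genomes
    [(200, 10), (205, 14)],            # spans only 6 and 5 genes
]
kept = filter_blocks(toy_blocks, min_span=30)
print(len(kept))  # 1
```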

Fig. 3.

Genomic characteristics of Dalbergia candenatensis. The tracks from outer to inner circle represent the ten chromosomes (Chr01-Chr10), GC content, gene position, gene density, and syntenic gene blocks within the genome indicated by connecting lines.

Fig. 4.

Genome-wide synteny among five genome assemblies in the genus Dalbergia. Conserved syntenic blocks are denoted by lines of different colors, each corresponding to one of the ten chromosomes.

Data Records

The raw sequence data, including PacBio HiFi reads, Illumina PE150 reads, Hi-C reads, and RNA-seq data from different tissues, have been deposited in the Sequence Read Archive (SRA) at the National Center for Biotechnology Information (NCBI) under accession number SRP513077 [54]. The final chromosome-scale genome assembly has been deposited in NCBI GenBank under accession number JBHFQC000000000 [55]. In addition, the genome assembly and annotation files have been deposited in the Figshare database [56].

Technical Validation

The chromosome-scale genome of D. candenatensis was assembled from ~83.91× PacBio HiFi reads and ~149.99× Hi-C reads. The assembly was 474.55 Mb in length, with a scaffold N50 of 48.1 Mb. Assembly quality was evaluated in the following ways. First, inter-genomic synteny analyses were conducted between D. candenatensis and four other Dalbergia species to confirm the overall genome structure. Next, to assess the completeness of the genome, the gene content of the embryophyta_odb10 dataset was searched against the assembled genome using Benchmarking Universal Single-Copy Orthologs (BUSCO) v5.5.0 [57]. Additionally, LTR_retriever v2.9.0 [58] was used to calculate the LTR Assembly Index (LAI) from LTR retrotransposons (LTR-RTs) to assess assembly continuity. Furthermore, we mapped the PacBio long reads, Illumina short reads, and RNA-seq short reads back to the assembled genome using minimap2 v2.28 [59], BWA v0.7.17 [26], and HISAT2 v2.2.1 [60], respectively, to calculate the mapping rates.

The inter-genomic synteny analyses revealed high conservation among D. candenatensis, D. cultrata, D. cochinchinensis, D. sissoo and D. odorifera (Fig. 4), suggesting that the gross genome structure of D. candenatensis has been accurately assembled. The complete BUSCO score was 98.4%, of which 95.2% were single-copy genes (Table 7), suggesting a high degree of completeness of the assembly. The LAI value was 21, which meets the “gold standard” (LAI > 20) for genome assemblies proposed by Ou et al. [58]. The alignment results showed that 99.92% of PacBio HiFi long reads, 99.76% of Illumina short reads, and an average of 91.6% of RNA reads were successfully mapped to the assembled genome (Table 8). These results indicate that the genome assembly of D. candenatensis is of high quality.
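
To make the LAI criterion concrete, the sketch below classifies an LAI value into quality tiers and computes an unadjusted index as the share of intact LTR-RT sequence among all LTR sequence. It is a simplified illustration: the "gold" cutoff (> 20) follows the text, the cutoff of 10 separating "draft" from "reference" quality is an assumption based on the commonly cited LAI classification, and LTR_retriever applies a further adjustment that is not reproduced here. The function names are hypothetical.

```python
def raw_lai(intact_ltr_bp, total_ltr_bp):
    """Unadjusted LAI: intact LTR-RT length as a percentage of all LTR length."""
    return 100 * intact_ltr_bp / total_ltr_bp


def lai_tier(lai):
    """Classify an LAI value; the cutoff of 10 is an assumption (see above)."""
    if lai > 20:
        return "gold"
    if lai >= 10:
        return "reference"
    return "draft"


print(lai_tier(21))  # gold
```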

Table 7.

BUSCO assessment result.

Type Number Percentage
Complete BUSCOs 2290 98.4%
Complete and single-copy BUSCOs 2215 95.2%
Complete and duplicated BUSCOs 75 3.2%
Fragmented BUSCOs 9 0.4%
Missing BUSCOs 27 1.2%
Total BUSCO groups searched 2326

Table 8.

Statistical summary of mapping rates to the genome assembly.

Read type Mapping rate (%)
RNA leaf 86.77
RNA stem 97.46
RNA root 90.56
HiFi reads 99.92
Illumina reads 99.76
Hi-C reads 99.76
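
Two of the validation figures can be cross-checked directly from Tables 7 and 8: the BUSCO categories should add up to the number of groups searched, and the "average of 91.6%" for RNA reads is the mean of the three tissue-specific mapping rates. A minimal sketch of that arithmetic:

```python
from statistics import mean

# BUSCO counts from Table 7.
busco = {
    "single_copy": 2215,
    "duplicated": 75,
    "fragmented": 9,
    "missing": 27,
    "total_searched": 2326,
}
complete = busco["single_copy"] + busco["duplicated"]
accounted_for = complete + busco["fragmented"] + busco["missing"]

# RNA-seq mapping rates (%) from Table 8.
rna_mapping = {"leaf": 86.77, "stem": 97.46, "root": 90.56}
mean_rna_mapping = mean(list(rna_mapping.values()))

print(f"Complete BUSCOs: {complete}")
print(f"All categories sum to total: {accounted_for == busco['total_searched']}")
print(f"Mean RNA mapping rate: {mean_rna_mapping:.1f}%")
```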

Acknowledgements

This work was supported by the National Natural Science Foundation of China (32070222, 32170232, 32271613) and by the Guangdong Provincial Special Fund for Natural Resource Affairs on Ecology and Forestry Construction (GDZZDC20228704).

Author contributions

T.T. and S.L. conceived and designed the study. H.H., Y.Z., M.S., X.W., S.G. and Z.Z. prepared the materials and analyzed the data. M.S. and Y.Z. prepared the results and wrote the manuscript. T.T. edited and improved the manuscript. All authors read and approved the final manuscript.

Code availability

All software used in this study was run according to the official guidelines of the published bioinformatics programs. Parameters not mentioned in the Methods were left at their default settings. No custom code was used.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Miaomiao Shi, Yu Zhang.

Contributor Information

Shijin Li, Email: lisj@scbg.ac.cn.

Zhongtao Zhao, Email: zhzht621@scbg.ac.cn.

Tieyao Tu, Email: tutieyao@scbg.ac.cn.

References

  1. Tomlinson, P. B. The botany of mangrove. (Cambridge University Press, 2016).
  2. Lyu, H., He, Z., Wu, C. I. & Shi, S. Convergent adaptive evolution in marginal environments: unloading transposable elements as a common strategy among mangrove genomes. New Phytol. 217, 428–438 (2018).
  3. Feng, X. et al. Genomic insights into molecular adaptation to intertidal environments in the mangrove Aegiceras corniculatum. New Phytol. 231, 2346–2358 (2021).
  4. Wang, Y. & Gu, J. Ecological responses, adaptation and mechanisms of mangrove wetland ecosystem to global climate change and anthropogenic activities. Int. Biodeterior. Biodegrad. 162, 105248 (2021).
  5. FAO. The world’s mangroves 2000–2020. (2023).
  6. Duke, N. C. et al. A world without mangroves. Science 317, 41–42 (2007).
  7. Ma, D. et al. Chromosome-level assembly of the mangrove plant Aegiceras corniculatum genome generated through Illumina, PacBio and Hi-C sequencing technologies. Mol. Ecol. Resour. 21, 1593–1607 (2021).
  8. Klitgård, B. B. & Lavin, M. in Legumes of the world (eds Lewis, G., Schrire, B., Mackinder, B. & Lock, M.) 307–335 (Royal Botanical Garden, Kew, 2005).
  9. Li, S. Dalbergia in Asia. (Science Press, 2017).
  10. Qin, M. et al. Comparative analysis of complete plastid genome reveals powerful barcode regions for identifying wood of Dalbergia odorifera and D. tonkinensis (Leguminosae). J. Syst. Evol. 60, 73–84 (2022).
  11. Lavin, M. et al. The Dalbergioid legumes (Fabaceae): delimitation of a pantropical monophyletic clade. Am. J. Bot. 88, 503–533 (2001).
  12. Yang, J. et al. Chromosome-scale genomes of five Hongmu species in Leguminosae. Sci. Data 10, 710 (2023).
  13. Sprent, J. I. Legume nodulation: a global perspective. (Wiley-Blackwell, 2009).
  14. Huang, H. Genomic insights into adaptation to mangrove habitat in Dalbergia candenatensis. Master thesis, University of Chinese Academy of Sciences (2023).
  15. Hung, T. H. et al. Range-wide differential adaptation and genomic offset in critically endangered Asian rosewoods. Proc. Natl. Acad. Sci. USA 120, e2301603120 (2023).
  16. Hong, Z. et al. The chromosome-level draft genome of Dalbergia odorifera. GigaScience 9, giaa084 (2020).
  17. Sahu, S. K. et al. Chromosome-scale genome of Indian rosewood (Dalbergia sissoo). Front. Plant Sci. 14, 1218515 (2023).
  18. Anisuzzman, M., Hasan, M. M., Acharzo, A. K., Das, A. K. & Rahman, S. In vivo and in vitro evaluation of pharmacological potentials of secondary bioactive metabolites of Dalbergia candenatensis leaves. Evid. Based Complementary Altern. Med. 2017, 5034827 (2017).
  19. Hamburger, M. O., Cordell, G. A., Tantivatana, P. & Ruangrungsi, N. Traditional medicinal plants of Thailand, VIII. Isoflavonoids of Dalbergia candenatensis. J. Nat. Prod. 50, 696–699 (1987).
  20. Cheenpracha, S., Karalai, C., Ponglimanont, C. & Kanjana-Opas, A. Candenatenins A-F, phenolic compounds from the heartwood of Dalbergia candenatensis. J. Nat. Prod. 72, 1395–1398 (2009).
  21. Sahu, S. K., Thangaraj, M. & Kathiresan, K. DNA extraction protocol for plants with high levels of secondary metabolites and polysaccharides without using liquid nitrogen and phenol. ISRN Mol. Biol. 2012, 205049 (2012).
  22. Chen, S. Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. Imeta 2, e107 (2023).
  23. Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
  24. Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204 (2017).
  25. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
  26. Goto, S., Tsuda, Y., Koike, Y., Chunlan, L. & Ide, Y. Effects of landscape and demographic history on genetic variation in Picea glehnii at the regional scale. Ecol. Res. 24, 1267–1277 (2009).
  27. Tarasov, A., Vilella, A. J., Cuppen, E., Nijman, I. J. & Prins, P. Sambamba: fast processing of NGS alignment formats. Bioinformatics 31, 2032–2034 (2015).
  28. Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9, e112963 (2014).
  29. Zhang, X., Zhang, S., Zhao, Q., Ming, R. & Tang, H. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. Nat. Plants 5, 833–845 (2019).
  30. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
  31. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA 117, 9451–9457 (2020).
  32. Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA 6, 11 (2015).
  33. Storer, J., Hubley, R., Rosen, J., Wheeler, T. J. & Smit, A. F. The Dfam community resource of transposable element families, sequence models, and genome annotations. Mob. DNA 12, 2 (2021).
  34. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics Chapter 4, 4.10.1–4.10.14 (2009).
  35. Yan, H., Bombarely, A. & Li, S. DeepTE: a computational method for de novo classification of transposons with convolutional neural network. Bioinformatics 36, 4269–4275 (2020).
  36. Stanke, M., Schöffmann, O., Morgenstern, B. & Waack, S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinform. 7, 62 (2006).
  37. Guigó, R., Knudsen, S., Drake, N. & Smith, T. Prediction of gene structure. J. Mol. Biol. 226, 141–157 (1992).
  38. Korf, I. Gene finding in novel genomes. BMC Bioinform. 5, 59 (2004).
  39. Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).
  40. Lomsadze, A., Ter-Hovhannisyan, V., Chernoff, Y. O. & Borodovsky, M. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res. 33, 6494–6506 (2005).
  41. Keilwagen, J., Hartung, F., Paulini, M., Twardziok, S. O. & Grau, J. Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi. BMC Bioinform. 19, 189 (2018).
  42. The Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815 (2000).
  43. Goff, S. A. et al. A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296, 92–100 (2002).
  44. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
  45. Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).
  46. Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
  47. Huerta-Cepas, J. et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 47, D309–D314 (2019).
  48. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
  49. Kanehisa, M., Goto, S., Sato, Y., Furumichi, M. & Tanabe, M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 40, D109–D114 (2012).
  50. Coudert, E. et al. Annotation of biologically relevant ligands in UniProtKB using ChEBI. Bioinformatics 39, btac793 (2023).
  51. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
  52. Tang, H. et al. Synteny and collinearity in plant genomes. Science 320, 486–488 (2008).
  53. Chen, C. et al. TBtools: an integrative toolkit developed for interactive analyses of big biological data. Mol. Plant 13, 1194–1202 (2020).
  54. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP513077 (2024).
  55. Shi, M., Zhang, Y., Huang, H. & Tu, T. Dalbergia candenatensis isolate MS-2024a, whole genome shotgun sequencing project. GenBank https://identifiers.org/ncbi/insdc:JBHFQC000000000 (2024).
  56. Shi, M. et al. Chromosome-scale genome assembly of the mangrove climber species Dalbergia candenatensis. Figshare https://doi.org/10.6084/m9.figshare.26170126 (2024).
  57. Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: Novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).
  58. Ou, S. & Jiang, N. LTR_retriever: A highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol. 176, 1410–1422 (2017).
  59. Li, H. Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
  60. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: A fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).
