A high-quality chromosome-level genome assembly of Antiaris toxicaria

Weicheng Huang; Jiaxin Xiang; Yamei Ding; Wanzhen Liu; Ni Fang; Yongmei Xiong; Seping Dai; Hui Yu

doi:10.1186/s12863-025-01309-2

2025 Mar 24;26:21. doi: 10.1186/s12863-025-01309-2

PMCID: PMC11934813 PMID: 40128643

Abstract

Objectives

Antiaris toxicaria is a tall tree belonging to the Moraceae family, known for its medicinal value. Its latex contains various cardiac glycosides, which hold significant research and potential application value. However, the lack of genomic resources for A. toxicaria currently hinders molecular genetic studies on its medicinal components. To support its effective conservation and to help elucidate its distinctive genetic traits and medicinal components, we present a chromosome-level genome assembly.

Data description

Here, we assembled two haplotypes of A. toxicaria: a 671.73-Mb HapA subgenome containing 27,213 predicted genes and a 666.41-Mb HapB subgenome containing 28,840 predicted genes. Their contig N50 sizes were 90.18 and 90.29 Mb, respectively. Transposable elements represented 61.15% and 64.13% of the assembled HapA and HapB subgenomes, respectively. This chromosome-level genome of A. toxicaria should provide a valuable resource for understanding the biosynthesis of medicinal compounds.

Keywords: Antiaris toxicaria, Genome assembly, HiFi, ONT ultralong, Hi-C, Transcriptome

Objective

Antiaris is a genus in the Moraceae; all species are large trees with tall, straight trunks and plank-like roots. There are approximately seven species and three varieties worldwide, with only one species, Antiaris toxicaria, found in China, distributed in Guangdong, Hainan, Guangxi, and southern Yunnan. A. toxicaria is well known for its ornamental value, ecological function and cultural importance. Previous studies on the secondary metabolites of this species have indicated that its latex [1, 2] and seeds [3] are rich in cardiac glycosides. The sap of the leaves and branches of A. toxicaria contains highly toxic substances [4], with the main toxic components being cardiac glycosides such as α-antiarin [5], antioside [6, 7], and convallatoxin [8, 9], which have effects including enhancing heart function, inducing vomiting and diarrhea, and possessing anesthetic properties [10]. Additionally, HPLC screening of A. toxicaria extracts revealed the presence of gallic acid, catechins, chlorogenic acid, caffeic acid, ellagic acid, epigallocatechin, rutin, isoquercitrin, quercitrin, quercetin and kaempferol [11]. Therefore, A. toxicaria holds significant research and commercial value. However, our understanding of the biosynthesis and regulatory mechanisms of secondary metabolites in A. toxicaria is limited, and further research is needed on the candidate genes and transcription factors involved in cardiac glycoside biosynthesis pathways.

In this study, we assembled the A. toxicaria genome at the chromosome level using PacBio high-fidelity (HiFi) reads, ONT ultralong reads, and high-throughput chromosome conformation capture (Hi-C) sequencing. We believe that this high-quality genome will provide an important resource for studying the mechanisms of secondary metabolite biosynthesis in this species.

Data description

A. toxicaria samples were obtained from the South China Botanical Garden (23.18°N, 113.36°E), Guangzhou, China. Fresh leaves of A. toxicaria were collected for PacBio HiFi, ONT ultralong, and Hi-C sequencing. A PCR-free SMRTbell library was constructed from high-quality, purified long-fragment DNA for PacBio HiFi sequencing. The ONT PromethION sequencer was used to generate ONT ultralong reads. Hi-C libraries were constructed and sequenced on the BGI platform. Stems, leaves, and seeds of A. toxicaria were frozen in liquid nitrogen and stored at −80 °C for transcriptome analyses. All Illumina sequencing data were filtered with fastp v0.23.1 [12] to obtain clean data for subsequent analysis. A total of 128.34 Gb (~191.06× coverage) of paired-end Illumina reads (Table 1; Data set 1), 32.7 Gb (~48.68× coverage) of PacBio HiFi long reads (Table 1; Data set 2), 16.71 Gb (~24.87× coverage) of ONT ultralong reads (Table 1; Data set 3), and Hi-C reads (~126.14× coverage) (Table 1; Data set 4) were generated for the genome survey and assembly (Table 1; Data file 1).
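The coverage figures above follow from dividing sequencing yield by genome size. The sketch below illustrates that arithmetic. The article does not say which genome size was used as the denominator; using the reported HapA length (671.73 Mb) is an assumption that happens to reproduce the stated depths to within about 0.01×. The function and variable names are illustrative only.

```python
def coverage(total_bases_gb: float, genome_size_mb: float) -> float:
    """Return sequencing depth (x-fold) for a yield in Gb and a genome size in Mb."""
    return total_bases_gb * 1000.0 / genome_size_mb


# Assumption: the HapA assembly length is used as the genome size.
HAPA_SIZE_MB = 671.73

# Yields reported in the text (Gb).
yields_gb = {
    "Illumina": 128.34,
    "PacBio HiFi": 32.7,
    "ONT ultralong": 16.71,
}

depths = {name: coverage(gb, HAPA_SIZE_MB) for name, gb in yields_gb.items()}

for name, depth in depths.items():
    print(f"{name}: ~{depth:.2f}x")
```

No yield in Gb is given for the Hi-C reads, so they are left out of the example.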

Table 1.

Overview of data files/data sets

Label	Name of data file/data set	File types (file extension)	Data repository and identifier (DOI or accession number)
Data file 1	Summary of library sequencing data	Word file (.docx)	Figshare, 10.6084/m9.figshare.28328342 [13]
Data file 2	K-mer-based estimation of genome characters	Image file (.jpg)	Figshare, 10.6084/m9.figshare.28328369 [14]
Data file 3	Hi-C interactive heatmap	Image file (.jpg)	Figshare, 10.6084/m9.figshare.28328372 [15]
Data file 4	Statistics of genome assembly and annotation	Word file (.docx)	Figshare, 10.6084/m9.figshare.28328378 [16]
Data file 5	Summary of gene functional annotation	Word file (.docx)	Figshare, 10.6084/m9.figshare.28328387 [17]
Data file 6	Gene function on Haplotype A	TXT file (.txt)	Figshare, 10.6084/m9.figshare.28328396 [18]
Data file 7	Gene function on Haplotype B	TXT file (.txt)	Figshare, 10.6084/m9.figshare.28328405 [19]
Data file 8	Statistical results of the repetitive sequences	Word file (.docx)	Figshare, 10.6084/m9.figshare.28328408 [20]
Data file 9	Summary of noncoding RNA genes	Word file (.docx)	Figshare, 10.6084/m9.figshare.28328414 [21]
Data set 1	Illumina survey data of A. toxicaria	Fastq file (.fastq)	NCBI Sequence Read Archive, https://identifiers.org/ncbi/insdc.sra:SRR32205349
Data set 2	PacBio HiFi reads of A. toxicaria	Bam file (.bam)	NCBI Sequence Read Archive, https://identifiers.org/ncbi/insdc.sra:SRR32203223
Data set 3	ONT Ultra-long reads of A. toxicaria	Fastq file (.fastq)	NCBI Sequence Read Archive, https://identifiers.org/ncbi/insdc.sra:SRR32203292
Data set 4	Hi-C reads of A. toxicaria	Fastq file (.fastq)	NCBI Sequence Read Archive, https://identifiers.org/ncbi/insdc.sra:SRR32205131
Data set 5	Genome assembly data for HapA	Fasta file (.fasta)	Figshare, 10.6084/m9.figshare.28328498 [22]
Data set 6	Genome assembly data for HapB	Fasta file (.fasta)	Figshare, 10.6084/m9.figshare.28328528 [23]
Data set 7	Transcriptome data of Antiaris toxicaria	Fastq file (.fastq)	NCBI Sequence Read Archive, https://identifiers.org/ncbi/insdc.sra:SRR32202871
Data set 8	Gene prediction on HapA	GFF3 file (.gff3)	Figshare, 10.6084/m9.figshare.28328429 [24]
Data set 9	Gene prediction on HapB	GFF3 file (.gff3)	Figshare, 10.6084/m9.figshare.28328432 [25]
Data set 10	Transposable elements annotation on HapA	GFF3 file (.gff3)	Figshare, 10.6084/m9.figshare.28328444 [26]
Data set 11	Transposable elements annotation on HapB	GFF3 file (.gff3)	Figshare, 10.6084/m9.figshare.28328450 [27]
Data set 12	Noncoding RNA prediction on HapA	GFF3 file (.gff3)	Figshare, 10.6084/m9.figshare.28328456 [28]
Data set 13	Noncoding RNA prediction on HapB	GFF3 file (.gff3)	Figshare, 10.6084/m9.figshare.28328459 [29]


Before genome assembly, we used GCE (Genomic Character Estimator) v1.0.2 [30] to estimate the genome size from Illumina short reads. With a k-mer length of 17 bp, the genome size of A. toxicaria was estimated to be approximately 729.84 Mb, with a high repeat content (70.62%) and heterozygosity (0.57%) (Table 1; Data file 2).
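The idea behind k-mer-based genome size estimation can be sketched as follows: the genome size is roughly the total number of k-mer occurrences divided by the depth of the main peak in the k-mer frequency histogram. This is a simplified illustration, not GCE's actual model, which also accounts for heterozygosity and repeats. The histogram values and the `min_depth` cutoff are invented for the example.

```python
def estimate_genome_size(histogram: dict, min_depth: int = 3) -> float:
    """Estimate genome size as total k-mer occurrences / main peak depth.

    `histogram` maps k-mer depth -> number of distinct k-mers seen at that depth.
    """
    # Drop very low-depth k-mers, which mostly come from sequencing errors.
    kept = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers left after filtering")
    # The main peak is the depth with the largest number of distinct k-mers.
    peak_depth = max(kept, key=kept.get)
    # Each distinct k-mer at depth d contributes d occurrences.
    total_kmers = sum(depth * n for depth, n in kept.items())
    return total_kmers / peak_depth


# Toy histogram (hypothetical values): an error tail at depths 1-2,
# a main peak at depth 20, and a small repeat peak at depth 40.
toy_histogram = {1: 5000, 2: 800, 18: 100, 19: 300, 20: 500, 21: 300, 22: 100, 40: 50}
toy_size = estimate_genome_size(toy_histogram)
print(f"Estimated size: {toy_size:.0f} k-mer positions")
```

In a heterozygous genome the histogram can show two peaks, and choosing the wrong one changes the estimate by about a factor of two, which is one reason dedicated tools are used in practice.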

The PacBio HiFi, ONT ultralong, and Hi-C data were assembled using Hifiasm [31] with the default parameters. The Hi-C data were then aligned to the HapA and HapB subgenomes separately, and read pairs were classified as valid or invalid interaction pairs using the Juicer pipeline [32] and YaHS v1.2 [33]. Misassembled contigs were detected, manually corrected, and oriented into chromosomes in Juicebox v1.11.08 [32]. Gap regions were filled with the corrected ONT and PacBio HiFi reads using TGS-GapCloser v1.2.1 [34], yielding a haplotype-resolved, gap-free genome of A. toxicaria. The final assembly was phased into two haplotypes comprising a total of 26 pseudochromosomes. HapA spans approximately 671.73 Mb with a contig N50 of 90.18 Mb (Table 1; Data files 3–4; Data set 5), and HapB spans approximately 666.41 Mb with a contig N50 of 90.29 Mb (Table 1; Data files 3–4; Data set 6). The GC content was 35.65% for HapA and 35.61% for HapB (Table 1; Data file 4).
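For readers unfamiliar with the contig N50 statistic quoted above, the following minimal sketch shows how it is conventionally computed: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the assembly length. The example lengths are made up and are not taken from the A. toxicaria assembly.

```python
def n50(lengths: list) -> int:
    """Return the N50 of a list of contig lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical contig lengths (arbitrary units).
example_lengths = [2, 3, 4, 5, 6]
example_n50 = n50(example_lengths)
print(example_n50)
```

With a total length of 20, the two longest contigs (6 and 5) together cover 11, which is the first point at which half the assembly is reached, so the N50 of this toy set is 5.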

Genome completeness was assessed with BUSCO v4.1.2 [35] by searching against the embryophyta_odb10 database (1,614 expected genes); the proportions of complete BUSCOs (single-copy plus multi-copy) in the two haplotypes were 98.5% and 98.6%, respectively (Table 1; Data file 4). The assembly quality of repetitive regions was assessed using LAI v3.2 [36], which gave LAI values of 16.4 (HapA) and 14.72 (HapB) (Table 1; Data file 4). The per-base consensus accuracy (QV) was estimated with Merqury v1.365 [37] using PacBio HiFi long reads, resulting in QV values of 47.13 and 47.1 (Table 1; Data file 4). Short reads and long reads were mapped to the genome with BWA v0.7.13-r1126 [38] and Minimap2 v2.21 [39], respectively, and the sequencing data covered more than 99% of the genome (Table 1; Data file 4).
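The QV values above are on the Phred scale, where QV = −10·log10(error rate). The sketch below converts between the two so that the reported values can be read as an approximate per-base error rate. It only illustrates the scale; how Merqury derives the underlying error rate from k-mers is not reproduced here, and the function names are illustrative.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled consensus quality value to a per-base error rate."""
    return 10.0 ** (-qv / 10.0)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error rate (0 < rate <= 1) to a Phred-scaled QV."""
    if not 0.0 < error_rate <= 1.0:
        raise ValueError("error rate must be in (0, 1]")
    return -10.0 * math.log10(error_rate)


# QV reported for the first haplotype in the text.
hapa_error = qv_to_error_rate(47.13)
bases_per_error = 1.0 / hapa_error
print(f"error rate ~{hapa_error:.2e}, about one error per {bases_per_error:,.0f} bases")
```

On this scale a QV of about 47 corresponds to roughly two errors per 100,000 bases.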

Protein-coding genes were predicted using homology-based, transcriptome-based, and ab initio methods. First, homologous proteins were used as protein-based evidence for gene prediction with GeneWise v2.4.1 [40]. Transcriptome data were mapped using HISAT2 v2.1.0 [41] (Table 1; Data set 7). Ab initio prediction was performed using AUGUSTUS v3.4.0 [42], trained on the transcriptome data. To generate a comprehensive protein-coding gene set, we used the GETA v2.6.1 (Genome-wide Electronic Tool for Annotation) pipeline (https://github.com/chenlianfu/geta) to integrate the homology-based, transcriptome-based, and ab initio predictions. Functional annotation of the protein-coding genes was then carried out by BLAST searches against databases including NCBI nr [43], Swiss-Prot [44], KOG [45], eggNOG [46], Pfam [47], GO [48], and KEGG [49]. In total, we obtained 27,213 and 28,840 protein-coding genes in the HapA and HapB subgenomes, respectively (Table 1; Data file 4; Data sets 8–9). Moreover, 26,906 (98.87%) genes of the HapA subgenome and 26,360 (98.8%) genes of the HapB subgenome were supported by multiple functional databases (Table 1; Data files 5–7).

To identify transposable elements (TEs), we used the Extensive de-novo TE Annotator (EDTA) v2.1.0 pipeline [50], which combines structure-based and homology-based predictions. For noncoding RNA prediction, tRNA genes were predicted using tRNAscan-SE v2.0.6 [51]. Other noncoding RNAs, including miRNA, rRNA and snRNA genes, were detected by comparison with the Rfam database [52] using CMsearch v1.1.3 [53] with the default parameters. A total of 427.39 Mb of TEs were identified in the HapB subgenome, accounting for 64.13% of its length, a higher proportion than in the HapA subgenome (Table 1; Data file 8; Data sets 10–11). Long terminal repeat retrotransposons (LTRs) were the predominant repeats, covering 55.63% (370.77 Mb) of the HapB subgenome; Copia- and Gypsy-type LTRs were the largest LTR subfamilies, accounting for 15.89% (105.89 Mb) and 39.10% (260.58 Mb), respectively (Table 1; Data file 8; Data sets 10–11). Moreover, 456 tRNAs and 111 miRNAs were identified in the A. toxicaria subgenome (Table 1; Data file 9; Data sets 12–13). In addition, 1,637 and 1,182 rRNAs were identified in the HapA and HapB subgenomes, respectively (Table 1; Data file 9; Data sets 12–13).
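The TE proportions reported here (for example 427.39 Mb in a 666.41-Mb haplotype) are covered bases divided by assembly length. As a rough sketch of how such a figure could be recomputed from a GFF3 annotation like Data sets 10–11, the code below merges overlapping features per sequence and sums the covered bases. It is an illustration under simplifying assumptions, not the authors' or EDTA's procedure; the toy GFF3 records, feature names, and the 1,000-bp genome length are invented.

```python
from collections import defaultdict


def covered_bases(gff3_lines) -> int:
    """Count bases covered by at least one feature, merging overlaps per sequence.

    GFF3 coordinates are 1-based and inclusive (columns 4 and 5).
    """
    intervals = defaultdict(list)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # not a feature line
        intervals[cols[0]].append((int(cols[3]), int(cols[4])))

    total = 0
    for spans in intervals.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlapping feature: extend the current merged block.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
    return total


# Toy annotation (hypothetical records, tab-separated GFF3 columns).
toy_gff3 = [
    "##gff-version 3",
    "chr1\tEDTA\tGypsy_LTR_retrotransposon\t1\t100\t.\t+\t.\tID=te1",
    "chr1\tEDTA\tCopia_LTR_retrotransposon\t51\t150\t.\t-\t.\tID=te2",
    "chr1\tEDTA\thelitron\t201\t250\t.\t+\t.\tID=te3",
    "chr2\tEDTA\tGypsy_LTR_retrotransposon\t10\t19\t.\t+\t.\tID=te4",
]
toy_genome_length = 1000  # hypothetical assembly length in bp

toy_covered = covered_bases(toy_gff3)
toy_fraction = 100.0 * toy_covered / toy_genome_length
print(f"{toy_covered} bp covered ({toy_fraction:.1f}% of the toy genome)")
```

Merging overlaps before summing avoids counting nested or overlapping elements twice; whether the published percentages were computed this way is not stated in the article.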

Limitations

This study provides genome and transcriptome data, but it lacks proteome and metabolome data from different tissues as well as a multi-omics correlation analysis.

Acknowledgements

We thank the reviewers for their time, expertise, and helpful suggestions to improve our manuscript.

Abbreviations

HiFi: High fidelity
ONT: Oxford Nanopore Technology
Hi-C: High-throughput chromosome conformation capture
HapA: Haplotype A
HapB: Haplotype B
BUSCO: Benchmarking Universal Single-Copy Orthologs
LAI: LTR Assembly Index
QV: Consensus quality
GO: Gene Ontology
KEGG: Kyoto Encyclopedia of Genes and Genomes
KOG: Eukaryotic Orthologous Groups
Nr: Non-redundant
LTR: Long-Terminal Repeat
TE: Transposable element
NCBI: National Center for Biotechnology Information

Authors’ contributions

HY supervised the project. WCH, YMD, WZL and JXX conducted data analysis. WCH, NF collected samples. YMX and SPD edited the manuscript. All authors contributed to writing the manuscript.

Funding

This work was supported by National Key R & D Program of China (2023YFE0107400), Guangzhou Ecological Landscape Technology Collaborative Innovation Center (202206010058), Science and Technology Projects in Guangzhou (E33309), and the Guangdong Flagship Project of Basic and Applied Basic Research (2023B0303050001).

Data availability

The raw sequencing data for HiFi, Hi-C, RNA-seq, and ONT ultralong reads were submitted to the NCBI Sequence Read Archive under BioProject accession PRJNA1218215. The chromosome-level genome assembly files were deposited in the Figshare database with DOIs 10.6084/m9.figshare.28328498 [22] and 10.6084/m9.figshare.28328528 [23]. The gene structure, gene function, TE and noncoding RNA annotation files have also been deposited in the Figshare database [24–29].

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Seping Dai, Email: gzifla_dsp@gz.gov.cn.

Hui Yu, Email: yuhui@scib.ac.cn.

References

1.Liu Q, Tang JS, Hu MJ, Liu J, Chen HF, Gao H, et al. Antiproliferative cardiac glycosides from the latex of Antiaris toxicaria. J Nat Prod. 2013;76(9):1771–80. 10.1021/np4005147.
2.Que DM, Gan YJ, Zeng YB, Dai HF, et al. Cytotoxic cardenolides from the latex of Antiaris toxicaria. J Trop Subtrop Bot. 2010;18:440–4.
3.Zuo WJ, Dong WH, Zhao YX, Chen HQ, Mei WL, Dai HF. Two new strophanthidol cardenolides from the seeds of Antiaris toxicaria. Phytochem Lett. 2013;6(1):1–4. 10.1016/j.phytol.2012.10.001.
4.Carter CA, Forney RW, Gray EA, Gehring AM, Schneider TL, Young DB, et al. Toxicarioside A. A new cardenolide isolated from Antiaris toxicaria latex-derived dart poison. Assignment of the 1H- and 13C-NMR shifts for an antiarigenin aglycone. Tetrahedron. 1997;53(40):13557–66. 10.1016/S0040-4020(97)00895-8.
5.Tian DM, Qiao J, Bao YZ, Liu J, Zhang XK, Sun XL, et al. Design and synthesis of biotinylated cardiac glycosides for probing Nur77 protein inducting pathway. Bioorg Med Chem Lett. 2019;29(5):707–12. 10.1016/j.bmcl.2019.01.015.
6.Kopp B, Bauer WP, Bernkop-Schnurch A. Analysis of some Malaysian dart poisons. J Ethnopharmacol. 1992;36(1):57–62. 10.1016/0378-8741(92)90061-u.
7.Agrawal P, Akhade M, Laddha K, Narkhede S, Mirgal A, Salunke C. Quantification of Convallatoxin in Antiaris toxicaria leuschseeds by RP-HPLC. Anal Chem Lett. 2014;4(3):172–7. 10.1080/22297928.2014.925821.
8.Yang SY, Kim NH, Cho YS, Lee H, Kwon HJ. Convallatoxin, a dual inducer of autophagy and apoptosis, inhibits angiogenesis in vitro and in vivo. PLoS ONE. 2014;9(3):e91094. 10.1371/journal.pone.0091094.
9.Shi LS, Liao YR, Su MJ, Lee AS, Kuo PC, Damu AG, et al. Cardiac glycosides from Antiaris toxicaria with potent cardiotonic activity. J Nat Prod. 2010;73(7):1214–22. 10.1021/np9005212.
10.Mei WL, Gan YJ, Dai HF. Advances in studies on chemical constituents of Antiaris toxicaria and their Pharmacological activities. Tradit Chin Herb Drugs. 2008;39:151–4.
11.Subiono T, Tavip MA. Qualitative and quantitative phytochemicals of leaves, bark and roots of Antiaris toxicaria lesch., a promising natural medicinal plant and source of pesticides. Plant Sci Today. 2023;10(1):5–10. 10.14719/pst.1896.
12.Chen S, Zhou Y, Chen Y, Gu J. Fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34(17):i884–90. 10.1093/bioinformatics/bty560.
13.Huang WC. Data files 1: Summary of library sequencing data. Figshare. 2025. 10.6084/m9.figshare.28328342.
14.Huang WC. Data files 2: K-mer-based Estimation of genome characters. Figshare. 2025. 10.6084/m9.figshare.28328369
15.Huang WC. Data files 3: Hi-C interactive heatmap. Figshare. 2025. 10.6084/m9.figshare.28328372
16.Huang WC. Data files 4: Statistics of genome assembly and annotation. Figshare. 2025. 10.6084/m9.figshare.28328378.
17.Huang WC. Data files 5: Summary of gene functional annotation. Figshare. 2025. 10.6084/m9.figshare.28328387.
18.Huang WC. Data files 6: gene function on haplotype A. Figshare. 2025. 10.6084/m9.figshare.28328396
19.Huang WC. Data files 7: gene function on haplotype B. Figshare. 2025. 10.6084/m9.figshare.28328405
20.Huang WC. Data files 8: Statistical results of the repetitive sequences. Figshare. 2025. 10.6084/m9.figshare.28328408
21.Huang WC. Data files 9: Summary of noncoding RNA genes. Figshare. 2025. 10.6084/m9.figshare.28328414.
22.Huang WC. Data set 5: Genome assembly data for Haplotype A. Figshare. 2025. 10.6084/m9.figshare.28328498.
23.Huang WC. Data set 6: Genome assembly data for Haplotype B. Figshare. 2025. 10.6084/m9.figshare.28328528.
24.Huang WC. Data set 8: gene prediction on haplotype A. Figshare. 2025. 10.6084/m9.figshare.28328429.
25.Huang WC. Data set 9: gene prediction on haplotype B. Figshare. 2025. 10.6084/m9.figshare.28328432.
26.Huang WC. Data set 10: transposable elements annotation on haplotype A. Figshare. 2025. 10.6084/m9.figshare.28328444
27.Huang WC. Data set 11: transposable elements annotation on haplotype B. Figshare. 2025. 10.6084/m9.figshare.28328450
28.Huang WC. Data set 12: noncoding RNA prediction on haplotype A. Figshare. 2025. 10.6084/m9.figshare.28328456
29.Huang WC. Data set 13: noncoding RNA prediction on haplotype B. Figshare. 2025. 10.6084/m9.figshare.28328459
30.Liu B, Shi Y, Yuan J, Hu X, Zhang H, Li N et al. Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. arXiv preprint arXiv:1308.2012. 10.48550/arXiv.1308.2012
31.Feng X, Cheng H, Portik D, Li H. Metagenome assembly of high-fidelity long reads with hifiasm-meta. Nat Methods. 2022;19(6):671–4. 10.1038/s41592-022-01478-3.
32.Durand NC, Shamim MS, Machol I, Rao SS, Huntley MH, Lander ES, Aiden EL. Juicer provides a One-Click system for analyzing Loop-Resolution Hi-C experiments. Cell Syst. 2016;3(1):95–8. 10.1016/j.cels.2016.07.002.
33.Zhou C, McCarthy SA, Durbin R. YaHS: yet another Hi-C scaffolding tool. Bioinformatics. 2023;39(1):btac808. 10.1093/bioinformatics/btac808.
34.Xu M, Guo L, Gu S, Wang O, Zhang R, Peters BA, et al. TGS-GapCloser: A fast and accurate gap closer for large genomes with low coverage of error-prone long reads. Gigascience. 2020;9(9):giaa094. 10.1093/gigascience/giaa094.
35.Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–2. 10.1093/bioinformatics/btv351.
36.Ou S, Chen J, Jiang N. Assessing genome assembly quality using the LTR assembly index (LAI). Nucleic Acids Res. 2018;46(32):e126. 10.1093/nar/gky730.
37.Rhie A, Walenz BP, Koren S, Phillippy AM. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 2020;21(1):245. 10.1186/s13059-020-02134-9.
38.Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26(5):589–95. 10.1093/bioinformatics/btp698.
39.Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34(18):3094–100. 10.1093/bioinformatics/bty191.
40.Birney E, Clamp M, Durbin R. GeneWise and genomewise. Genome Res. 2004;14(5):988–95. 10.1101/gr.1865504.
41.Kim D, Langmead B, Salzberg SL. HISAT: A fast spliced aligner with low memory requirements. Nat Methods. 2015;12(4):357–60. 10.1038/nmeth.3317.
42.Stanke M, Keller O, Gunduz I, Hayes A, Waack S, Morgenstern B. AUGUSTUS: Ab initio prediction of alternative transcripts. Nucleic Acids Res. 2006;34:W435–9. 10.1093/nar/gkl200.
43.Sayers EW, Bolton EE, Brister JR, Canese K, Chan J, Comeau DC, et al. Database resources of the National center for biotechnology information in 2023. Nucleic Acids Res. 2023;51(D1):D29–38. 10.1093/nar/gkac1032.
44.Bairoch A, Apweiler R. The SWISS-PROT protein sequence data bank and its supplement trembl in 1999. Nucleic Acids Res. 1999;27(1):49–54. 10.1093/nar/27.1.49.
45.Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, et al. The COG database: an updated version includes eukaryotes. BMC Bioinformatics. 2003;4:41. 10.1186/1471-2105-4-41.
46.Hernandez-Plaza A, Szklarczyk D, Botas J, Cantalapiedra CP, Giner-Lamia J, Mende DR, et al. EggNOG 6.0: enabling comparative genomics across 12 535 organisms. Nucleic Acids Res. 2023;51(D1):D389–94. 10.1093/nar/gkac1022.
47.Mistry J, et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 2021;49(D1):D412–9. 10.1093/nar/gkaa913.
48.Ashburner M, Chuguransky S, Williams L, Qureshi M, Salazar GA, Sonnhammer EL, et al. Gene ontology: tool for the unification of biology. The gene ontology consortium. Nat Genet. 2000;25:25–9. 10.1038/75556.
49.Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27–30. 10.1093/nar/28.1.27.
50.Su W, Ou S, Hufford MB, Peterson T. A tutorial of EDTA: extensive de Novo TE annotator. Methods Mol Biol. 2021;2250:55–67. 10.1007/978-1-0716-1134-0_4.
51.Chan PP, Lin BY, Mak AJ, Lowe TM. TRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 2021;49(16):9077–96. 10.1093/nar/gkab688.
52.Nawrocki EP, Burge SW, Bateman A, Daub J, Eberhardt RY, Eddy SR, et al. Rfam 12.0: updates to the RNA families database. Nucleic Acids Res. 2015;43(D1):D130–7. 10.1093/nar/gku1063.
53.Cui X, Lu Z, Wang S, Jing-Yan Wang J, Gao X. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction. Bioinformatics. 2016;32(12):i332–40. 10.1093/bioinformatics/btw271.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

[CR1] 1.Liu Q, Tang JS, Hu MJ, Liu J, Chen HF, Gao H, et al. Antiproliferative cardiac glycosides from the latex of Antiaris toxicaria. J Nat Prod. 2013;76(9):1771–80. 10.1021/np4005147. [DOI] [PubMed] [Google Scholar]

[CR2] 2.Que DM, Gan YJ, Zeng YB, Dai HF, et al. Cytotoxic cardenolides from the latex of Antiaris toxicaria. J Trop Subtrop Bot. 2010;18:440–4. [Google Scholar]

[CR3] 3.Zuo WJ, DongWH, Zhao YX, Chen HQ, Mei WL, Dai HF. Two new strophanthidol cardenolides from the the seeds of Antiaris toxicaria. Phytochem Lett. 2013;6(1):1–4. 10.1016/j.phytol.2012.10.001. [Google Scholar]

[CR4] 4.Carter CA, Forney RW, Gray EA, Gehring AM, Schneider TL, Young DB, et al. Toxicarioside A. A new cardenolide isolated from Antiaris toxicaria latex-derived dart poison. Assignment of the tH- and tsC-NMR shifts for an antiarigenin aglycone. Tetrahedron. 1997;53(40):13557–66. 10.1016/S0040-4020(97)00895-8. [Google Scholar]

[CR5] 5.Tian DM, Qiao J, Bao YZ, Liu J, Zhang XK, Sun XL, et al. Design and synthesis of biotinylated cardiac glycosides for probing Nur77 protein inducting pathway. Bioorg Med Chem Lett. 2019;29(5):707–12. 10.1016/j.bmcl.2019.01.015. [DOI] [PubMed] [Google Scholar]

[CR6] 6.Kopp B, Bauer WP, Bernkop-Schnurch A. Analysis of some Malaysian dart poisons. J Ethnopharmacol. 1992;36(1):57–62. 10.1016/0378-8741(92)90061-u. [DOI] [PubMed] [Google Scholar]

[CR7] 7.Agrawal P, Akhade M, Laddha K, Narkhede S, Mirgal A, Salunke C. Quantification of Convallatoxin in Antiaris toxicaria leuschseeds by RP-HPLC. Anal Chem Lett. 2014;4(3):172–7. 10.1080/22297928.2014.925821. [Google Scholar]

[CR8] 8.Yang SY, Kim NH, Cho YS, Lee H, Kwon HJ. Convallatoxin, a dual inducer of autophagy and apoptosis, inhibits angiogenesis in vitro and in vivo. PLoS ONE. 2014;9(3):e91094. 10.1371/journal.pone.0091094. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR9] 9.Shi LS, Liao YR, Su MJ, Lee AS, Kuo PC, Damu AG, et al. Cardiac glycosides from Antiaris toxicaria with potent cardiotonic activity. J Nat Prod. 2010;73(7):1214–22. 10.1021/np9005212. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR10] 10.Mei WL, Gan YJ, Dai HF. Advances in studies on chemical constituents of Antiaris toxicaria and their Pharmacological activities. Tradit Chin Herb Drugs. 2008;39:151–4. [Google Scholar]

[CR11] 11.Subiono T, Tavip MA. Qualitative and quantitative phytochemicals of leaves, bark and roots of Antiaris toxicaria lesch., a promising natural medicinal plant and source of pesticides. Plant Sci Today. 2023;10(1):5–10. 10.14719/pst.1896. [Google Scholar]

[CR12] 12.Chen S, Zhou Y, Chen Y, Gu J. Fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34(17):i884–90. 10.1093/bioinformatics/bty560. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR13] 13.Huang WC. Data files 1: Summary of library sequencing data. Figshare. 2025. 10.6084/m9.figshare.28328342.

[CR14] 14.Huang WC. Data files 2: K-mer-based Estimation of genome characters. Figshare. 2025. 10.6084/m9.figshare.28328369

[CR15] 15.Huang WC. Data files 3: Hi-C interactive heatmap. Figshare. 2025. 10.6084/m9.figshare.28328372

[CR16] 16.Huang WC. Data files 4: Statistics of genome assembly and annotation. Figshare. 2025. 10.6084/m9.figshare.28328378.

[CR17] 17.Huang WC. Data files 5: Summary of gene functional annotation. Figshare. 2025. 10.6084/m9.figshare.28328387.

[CR18] 18.Huang WC. Data files 6: gene function on haplotype A. Figshare. 2025. 10.6084/m9.figshare.28328396

[CR19] 19.Huang WC. Data files 7: gene function on haplotype B. Figshare. 2025. 10.6084/m9.figshare.28328405

[CR20] 20.Huang WC. Data files 8: Statistical results of the repetitive sequences. Figshare. 2025. 10.6084/m9.figshare.28328408

[CR21] 21.Huang WC. Data files 9: Summary of noncoding RNA genes. Figshare. 2025. 10.6084/m9.figshare.28328414.

[CR22] 22.Huang WC. Data set 5: Genome assembly data for Haplotype A. Figshare. 2025. 10.6084/m9.figshare.28328444.

[CR23] 23.Huang WC. Data set 6: Genome assembly data for Haplotype B. Figshare. 2025. 10.6084/m9.figshare.28328444

[CR24] 24.Huang WC. Data set 8: gene prediction on haplotype A. Figshare. 2025. 10.6084/m9.figshare.28328498

[CR25] 25.Huang WC. Data set 9: gene prediction on haplotype B. Figshare. 2025. 10.6084/m9.figshare.28328528

[CR26] 26.Huang WC. Data set 10: transposable elements annotation on haplotype A. Figshare. 2025. 10.6084/m9.figshare.28328444

[CR27] 27.Huang WC. Data set 11: transposable elements annotation on haplotype B. Figshare. 2025. 10.6084/m9.figshare.28328450

[CR28] 28.Huang WC. Data set 12: noncoding RNA prediction on haplotype A. Figshare. 2025. 10.6084/m9.figshare.28328456

[CR29] 29.Huang WC. Data set 13: noncoding RNA prediction on haplotype B. Figshare. 2025. 10.6084/m9.figshare.28328459

[CR30] 30.Liu B, Shi Y, Yuan J, Hu X, Zhang H, Li N et al. Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. arXiv preprint arXiv:1308.2012. 10.48550/arXiv.1308.2012

[CR31] 31.Feng X, Cheng H, Portik D, Li H. Metagenome assembly of high-fidelity long reads with hifiasm-meta. Nat Methods. 2022;19(6):671–4. 10.1038/s41592-022-01478-3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR32] 32.Durand NC, Shamim MS, Machol I, Rao SS, Huntley MH, Lander ES, Aiden EL. Juicer provides a One-Click system for analyzing Loop-Resolution Hi-C experiments. Cell Syst. 2016;3(1):95–8. 10.1016/j.cels.2016.07.002. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR33] 33.Zhou C, McCarthy SA, Durbin R. YaHS: yet another Hi-C scaffolding tool. Bioinformatics. 2023;39(1):btac808. 10.1093/bioinformatics/btac808. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR34] 34.Xu M, Guo L, Gu S, Wang O, Zhang R, Peters BA, et al. TGS-GapCloser: A fast and accurate gap closer for large genomes with low coverage of error-prone long reads. Gigascience. 2020;9(9):giaa094. 10.1093/gigascience/giaa094. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR35] 35.Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–2. 10.1093/bioinformatics/btv351. [DOI] [PubMed] [Google Scholar]

[CR36] 36.Ou S, Chen J, Jiang N. Assessing genome assembly quality using the LTR assembly index (LAI). Nucleic Acids Res. 2018;46(32):e126. 10.1093/nar/gky730. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR37] 37.Rhie A, Walenz BP, Koren S, Phillippy AM, Merqury. Reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 2020;21(1):245. 10.1186/s13059-020-02134-9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR38] 38.Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26(5):589–95. 10.1093/bioinformatics/btp698. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR39] 39.Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34(18):3094–100. 10.1093/bioinformatics/bty191. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR40] 40.Birney E, Clamp M, Durbin R. GeneWise and genomewise. Genome Res. 2004;14(5):988–95. 10.1101/gr.1865504. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR41] 41.Kim D, Langmead B, Salzberg SL. HISAT: A fast spliced aligner with low memory requirements. Nat Methods. 2015;12(4):357–60. 10.1038/nmeth.3317. [DOI] [PMC free article] [PubMed] [Google Scholar]

42. Stanke M, Keller O, Gunduz I, Hayes A, Waack S, Morgenstern B. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 2006;34:W435–9. 10.1093/nar/gkl200.

43. Sayers EW, Bolton EE, Brister JR, Canese K, Chan J, Comeau DC, et al. Database resources of the National Center for Biotechnology Information in 2023. Nucleic Acids Res. 2023;51(D1):D29–38. 10.1093/nar/gkac1032.

44. Bairoch A, Apweiler R. The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999. Nucleic Acids Res. 1999;27(1):49–54. 10.1093/nar/27.1.49.

45. Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, et al. The COG database: an updated version includes eukaryotes. BMC Bioinformatics. 2003;4:41. 10.1186/1471-2105-4-41.

46. Hernandez-Plaza A, Szklarczyk D, Botas J, Cantalapiedra CP, Giner-Lamia J, Mende DR, et al. eggNOG 6.0: enabling comparative genomics across 12 535 organisms. Nucleic Acids Res. 2023;51(D1):D389–94. 10.1093/nar/gkac1022.

47. Mistry J, et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 2021;49(D1):D412–9. 10.1093/nar/gkaa913.

48. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25:25–9. 10.1038/75556.

49. Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27–30. 10.1093/nar/28.1.27.

50. Su W, Ou S, Hufford MB, Peterson T. A tutorial of EDTA: extensive de novo TE annotator. Methods Mol Biol. 2021;2250:55–67. 10.1007/978-1-0716-1134-0_4.

51. Chan PP, Lin BY, Mak AJ, Lowe TM. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 2021;49(16):9077–96. 10.1093/nar/gkab688.

52. Nawrocki EP, Burge SW, Bateman A, Daub J, Eberhardt RY, Eddy SR, et al. Rfam 12.0: updates to the RNA families database. Nucleic Acids Res. 2015;43(D1):D130–7. 10.1093/nar/gku1063.

53. Cui X, Lu Z, Wang S, Jing-Yan Wang J, Gao X. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction. Bioinformatics. 2016;32(12):i332–40. 10.1093/bioinformatics/btw271.
