Haplotype-resolved genome assembly of Coriaria nepalensis a non-legume nitrogen-fixing shrub

Shi-Wei Zhao; Jing-Fang Guo; Lei Kong; Shuai Nie; Xue-Mei Yan; Tian-Le Shi; Xue-Chan Tian; Hai-Yao Ma; Yu-Tao Bao; Zhi-Chao Li; Zhao-Yang Chen; Ren-Gang Zhang; Yong-Peng Ma; Yousry A El-Kassaby; Ilga Porth; Wei Zhao; Jian-Feng Mao

doi:10.1038/s41597-023-02171-6

2023 May 9;10:259. doi: 10.1038/s41597-023-02171-6

PMCID: PMC10167230 PMID: 37156769

Abstract

Coriaria nepalensis Wall. (Coriariaceae) is a nitrogen-fixing shrub which forms root nodules with the actinomycete Frankia. Oils and extracts of C. nepalensis have been reported to be bacteriostatic and insecticidal, and C. nepalensis bark provides a valuable tannin resource. Here, by combining PacBio HiFi sequencing and Hi-C scaffolding techniques, we generated a haplotype-resolved chromosome-scale genome assembly for C. nepalensis. This genome assembly is approximately 620 Mb in size with a contig N50 of 11 Mb, with 99.9% of the total assembled sequences anchored to 40 pseudochromosomes. We predicted 60,862 protein-coding genes of which 99.5% were annotated from databases. We further identified 939 tRNAs, 7,297 rRNAs, and 982 ncRNAs. The chromosome-scale genome of C. nepalensis is expected to be a significant resource for understanding the genetic basis of root nodulation with Frankia, toxicity, and tannin biosynthesis.

Subject terms: Plant sciences, Computational biology and bioinformatics

Background & Summary

Coriaria nepalensis Wall. (2n = 40)¹, also known as Masuri Berry, is a shrub belonging to the genus Coriaria of the unigeneric Coriariaceae family, and is mainly distributed in the Himalayan region. C. nepalensis is a non-legume nitrogen-fixing plant that forms root nodules with the actinomycete Frankia^2,3. The biological ability of nitrogen-fixation in this species contributes to its rehabilitation capacity of nutrient-poor degraded land^4,5; in combination with its osmotic adjustment function and drought tolerance^6,7, C. nepalensis improves the abiotic conditions and provides more suitable habitat for associated plant species^8–10. Furthermore, essential oils and extracts from C. nepalensis could be used as promising drugs due to their antimicrobial^11,12 and anti-convulsant activities¹³. Traditionally, C. nepalensis has been used in folk medicine to treat ailments such as toothaches and traumatic injuries^13,14. The toxic and antibacterial properties of C. nepalensis provide an interesting opportunity for the development of a potent new and environmentally friendly pesticide for pest management¹⁵. Moreover, C. nepalensis bark offers an important source of hydrolysable tannin^16,17, an ideal treatment for tanning leather¹⁶.

The phylogenetic position of Coriariaceae is still debated¹⁸. Previous analyses based on plastid rbcL gene sequences^19–21, and the complete chloroplast genome¹⁴ placed Coriariaceae close to families in Cucurbitales. However, the nuclear genome has not yet been sequenced in Coriariaceae, although the genome assemblies of related taxa, such as in Datiscaceae²² and Begoniaceae²³, have been published.

Molecular genetic investigation of non-legume nitrogen-fixation and root nodulation from Frankia requires a high-quality genome assembly and functional annotation of the host plant. Additionally, such genomic resources may also be crucial to advance the phylogenetics of the unigeneric Coriariaceae family and the efficient exploration of C. nepalensis’ valued natural products.

Here, we report a 620 Mb haplotype-resolved chromosome-scale assembly of C. nepalensis using a combination of high-quality PacBio HiFi (High Fidelity) long reads, Illumina reads, and Hi-C sequencing. The genome was assembled with a contig N50 length of 11 Mb and 40 haplotype-resolved pseudochromosomes. We predicted 60,862 protein-coding genes, of which 99.5% were functionally annotated. Furthermore, 939 tRNAs, 7,297 rRNAs, and 982 ncRNAs were annotated. The provided genomic resources will be helpful for future functional studies in C. nepalensis.

Methods

Sample collection, library construction, and genome size estimation

Leaf tissue samples for both genome and RNA sequencing were harvested in 2020 from a mature C. nepalensis individual growing in Kunming Botanical Garden, which had been transplanted from Songming county, Kunming, Yunnan province, China. Sampled leaves were immediately flash-frozen in liquid nitrogen and stored at −80 °C until further use. High-quality genomic DNA was extracted from leaf tissue using the DNeasy Plant Mini Kit (QIAGEN, Inc.) and purified using the Mobio PowerClean Pro DNA Clean-Up Kit (MO BIO Laboratories, Inc.). DNA integrity was assessed using an Agilent 4200 Bioanalyzer. Messenger RNA (mRNA), whose sequence information was later used in protein-coding gene structure prediction, was isolated from leaves using the NEBNext Poly(A) mRNA Magnetic Isolation Module, and RNA quality was determined with the Agilent 2100 BioAnalyzer.

We combined PacBio HiFi long-read sequencing, Illumina sequencing, and Hi-C scaffolding for C. nepalensis genome assembly. Genomic DNA fragments were prepared using g-Tubes and purified using AMPure PB beads for library construction and subsequent SMRT cell PacBio HiFi long-read sequencing. Fragment molecules were size-selected on a BluePippin system. Library sequencing was performed on the PacBio Sequel II platform, and ccs (https://github.com/PacificBiosciences/ccs) v6.2.0 was used to generate PacBio HiFi data. We obtained ~14.5 Gb (~40×) of HiFi sequencing data with an average read length of 19 kb and a read N50 of 21 kb (Fig. 1a). For Illumina sequencing, 150 bp paired-end PCR-free libraries were prepared and sequenced on the Illumina HiSeq X Ten platform, and ~70 Gb (~200×) of Illumina raw data were obtained. We followed a standard procedure for Hi-C library preparation²⁴. In brief, leaf tissues were fixed with formaldehyde and the cross-linked DNA was digested with the MboI restriction enzyme. Digested fragments were then biotinylated at 5′ overhangs and joined to form chimeric junctions. After biotin-containing fragments were enriched and sheared, we constructed paired-end sequencing libraries. The Hi-C libraries were sequenced using the Illumina HiSeq X Ten platform and ~67 Gb of Hi-C raw data were obtained. RNA sequencing was performed on the Illumina HiSeq X Ten platform after we constructed one sequencing library using the NEBNext Ultra RNA Library Prep Kit, and ~7.5 Gb (50 M reads) of raw data were acquired. Then, fastp²⁵ was used for quality control to remove adapters, low-quality reads, and Illumina reads shorter than 60 bp. All clean reads were used for further genome assembly and gene predictions.

Fig. 1 — Length and quality of PacBio HiFi reads and genome size survey. (a) Reads length and mean Phred score distribution of PacBio HiFi reads. (b) 19-mers frequency distribution estimated from PacBio HiFi sequences: observed K-mer (raw K-mer) frequencies (in grey), fitted K-mer frequencies (in blue) with skew normal distribution model, and overall fitting (in red) that concatenated observed and fitted K-mer frequencies.

Genomic characteristics, including genome size, repeat content, and heterozygosity rate, were estimated based on K-mer frequencies. Through K-mer analysis (K = 19) of PacBio HiFi data with Jellyfish²⁶ v2.3.0, an overall C. nepalensis haplotype genome size of 313.1 Mb was estimated using findGSE v1.94.R²⁷ (Fig. 1b).
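The idea behind a K-mer based genome size survey can be illustrated with a minimal Python sketch. It is not the Jellyfish or findGSE implementation: findGSE fits a skew normal model to the K-mer spectrum, whereas this sketch uses the simpler and cruder approximation of dividing the total number of counted K-mers by the depth of the main peak. The function names and the `min_depth` cutoff for discarding likely error K-mers are assumptions made for illustration.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_canonical_kmers(reads, k):
    """Count canonical K-mers (the smaller of a K-mer and its reverse complement)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip K-mers containing N or other symbols
                counts[min(kmer, revcomp(kmer))] += 1
    return counts


def kmer_histogram(counts):
    """Map each depth to the number of distinct K-mers observed at that depth."""
    return Counter(counts.values())


def estimate_genome_size(hist, min_depth=2):
    """Crude estimate: total K-mer occurrences divided by the main peak depth.

    Depths below min_depth are ignored as likely sequencing errors
    (an illustrative assumption, not a findGSE parameter).
    """
    usable = {depth: n for depth, n in hist.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no K-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth
```

The result is expressed in K-mers, which approximates the size in bases for genomes much longer than K. For a heterozygous diploid genome the spectrum has two peaks, so choosing the right peak matters, which is one reason model-based tools are preferred in practice.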

De novo genome assembly

De novo assembly involved three steps: primary assembly, Hi-C scaffolding, and polishing (Fig. 2). With PacBio HiFi reads and Hi-C reads as inputs, we used hifiasm²⁸ v0.16.1 to assemble the genome into contigs and obtained a haplotype-resolved assembly with two haplotypes for subsequent analysis. The Hi-C reads were then mapped to the assembly using Juicer²⁹ v1.6. 3D-DNA³⁰ (-m haploid -i 150000 -r 0 --editor-repeat-coverage 5) was used for preliminary Hi-C-assisted chromosome assembly, and Juicebox³¹ (version 201008) was used to manually adjust chromosome segmentation boundaries and correct misassemblies, including switch errors. Afterwards, we used 3D-DNA to re-scaffold each chromosome separately and used Juicebox to manually correct any visible errors. We used TGS-GapCloser³² v1.0.1 (--min_match 1000 --minmap_arg '-x asm20') to fill gaps with HiFi reads (24 gaps were filled), performed three rounds of polishing using NextPolish³³ v1.4.0 based on Illumina reads, and removed redundant sequences identified by Redundans³⁴ v0.13c. Finally, a haplotype-resolved chromosome-level assembly with a total length of 620 Mb was obtained (Table 1). We obtained 40 pseudochromosomes, consistent with the chromosome number reported in a previous karyotype study¹. We named the chromosomes in descending order of their lengths. Furthermore, as we were describing a haplotype-resolved genome assembly without parental information for subgenome phasing, we arbitrarily assigned the longer chromosome of each homologous pair to haplotype genome “a” (with the character “a” at the end of the chromosome ID) and the other chromosome to haplotype genome “b” (with the character “b”).
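Contiguity statistics such as the contig and scaffold N50 and L50 reported for this assembly follow a standard definition, which the following Python sketch illustrates. The function name and the toy lengths are illustrative and are not taken from the study.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total
    length; L50 is the number of sequences in that set.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for rank, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, rank


# Toy example: total length 30, so half is 15; 10 + 8 = 18 reaches it.
print(n50_l50([10, 8, 6, 4, 2]))
```

In practice the lengths would be read from the contig or scaffold FASTA file of the assembly.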

Table 1.

Statistics of the haplotype-resolved genome assembly of C. nepalensis.

Features	Statistics
Sequencing
Raw bases of WGS-PacBio HiFi (Gb)	~14.5
Raw bases of WGS-Illumina (Gb)	~70
Raw bases of Hi-C (Gb)	~67
Raw bases of RNA-seq (Gb)	~7.5
Assembly
Genome size (Mb)	620.52
Number of pseudochromosomes	40
Chloroplast genome assembly (bp)	158,558
Mitochondria genome assembly (bp)	480,951
N50 of contigs (Mb)	10.97
L50 of contig	22
N50 of scaffolds (Mb)	12.9
L50 of scaffolds	11
Number of gaps	62
GC content (%)	34.78
Complete BUSCOs	1,338 (93.0%)
Annotation
Number of protein-coding gene	60,862
Complete BUSCOs	1,400 (97.2%)
Average length of protein-coding gene (bp)	2,892.7
Average length of CDS (bp)	1,324
Average number of exons per transcript	6.3
Number of tRNA	939
Number of rRNA	7,297
Number of unclassified ncRNA	982

The assemblies of chromosomes chr01-chr03 were markedly longer than those of the remaining chromosomes. These three chromosome pairs were also difficult to assemble and showed Hi-C chromatin contact profiles distinct from the others (Fig. 3a,b). They contain a large number of gaps (60 in total) in the current assembly, whereas the other chromosomes have only 2 gaps in total. Previous karyotype analysis¹ showed that C. nepalensis has three pairs of long chromosomes with extended heterochromatin regions, which is consistent with the three long chromosome pairs revealed in the present study. A high number (679,177) of tandem array repeats with the consensus sequence “ATCATTTGCAAGTTATGCACAAAAGTTGTGTCTGTAGTGCAAAACTAGAATTCGTTCGACTTGCTTTGAAATAAGTTATTGACTTGAAATGACTCATTGAAATGATTTTAAGGTTAAACGAATGCACACTTTCCTTGCAATG” were identified on the three long chromosomes chr01-chr03 (Fig. 3c) using TRF³⁵ v4.09.1. We found the telomeric repeat sequence “TTTAGGG” in most chromosomes (Fig. 3c), which supports the high quality of our genome assembly.
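A check for telomeric repeats at chromosome ends can be sketched in a few lines of Python. This is an illustrative approach, not the procedure used in the study (which reports TRF for tandem repeat detection): it counts exact copies of the “TTTAGGG” motif and its reverse complement within a window at each end of a sequence. The function name and the default window size are assumptions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_terminal_repeats(seq, motif="TTTAGGG", window=1000):
    """Count exact motif copies (either strand) near both ends of a sequence.

    Returns a tuple (count in the first `window` bases,
    count in the last `window` bases).
    """
    seq = seq.upper()
    motif = motif.upper()
    motif_rc = revcomp(motif)

    def count(region):
        # str.count reports non-overlapping matches, which is adequate here
        # because TTTAGGG cannot overlap a shifted copy of itself.
        return region.count(motif) + region.count(motif_rc)

    return count(seq[:window]), count(seq[-window:])


# Toy chromosome: CCCTAAA repeats at the start, TTTAGGG repeats at the end.
toy = "CCCTAAA" * 5 + "ACGT" * 50 + "TTTAGGG" * 4
print(count_terminal_repeats(toy, window=40))
```

A chromosome with high counts at both ends is a candidate for a telomere-to-telomere sequence; degenerate repeat copies would need an approximate matcher rather than exact counting.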

Fig. 3 — Hi-C density heatmaps, genomic features and evolutionary history of *C. nepalensis*. (a) Hi-C chromatin contact density heatmap with a low threshold parameter (minimal mapping quality = 0). (b) Hi-C chromatin contact density heatmap with a high threshold parameter (minimal mapping quality = 1). (c) Distribution of genomic features of *C. nepalensis*. I: sequencing depth distribution of PacBio HiFi reads. II–IV: Density of *Copia* LTR-RTs, *Gypsy* LTR-RTs and Mutator TEs. V, VI: Distribution of tandem arrays and telomere sequences. VII, VIII: Density of protein-coding genes and GC content. (d) Phylogenetic tree. (e) Ks dot plots of *C. nepalensis* haplotype genome “a” and *C. sativus*.

In addition, a 158,558 bp chloroplast (Pt) genome and a 480,951 bp mitochondrial (Mt) genome were assembled from the short and long reads obtained from genome sequencing using GetOrganelle³⁶ v1.7.5.0 (Table 1).

Repeat annotation

We performed de novo transposable element (TE) annotation using EDTA³⁷ v1.9.3 (--sensitive 1 --anno 1), which integrates homology-based and structure-based approaches for TE identification (Fig. 2). The resulting TE library was used for further repeat annotation with RepeatMasker (http://www.repeatmasker.org/RepeatMasker/) (-no_is -xsmall). The soft-masked genome sequence output was used for gene prediction. A total of 428 Mb (69.0%) of the assembly was annotated as TEs (Table 2), of which 61 Mb (9.9%) were long terminal repeat (LTR) retrotransposons. Mutator transposons, with a total length of 280 Mb (45.2%), occupied the largest share of the genome and showed a distribution similar to that of the high-occupancy tandem array mentioned above (Fig. 3c). Further analysis revealed that the sequence motif of these tandem arrays is contained within the Mutator transposons.
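Because soft-masking marks repeats as lowercase bases, the masked fraction of a soft-masked FASTA file can be checked with a short Python sketch such as the one below. It is a simplified illustration with a hypothetical function name; it counts every alphabetic character, including N, toward the total, which may differ from how the 69.0% figure above was computed.

```python
def softmasked_fraction(fasta_text):
    """Return the fraction of sequence characters that are lowercase.

    Header lines (starting with '>') and blank lines are skipped.
    All alphabetic characters, including N/n, count toward the total.
    """
    masked = 0
    total = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        for base in line:
            if base.isalpha():
                total += 1
                if base.islower():
                    masked += 1
    return masked / total if total else 0.0


# Toy record: 6 of 12 bases are lowercase (soft-masked).
toy_fasta = ">chr_toy\nACGTacgt\nAAaa\n"
print(softmasked_fraction(toy_fasta))
```

For a genome-sized file, the same loop would be applied while streaming the file line by line instead of holding the text in memory.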

Table 2.

Statistics of repeat annotation of the C. nepalensis genome.

Superfamily	Number	Length (bp)	Percent (%)
Class I	102,847	63,608,643	10.25
LTR/Copia	46,213	29,513,519	4.76
LTR/Gypsy	12,994	7,928,273	1.28
LTR/unknown	39,113	24,243,803	3.91
nonLTR/pararetrovirus	730	347,201	0.06
nonLTR/LINE	3,797	1,575,847	0.25
Class II	675,085	314,548,320	50.69
TIR/hAT	17,702	5,852,661	0.94
TIR/CACTA	23,327	7,829,412	1.26
TIR/PIF-Harbinger	17,220	4,264,302	0.69
TIR/Mutator	563,911	280,243,005	45.16
TIR/Tc1_Mariner	9,638	3,344,642	0.54
Helitron	43,287	13,014,298	2.10
Other TEs	432,103	50,184,386	8.09
Total TEs	1,210,035	428,341,349	69.03

Protein-coding genes prediction and other annotations

We collected 139,950 non-redundant protein sequences of the closely related species Datisca glomerata²², Begonia fuchsioides²², Cucumis sativus³⁸, Vitis vinifera³⁹, Prunus persica⁴⁰, and Arabidopsis thaliana⁴¹ as protein homology evidence (Fig. 2). Three strategies were used to assemble RNA-seq reads into transcripts, which were then used as transcriptional evidence for gene annotation: (1) de novo assembly was performed using Trinity⁴² v2.13.2; (2) genome-guided assembly was performed using Trinity after reads were mapped to the genome assembly using HISAT2⁴³ v2.2.1; and (3) another genome-guided assembly was prepared using StringTie⁴⁴ v2.2.0 with reads mapped using HISAT2. We combined these three sets of transcripts and obtained 77,555 transcript sequences after removing redundant sequences with CD-HIT⁴⁵ v4.8.1. Gene structure was annotated using the PASA⁴⁶ v2.5.0 pipeline based on transcriptional evidence. Then, full-length gene sequences were identified using protein homology evidence. Based on the full-length gene set, a gene model for ab initio gene structure prediction was trained and optimized using AUGUSTUS⁴⁷ 3.4.0.

Furthermore, the MAKER2⁴⁸ pipeline was used to predict the putative protein-coding gene structure. We performed ab initio predictions of gene structures using AUGUSTUS 3.4.0. The transcript evidence and homologous protein evidence were aligned to the genome by BLAST+⁴⁹ v2.11.0 and refined by exonerate⁵⁰ 2.4.0. AUGUSTUS was used to integrate gene models from the above-mentioned gene predictions. To further improve the annotation accuracy, EVidenceModeler⁵¹ (EVM) v1.1.1 and PASA were used to integrate and update the gene prediction results. We annotated a final set of 60,862 protein-coding genes (Table 1), among which 30,622 genes were predicted for the haplotype subgenome with the longer set of chromosomes (haplotype genome “a”), and 30,240 genes for the haplotype subgenome “b”. We identified 26,489 putative gene families among C. nepalensis (haplotype genome “a”), Aquilegia coerulea⁵², Vitis vinifera³⁹, Averrhoa carambola⁵³, Populus trichocarpa⁵⁴, Tripterygium wilfordii⁵⁵, Malus domestica⁵⁶, Datisca glomerata²², Begonia fuchsioides²³, Benincasa hispida⁵⁷, Cucumis sativus³⁸, and Quercus acutissima⁵⁸, with OrthoFinder⁵⁹ v2.5.2 (Fig. 3d). Then, 1,199 orthogroups, in each of which at least 83.3% of the species had single-copy genes, were used to infer the species tree with STAG⁶⁰, and the phylogenetic position of C. nepalensis was confirmed. Ks (synonymous substitutions) dot plots of haplotype genome “a” vs genome “a” and genome “a” vs C. sativus were generated with WGDI⁶¹ v0.62 (Fig. 3e), and one recent unique WGD (whole genome duplication) was revealed, distinct from that found in C. sativus.
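The orthogroup selection criterion described above (at least 83.3% of species with a single-copy gene, which corresponds to 10 of the 12 species compared) can be illustrated with a small Python sketch. This is not OrthoFinder or STAG code; the data layout, a dictionary of per-species gene counts for each orthogroup, and all names in it are assumptions made for illustration.

```python
def select_orthogroups(gene_counts, min_fraction=0.833):
    """Select orthogroups in which enough species are single-copy.

    gene_counts maps an orthogroup ID to a dict of {species: gene count}.
    An orthogroup is kept when the fraction of species with exactly one
    gene is at least min_fraction.
    """
    selected = []
    for orthogroup, per_species in gene_counts.items():
        n_species = len(per_species)
        if n_species == 0:
            continue
        single_copy = sum(1 for count in per_species.values() if count == 1)
        if single_copy / n_species >= min_fraction:
            selected.append(orthogroup)
    return selected


# Toy example with 12 hypothetical species labels (sp01 .. sp12).
species = [f"sp{i:02d}" for i in range(1, 13)]
toy = {
    "OG_A": {s: 1 for s in species},                        # 12/12 single-copy
    "OG_B": {s: (2 if s in ("sp01", "sp02") else 1) for s in species},  # 10/12
    "OG_C": {s: (0 if s in ("sp01", "sp02", "sp03") else 1) for s in species},  # 9/12
}
print(select_orthogroups(toy))
```

Allowing a minority of species to be multi-copy or missing keeps many more orthogroups than a strict all-species single-copy requirement, which matters when a species such as C. nepalensis carries a recent whole genome duplication.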

BUSCO⁶² was used for evaluating the completeness of the gene set. Out of 1,440 conserved genes, 1,400 (97.2%) were annotated, among which 1,365 (96.9%) were complete and duplicated BUSCO genes.

Three strategies were used for functional annotation of protein-coding genes (Fig. 2, Table 3): (1) we mapped gene sequences against the eggNOG⁶³ 5.0 database using eggNOG-mapper⁶⁴ v2.1.6 (--target_taxa Viridiplantae) and annotated 98.1% of the genes, of which 55.7% and 49.4% were annotated with GO and KEGG terms, respectively; (2) based on sequence similarity, we annotated 98.5% of the genes using DIAMOND⁶⁵ v2.0.12 (--evalue 1e-5) against the following four protein databases: Swiss_Prot⁶⁶ (78.2%), TrEMBL⁶⁶ (98.4%), NR⁶⁷ (98.3%), and Arabidopsis thaliana genes⁴¹ (94.1%); (3) we annotated 99.1% of the genes against 14 databases using InterProScan⁶⁸ v5.52–86.0 (Table 3).

Table 3.

Statistics of protein-coding gene functional annotation.

Method	Database	Number	Percent (%)
eggNOG-mapper	eggNOG	59,758	98.09
	GO	33,948	55.73
	KEGG_KO	30,082	49.38
	KEGG_Pathway	18,595	30.52
	EC	12,864	21.12
	eggNOG	56,164	92.19
	COG	59,758	98.09
DIAMOND		59,981	98.46
	Swiss_Prot	47,641	78.20
	TrEMBL	59,933	98.38
	NR	59,896	98.32
	A.thaliana	57,323	94.10
InterProScan		60,396	99.14
	Pfam	51,101	83.88
	CDD	21,628	35.50
	SUPERFAMILY	40,074	65.78
	Interpro	53,945	88.55
	PANTHER	59,108	97.03
	Gene3D	42,846	70.33
	PIRSF	4,336	7.12
	PRINTS	8,886	14.59
	Coils	10,214	16.77
	TIGRFAM	6,982	11.46
	MobiDBLite	26,460	43.43
	TMHMM	14,313	23.49
	Phobius	20,510	33.67
	SMART	19,932	32.72
Total		60,637	99.54

For non-coding RNA (ncRNA) gene prediction (Fig. 2), we identified 939 tRNAs using tRNAScan-SE⁶⁹ v2.0.8, 7,297 rRNAs using Barrnap v0.9 (https://github.com/tseemann/barrnap) (--kingdom euk), and 982 other ncRNAs using Rfam^70,71 16.6.

We predicted the genes in the two organelle genomes using OGAP (https://github.com/zhangrengang/OGAP). A total of 131 genes (89 protein-coding genes, 8 rRNAs, and 34 tRNAs) were annotated for the chloroplast genome, and 63 genes (42 protein-coding genes, 3 rRNAs, and 18 tRNAs) for the mitochondrial genome.

Genome comparison between haplotype assemblies

We used minimap2⁷² v2.24 to align the two haplotype assemblies and SyRI⁷³ v1.6 to identify syntenic regions and structural variations (e.g., duplications, inversions, and translocations). Plotsr⁷⁴ v0.5.4 was used to visualize the identified structural rearrangements (Fig. 4a). The chr01-chr03 pairs showed remarkable structural variation, while the other homologous chromosome pairs were highly collinear, with only a few rearrangements. Syntenic regions were larger than the various types of structural variations (Fig. 4b). Sequence differences (local variation, e.g., SNPs and indels) within syntenic regions were also identified (Fig. 4c). Long highly diverged regions were unevenly distributed among chromosome pairs, but the number of sequence differences was small. Large collinear fragments between non-homologous chromosomes were also detected (Fig. 4a).

Fig. 4 — Structural variation and statistics between two haplotype genome assemblies of *C. nepalensis*. (a) Structural variation between haplotype genomes. Subgenome “a” (chr01a-chr20a) is used as the reference sequence and subgenome “b” (chr01b-chr20b) is the query. (b) Size distributions of different types of structural variation between two haplotype assemblies. (c) Numbers and lengths of sequence differences on the syntenic region for each chromosome pair.

Data Records

The raw data from PacBio HiFi, Illumina, and Hi-C sequencing were submitted to the SRA database (SRR22412655⁷⁵, SRR22026041⁷⁶, SRR22026042⁷⁷, SRR22026043⁷⁸). The haplotype-resolved genome assembly was deposited at GenBank with accession numbers GCA_027190085.1⁷⁹ and GCA_027186245.1⁸⁰. The genome assembly and gene annotation results of C. nepalensis were deposited in the figshare⁸¹ database.

Technical Validation

We mapped DNA and RNA sequencing reads to the final genome assembly to evaluate the assembly quality (Fig. 2). A high read mapping rate of 99.2% was obtained when PacBio HiFi reads were mapped onto the genome using minimap2, and the sequencing depth is illustrated in the circos plot in Fig. 3c. We mapped the Illumina reads to the final assembly using BWA⁸² v0.7.17 and obtained a read mapping rate of 98.7%, and a low SNP heterozygosity level of ~0.0027% was obtained after SNPs were identified with SAMtools⁸³ v1.13. Furthermore, a single-base error rate of ~0.0011% was obtained, and a read mapping rate of 96.2% was obtained when RNA-seq reads were mapped onto the final genome assembly using HISAT2. Because a high proportion of the sequencing data mapped to the assembly, we consider the assembly to be highly complete and contiguous.

We performed further genome assembly quality control with Merqury⁸⁴ analysis (K = 19) (Fig. 5, Table 4) based on PacBio HiFi reads. The QVs (consensus quality values) for the individual haplotype genomes “a” and “b” and for both genomes combined are 46.39, 45.86, and 46.12, respectively. The K-mer completeness scores for the individual genomes “a” and “b” and for both genomes combined are 94.12%, 93.68%, and 98.87%, respectively. These results again confirm that the haplotype-resolved genome assembly is of good quality in terms of completeness.

Table 4.

Statistics of Merqury analysis for genome quality assessment.

Assembly	QV (quality value)	Error rate	Completeness (%)
Genome “a”	46.39	2.30e-05	94.12
Genome “b”	45.86	2.60e-05	93.68
Genome both “a” and “b”	46.12	2.44e-05	98.87
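The QV and error rate columns of Table 4 are related through the standard Phred scale, QV = −10·log10(error rate). The short Python sketch below shows the conversion in both directions; it reproduces only this arithmetic, not the K-mer based estimation that Merqury performs to obtain the error rate, and the function names are illustrative.

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Convert a per-base error probability to a Phred-scaled quality value."""
    if error_rate <= 0:
        raise ValueError("error_rate must be positive")
    return -10 * math.log10(error_rate)


# QVs reported in Table 4 for genome "a", genome "b", and both combined.
for label, qv in [("a", 46.39), ("b", 45.86), ("a+b", 46.12)]:
    print(label, f"{qv_to_error_rate(qv):.2e}")
```

A QV above 40 corresponds to fewer than one expected error per 10,000 bases, so all three values in Table 4 indicate a consensus accuracy above 99.99%.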

We further performed BUSCO assessments for the assembly (Table 1), which revealed that complete core genes (including single and multiple copies) accounted for 93.0%, while missing genes accounted for only 4.9%, underscoring the good gene integrity of the assembly.

Acknowledgements

This research was supported by the National Natural Science Foundation of China (32171816) and the National Key R&D Program of China (2022YFD2200103).

Author contributions

Jian-Feng Mao and Wei Zhao conceived and designed the study; Yong-Peng Ma collected the samples; Shi-Wei Zhao, Jing-Fang Guo, Lei Kong, Shuai Nie, Xue-Mei Yan, Tian-Le Shi, Xue-Chan Tian, Hai-Yao Ma, Yu-Tao Bao, Zhi-Chao Li, Zhao-Yang Chen, Ren-Gang Zhang performed bioinformatics; Shi-Wei Zhao drafted the manuscript; Jian-Feng Mao, Yousry A. El-Kassaby and Ilga Porth revised the manuscript. Shi-Wei Zhao, Jing-Fang Guo and Lei Kong contributed equally to this work.

Funding

Open access funding provided by Umea University.

Code availability

All data processing commands and pipelines were carried out in accordance with the instructions and guidelines provided by the relevant bioinformatic software.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Shi-Wei Zhao, Jing-Fang Guo, Lei Kong

Contributor Information

Wei Zhao, Email: zhao.wei@umu.se.

Jian-Feng Mao, Email: jianfeng.mao@umu.se.

References

1. Oginuma K, Nakata M, Suzuki M, Tobe H. Karyomorphology of Coriaria (Coriariaceae): Taxonomic implications. The Botanical Magazine Tokyo. 1991;104:297–308. doi: 10.1007/BF02488383.
2. Montserrat P. Root nodules of Coriaria. Nature. 1958;182:475–475. doi: 10.1038/182475a0.
3. Hu C, Zhou P, Zhou Q, Chen H, Akkermans ADL. Nodulation and molecular characterization of pure cultures isolated from root nodules of Coriaria nepalensis. Chinese Science Bulletin. 1998;43:695–698. doi: 10.1007/BF02883580.
4. Awasthi P, Bargali K, Bargali SS, Jhariya MK. Structure and functioning of Coriaria nepalensis dominated shrublands in degraded hills of Kumaun Himalaya. I. Dry matter dynamics. Land Degradation & Development. 2022;33:1474–1494. doi: 10.1002/ldr.4235.
5. Mourya NR, Bargali K, Bargali SS. Impacts of Coriaria nepalensis colonization on vegetation structure and regeneration dynamics in a mixed conifer forest of Indian Central Himalaya. Journal of Forestry Research. 2019;30:305–317. doi: 10.1007/s11676-018-0613-x.
6. Bargali K, Tewari A. Growth and water relation parameters in drought-stressed Coriaria nepalensis seedlings. Journal of Arid Environments. 2004;58:505–512. doi: 10.1016/j.jaridenv.2004.01.002.
7. Zeng XM, Xu XL, Yi RZ, Zhong FX, Zhang YH. Sap flow and plant water sources for typical vegetation in a subtropical humid karst area of southwest China. Hydrological Processes. 2021;35:e14090. doi: 10.1002/hyp.14090.
8. Tiwari M, Singh SP, Tiwari A, Sundriyal RC. Effect of symbiotic associations on growth of host Coriaria nepalensis and its facilitative impact on oak and pine seedlings in the Central Himalaya. Forest Ecology and Management. 2003;184:141–147. doi: 10.1016/S0378-1127(03)00209-3.
9. Fang SZ, Li HY, Xie BD. Decomposition and nutrient release of four potential mulching materials for poplar plantations on upland sites. Agroforestry Systems. 2008;74:27–35. doi: 10.1007/s10457-008-9155-0.
10. Yan K, et al. Current re-vegetation patterns and restoration issues in degraded geological phosphorus-rich mountain areas: A synthetic analysis of Central Yunnan, SW China. Plant Divers. 2017;39:140–148. doi: 10.1016/j.pld.2017.04.003.
11. Ahmad A, Khan A, Kumar P, Bhatt RP, Manzoor N. Antifungal activity of Coriaria nepalensis essential oil by disrupting ergosterol biosynthesis and membrane integrity against Candida. Yeast. 2011;28:611–617. doi: 10.1002/yea.1890.
12. Kumar P, et al. Antimicrobial activities of essential oil and methanol extract of Coriaria nepalensis. Nat Prod Res. 2011;25:1074–1081. doi: 10.1080/14786419.2010.529545.
13. Zhao F, et al. New sesquiterpenes from the roots of Coriaria nepalensis. Tetrahedron. 2012;68:6204–6210. doi: 10.1016/j.tet.2012.05.067.
14. Fang HL, Shang FN, Qian J, Duan BZ. Phylogenetic relationship and characterization of the complete chloroplast genome of the Coriaria nepalensis Wall. in China, a least concern folk medicine. Mitochondrial DNA Part B-Resources. 2020;5:1718–1719. doi: 10.1080/23802359.2020.1749179.
15. Li ML, et al. Semisynthesis and antifeedant activity of new acylated derivatives of tutin, a sesquiterpene lactone from Coriaria sinica. Heterocycles. 2007;71:1155–1162. doi: 10.3987/COM-07-11021.
16. Guo LX, Qiang TT, Ma YM, Wang K, Du K. Optimisation of tannin extraction from Coriaria nepalensis bark as a renewable resource for use in tanning. Industrial Crops and Products. 2020;149:112360. doi: 10.1016/j.indcrop.2020.112360.
17. Guo LX, Qiang TT, Ma YM, Ren LF, Dai TT. Purification and characterization of hydrolysable tannins extracted from Coriaria nepalensis bark using macroporous resin and their application in gallic acid production. Industrial Crops and Products. 2021;162:113302. doi: 10.1016/j.indcrop.2021.113302.
18. Yokoyama J, Suzuki M, Iwatsuki K, Hasebe M. Molecular phylogeny of Coriaria, with special emphasis on the disjunct distribution. Mol Phylogenet Evol. 2000;14:11–19. doi: 10.1006/mpev.1999.0672.
19. Chase MW, et al. Phylogenetics of seed plants: An analysis of nucleotide sequences from the plastid gene rbcL. Annals of the Missouri Botanical Garden. 1993;80:528–580. doi: 10.2307/2399846.
20. Swensen SM, Mullin BC, Chase MW. Phylogenetic affinities of Datiscaceae based on an analysis of nucleotide sequences from the plastid rbcL gene. Systematic Botany. 1994;19:157–168. doi: 10.2307/2419719.
21. Swensen SM. The evolution of actinorhizal symbioses: Evidence for multiple origins of the symbiotic association. American Journal of Botany. 1996;83:1503–1512. doi: 10.1002/j.1537-2197.1996.tb13943.x.
22. Griesmann M, et al. Phylogenomics reveals multiple losses of nitrogen-fixing root nodule symbiosis. Science. 2018;361:eaat1743. doi: 10.1126/science.aat1743.
23. Li L, et al. Genomes shed light on the evolution of Begonia, a mega-diverse genus. New Phytol. 2022;234:295–310. doi: 10.1111/nph.17949.
24. Xie T, et al. De novo plant genome assembly based on chromatin interactions: a case study of Arabidopsis thaliana. Mol Plant. 2015;8:489–492. doi: 10.1016/j.molp.2014.12.015.
25. Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34:i884–i890. doi: 10.1093/bioinformatics/bty560.
26. Marçais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011;27:764–770. doi: 10.1093/bioinformatics/btr011.
27. Sun H, Ding J, Piednoël M, Schneeberger K. findGSE: estimating genome size variation within human and Arabidopsis using k-mer frequencies. Bioinformatics. 2017;34:550–557. doi: 10.1093/bioinformatics/btx637.
28. Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 2021;18:170–175. doi: 10.1038/s41592-020-01056-5.
29. Durand NC, et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 2016;3:95–98. doi: 10.1016/j.cels.2016.07.002.
30. Dudchenko O, et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 2017;356:92–95. doi: 10.1126/science.aal3327.
31. Durand NC, et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 2016;3:99–101. doi: 10.1016/j.cels.2015.07.012.
32. Xu M, et al. TGS-GapCloser: A fast and accurate gap closer for large genomes with low coverage of error-prone long reads. Gigascience. 2020;9:giaa094. doi: 10.1093/gigascience/giaa094.
33. Hu J, Fan J, Sun Z, Liu S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics. 2020;36:2253–2255. doi: 10.1093/bioinformatics/btz891.
34. Pryszcz LP, Gabaldon T. Redundans: an assembly pipeline for highly heterozygous genomes. Nucleic Acids Res. 2016;44:e113. doi: 10.1093/nar/gkw294.
35. Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27:573–580. doi: 10.1093/nar/27.2.573.
36. Jin JJ, et al. GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes. Genome Biol. 2020;21:241. doi: 10.1186/s13059-020-02154-5.
37. Ou S, et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 2019;20:275. doi: 10.1186/s13059-019-1905-y.
38. Huang S, et al. The genome of the cucumber, Cucumis sativus L. Nat Genet. 2009;41:1275–1281. doi: 10.1038/ng.475.
39. Jaillon O, et al. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature. 2007;449:463–467. doi: 10.1038/nature06148.
40. International Peach Genome I, et al. The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution. Nat Genet. 2013;45:487–494. doi: 10.1038/ng.2586.
41. Arabidopsis Genome I. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000;408:796–815. doi: 10.1038/35048692.
42. Grabherr MG, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29:644–652. doi: 10.1038/nbt.1883.
43. Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat Methods. 2015;12:357–360. doi: 10.1038/nmeth.3317.
44. Pertea M, et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33:290–295. doi: 10.1038/nbt.3122.
45. Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012;28:3150–3152. doi: 10.1093/bioinformatics/bts565.
46.Haas BJ, et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003;31:5654–5666. doi: 10.1093/nar/gkg770. [DOI] [PMC free article] [PubMed] [Google Scholar]
47.Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008;24:637–644. doi: 10.1093/bioinformatics/btn013. [DOI] [PubMed] [Google Scholar]
48.Cantarel BL, et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 2008;18:188–196. doi: 10.1101/gr.6743907. [DOI] [PMC free article] [PubMed] [Google Scholar]
49.Camacho C, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421. doi: 10.1186/1471-2105-10-421. [DOI] [PMC free article] [PubMed] [Google Scholar]
50.Slater GS, Birney E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics. 2005;6:31. doi: 10.1186/1471-2105-6-31. [DOI] [PMC free article] [PubMed] [Google Scholar]
51.Haas BJ, et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 2008;9:R7. doi: 10.1186/gb-2008-9-1-r7. [DOI] [PMC free article] [PubMed] [Google Scholar]
52.Filiault DL, et al. The Aquilegia genome provides insight into adaptive radiation and reveals an extraordinarily polymorphic chromosome with a unique history. Elife. 2018;7:e36426. doi: 10.7554/eLife.36426. [DOI] [PMC free article] [PubMed] [Google Scholar]
53.Wu S, et al. The genome sequence of star fruit (Averrhoa carambola) Hortic Res. 2020;7:95. doi: 10.1038/s41438-020-0307-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
54.Tuskan GA, et al. The genome of black cottonwood, Populus trichocarpa. Science. 2006;313:1596–1604. doi: 10.1126/science.1128691. [DOI] [PubMed] [Google Scholar]
55.Tu L, et al. Genome of Tripterygium wilfordii and identification of cytochrome P450 involved in triptolide biosynthesis. Nat Commun. 2020;11:971. doi: 10.1038/s41467-020-14776-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
56.Duan N, et al. Genome re-sequencing reveals the history of apple and supports a two-stage model for fruit enlargement. Nat Commun. 2017;8:249. doi: 10.1038/s41467-017-00336-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
57.Xie D, et al. The wax gourd genomes offer insights into the genetic diversity and ancestral cucurbit karyotype. Nat Commun. 2019;10:5158. doi: 10.1038/s41467-019-13185-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
58.Fu R, et al. Genome-wide analyses of introgression between two sympatric Asian oak species. Nat Ecol Evol. 2022;6:924–935. doi: 10.1038/s41559-022-01754-7. [DOI] [PubMed] [Google Scholar]
59.Emms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 2019;20:238. doi: 10.1186/s13059-019-1832-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
60. Emms DM, Kelly S. STAG: Species tree inference from all genes. bioRxiv, 267914 (2018).
61. Sun P, et al. WGDI: A user-friendly toolkit for evolutionary analyses of whole-genome duplications and ancestral karyotypes. Mol Plant. 2022;15:1841–1851. doi: 10.1016/j.molp.2022.10.018.
62. Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. doi: 10.1093/bioinformatics/btv351.
63. Huerta-Cepas J, et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Research. 2018;47:D309–D314. doi: 10.1093/nar/gky1085.
64. Huerta-Cepas J, et al. Fast genome-wide functional annotation through orthology assignment by eggNOG-Mapper. Mol Biol Evol. 2017;34:2115–2122. doi: 10.1093/molbev/msx148.
65. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat Methods. 2015;12:59–60. doi: 10.1038/nmeth.3176.
66. Consortium TU. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Research. 2020;49:D480–D489. doi: 10.1093/nar/gkaa1100.
67. Coordinators NR. Database resources of the National Center for Biotechnology Information. Nucleic Acids Research. 2013;42:D7–D17. doi: 10.1093/nar/gkt1146.
68. Jones P, et al. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014;30:1236–1240. doi: 10.1093/bioinformatics/btu031.
69. Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25:955–964. doi: 10.1093/nar/25.5.955.
70. Kalvari I, et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Research. 2020;49:D192–D200. doi: 10.1093/nar/gkaa1047.
71. Kalvari I, et al. Non-coding RNA analysis using the Rfam database. Curr Protoc Bioinformatics. 2018;62:e51. doi: 10.1002/cpbi.51.
72. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–3100. doi: 10.1093/bioinformatics/bty191.
73. Goel M, Sun H, Jiao WB, Schneeberger K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 2019;20:277. doi: 10.1186/s13059-019-1911-0.
74. Goel M, Schneeberger K. plotsr: visualizing structural similarities and rearrangements between multiple genomes. Bioinformatics. 2022;38:2922–2926. doi: 10.1093/bioinformatics/btac196.
75. 2022. NCBI Sequence Read Archive. SRR22412655
76. 2022. NCBI Sequence Read Archive. SRR22026041
77. 2022. NCBI Sequence Read Archive. SRR22026042
78. 2022. NCBI Sequence Read Archive. SRR22026043
79. 2022. NCBI Assembly. GCA_027190085.1
80. 2022. NCBI Assembly. GCA_027186245.1
81. Zhao SW. 2023. Haplotype-resolved genome assembly of Coriaria nepalensis, a non-legume nitrogen-fixing shrub associated with Frankia. figshare.
82. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997v2 (2013).
83. Danecek P, et al. Twelve years of SAMtools and BCFtools. Gigascience. 2021;10:giab008. doi: 10.1093/gigascience/giab008.
84. Rhie A, Walenz BP, Koren S, Phillippy AM. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 2020;21:245. doi: 10.1186/s13059-020-02134-9.

Data Availability Statement

All data processing commands and pipelines were carried out in accordance with the instructions and guidelines provided by the relevant bioinformatic software.
