A draft genome for Spatholobus suberectus

Shuangshuang Qin; Lingqing Wu; Kunhua Wei; Ying Liang; Zhijun Song; Xiaolei Zhou; Shuo Wang; Mingjie Li; Qinghua Wu; Kaijian Zhang; Yuanyuan Hui; Shuying Wang; Jianhua Miao; Zhongyi Zhang

doi:10.1038/s41597-019-0110-x

2019 Jul 4;6:113. doi: 10.1038/s41597-019-0110-x

PMCID: PMC6609623 PMID: 31273216

Abstract

Spatholobus suberectus Dunn (S. suberectus), which belongs to the Leguminosae, is an important medicinal plant in China. Owing to its long growth cycle and increased use in human medicine, wild resources of S. suberectus have decreased rapidly and may be on the verge of extinction. De novo assembly of the whole S. suberectus genome provides a critical resource for studying the biosynthesis of the main bioactive components and the mechanisms regulating seed development in this plant. Using several sequencing technologies, including Illumina HiSeq X Ten, single-molecule real-time sequencing, and 10x Genomics linked reads, together with the FALCON assembler and chromatin interaction mapping (Hi-C), we assembled a chromosome-scale genome about 798 Mb in size. In total, 748 Mb (93.73%) of the contig sequences were anchored onto nine chromosomes, with the longest scaffold being 103.57 Mb. Further annotation analyses predicted 31,634 protein-coding genes, of which 93.9% have been functionally annotated. All data generated in this study are available in public databases.

Subject terms: Plant sciences, Genomics, Genome, DNA sequencing

Design Type(s)	sequence assembly objective • sequence annotation objective
Measurement Type(s)	whole genome sequencing assay
Technology Type(s)	DNA sequencing
Factor Type(s)	growth condition
Sample Characteristic(s)	Spatholobus suberectus • leaf

Machine-accessible metadata file describing the reported data (ISA-Tab format)

Background & Summary

Spatholobus suberectus Dunn is widely used as a food supplement in tea, wine, and soup, and is one of the most important Chinese medicinal plants (Fig. 1a), used to treat various conditions such as blood stasis syndrome, abnormal menstruation, and rheumatism¹. It is mainly distributed in Fujian, Guangdong, and Yunnan Provinces and the Guangxi Zhuang Autonomous Region of China². The vine stem of S. suberectus, called “chicken blood vines” in China because red juice flows out when the stem is injured (Fig. 1b), is the critical medicinal component³. Pharmacological and clinical studies have demonstrated that S. suberectus exhibits antioxidant⁴, antiviral⁵, antibacterial⁶, anticancer⁷, and antiplatelet⁸ activities. The crude drug of S. suberectus is therefore used in many patented Chinese medicines, and market demand for the wild resource is increasing rapidly. Unlike other Leguminosae plants, however, S. suberectus has a low seed setting rate (Fig. 1c), and most of the fruit falls off before seed maturation, which results in a low natural reproductive capacity. The growth cycle of S. suberectus is also very long: the plant must grow for more than seven years before the crude drug can be used as medicine. These factors have combined to decrease the wild resources of S. suberectus in China to the verge of extinction.

Fig. 1 — Morphological characteristics of *S. suberectus*. (a) A picture of an *S. suberectus* plant. (b) The vine stem of *S. suberectus*, called “chicken blood vines”. (c) The pod of *S. suberectus* contains only one seed.

To investigate the biosynthesis of the main bioactive components and the seed development mechanism, both needed for future S. suberectus production, we generated a high-quality draft version of the S. suberectus genome. Whole-genome sequencing has been performed for several Leguminosae species, for instance Lotus japonicus⁹, Glycine max¹⁰, Medicago truncatula¹¹, Glycyrrhiza uralensis¹², Cicer arietinum¹³, and Cajanus cajan¹⁴. However, there are few reports on the subtribe Erythrininae Benth., which contains nine genera of Leguminosae². As a member of this subtribe, S. suberectus provides genomic information that can help fill this gap.

The genome size of S. suberectus, a diploid (2n = 18) species, was estimated to be 793 Mb using 17-mer frequency distribution analysis with SOAPdenovo. In this study, we combined sequences generated on the Illumina, PacBio, and 10X Genomics GemCode platforms with the FALCON assembler to generate the first draft genome assembly of S. suberectus. The assembled genome is about 798 Mb, with scaffold and contig N50 sizes of 6.9 Mb and 2.1 Mb, respectively. The S. suberectus assembly was further refined using 233.19 Gb of Hi-C data: 748 Mb (93.73%) of the contig sequences were anchored onto nine chromosomes, the scaffold N50 improved to 86.99 Mb, and the longest scaffold was 103.57 Mb.

Almost half of the S. suberectus genome (47.82%) was occupied by repetitive elements, the largest class of which was long terminal repeat retrotransposons (17.32%). By combining homology-based, de novo, and transcriptome-based predictions, we predicted 31,634 protein-coding genes with an average transcript size of 1,097.55 bp in the genome. In total, 93.9% (29,688) of protein-coding genes were successfully functionally annotated.

Methods

Plant materials and DNA extraction

S. suberectus samples from Nanning, Guangxi Zhuang Autonomous Region, China (22°51′28″N, 108°22′2″E) were selected for genome sequencing. The samples were kept at the Guangxi Botanical Garden of Medicinal Plants for breeding and research purposes. Total genomic DNA was isolated from fresh young leaves of 8-year-old S. suberectus using the Plant DNA Kit (TIANGEN) according to the manufacturer’s instructions.

Library construction and sequencing

The DNA was sheared with a Covaris® M220 Focused-ultrasonicator™ (Covaris, Woburn, Massachusetts, USA). The sheared DNA, with fragment sizes of 250 bp and 450 bp, was processed using the TruSeq DNA PCR-Free LT Library Kit protocol. PCR products were purified (AMPure XP system) and library quality was assessed on an Agilent Bioanalyzer 2100 system. These PCR-free libraries were sequenced on a HiSeq X Ten instrument as 150 bp paired-end reads. In total, 77.73 Gb of raw sequence data were generated (Table 1).

Table 1.

The sizes of sequencing data using various sequencing platforms.

Library	Platform	Insert size	Total data (Gb)	Read length (bp)	Sequence coverage (×)
Illumina	Illumina HiSeq	250 bp	41.89	150	52.82
Illumina	Illumina HiSeq	450 bp	35.84	150	45.20
Pacbio reads	Pacbio Sequel	20 kb	63.27	—	79.79
10×	Illumina HiSeq	20 kb	123.09	150	155.22
Hi-C	Illumina HiSeq	350 bp	233.19	150	293.92
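The coverage column in Table 1 follows from dividing each data volume by the genome size. The sketch below reproduces that arithmetic; it assumes the k-mer-based estimate of 793.39 Mb was used as the denominator, which the text does not state explicitly, and the small differences from the table are presumably due to rounding of the data volumes. The function and variable names are illustrative only.

```python
# Fold coverage = total sequenced bases / genome size.
# ASSUMPTION: the k-mer-based estimate (793.39 Mb) is the denominator.
GENOME_SIZE_MB = 793.39

# Total data (Gb) per library, copied from Table 1.
libraries_gb = {
    "Illumina 250 bp": 41.89,
    "Illumina 450 bp": 35.84,
    "PacBio Sequel": 63.27,
    "10x Genomics": 123.09,
    "Hi-C": 233.19,
}


def fold_coverage(total_gb: float, genome_mb: float = GENOME_SIZE_MB) -> float:
    """Return fold coverage for a data volume in Gb and a genome size in Mb."""
    return total_gb * 1000.0 / genome_mb


coverage = {name: fold_coverage(gb) for name, gb in libraries_gb.items()}

for name, cov in coverage.items():
    print(f"{name}: {cov:.2f}x")
```

Summing the two Illumina paired-end libraries in the same way gives roughly 98-fold short-read coverage.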

Sheared DNA (40 μg) was purified and concentrated with AMPure PB beads (PacBio) and then used for SMRTbell preparation according to the manufacturer’s protocol (Pacific Biosciences; 20-kb template preparation using the BluePippin (Sage Science) size selection system with a 15-kb cut-off). The libraries were then sequenced on a PacBio Sequel instrument (Pacific Biosciences, Menlo Park, CA, USA). A total of 11 SMRT Cells were used to yield 79.79-fold genome coverage (Table 1), consisting of 63.27 Gb of sequence data with an N50 read length of 14,288 bp (Table 2).

Table 2.

Characteristics of the PacBio long reads.

Read_type	Read_base	Read_Number	Read_length (max)	Read_length (mean)	Read_length (N50)
Subreads	63,270,110,556	6,710,707	122,873	9,428	14,288

The linked-read sequencing libraries were constructed on a 10X Genomics GemCode platform¹⁵. Sample indexing and partition barcoded libraries were prepared using the Chromium Genome Reagent Kit (10x Genomics) according to the manufacturer’s instructions. The barcoded sequencing library was first quantified with a Qubit 2.0, the insert size was checked on an Agilent 2100, and the library was finally quantified by qPCR. The library was sequenced as 150 bp paired-end reads on an Illumina HiSeq X Ten platform, yielding 123.09 Gb of data (Table 1).

For the Hi-C library, chromatin was fixed in place with formaldehyde in the nucleus. Fixed chromatin was digested with DpnII restriction endonuclease, 5′ overhangs were filled in with biotinylated nucleotides, and free blunt ends were ligated. After ligation, cross-links were reversed, and the DNA was purified from protein. Purified DNA was treated to remove biotin that was not internal to the ligated fragments. The DNA was then sheared to a mean fragment size of 350 bp, and sequencing libraries were generated using NEBNext Ultra enzymes and Illumina-compatible adaptors. Biotin-containing fragments were isolated using streptavidin beads before PCR enrichment of each library. The libraries were sequenced on an Illumina HiSeq platform to produce 233.19 Gb of Hi-C sequence data (Table 1). The quality of Hi-C sequencing was evaluated using HiCUP¹⁶. The effective rate (%) = unique valid di-tags/total reads processed = 4,356,614/10,000,000 = 43.57% (Table 3). Of the valid read pairs, 35.91% represent interchromosomal interactions, about 11% represent intrachromosomal interactions between fragments less than 10 kb apart, and 53.09% are intrachromosomal read pairs more than 10 kb apart (Table 3).

Table 3.

Statistics of Hi-C sequencing and mapping.

Statistics of mapping
	Read1	Read2
Total Reads	10,000,000	10,000,000
Unique Alignments	7,869,514	7,702,126
Multiple Alignments	859,867	832,203
Failed To Align	866,895	1,073,869
Unique Mapped Paired-end Reads	6,056,459	6,056,459
Statistics of valid reads
Unique Mapped Paired-end Reads	6,056,459
Invalid Paired-end Pairs	1,699,845
Valid Paired-end Reads	4,356,614
Valid Rate (%)	43.56
Cis-close (<10 Kbp)	478,994
Cis-far (>10 Kbp)	2,313,017
Trans	1,564,603

Cis-close (<10 Kbp): interactions between intrachromosomal read pairs less than 10 kb apart.

Cis-far (>10 Kbp): interactions between intrachromosomal read pairs more than 10 kb apart.

Trans: interactions between read pairs that map to different chromosomes (interchromosomal).
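The percentages quoted for the Hi-C library can be re-derived from the counts in Table 3. The following is a minimal sketch of that arithmetic, not part of the HiCUP pipeline; the variable names are illustrative. Note that the computed valid rate rounds to 43.57%, whereas Table 3 lists 43.56, which looks like truncation rather than rounding.

```python
# Counts copied from Table 3.
total_pairs = 10_000_000          # total reads processed
unique_mapped_pairs = 6_056_459   # uniquely mapped paired-end reads
invalid_pairs = 1_699_845         # invalid paired-end pairs
cis_close = 478_994               # intrachromosomal, < 10 kb apart
cis_far = 2_313_017               # intrachromosomal, > 10 kb apart
trans = 1_564_603                 # interchromosomal

# Valid pairs are the uniquely mapped pairs that survive filtering.
valid_pairs = unique_mapped_pairs - invalid_pairs

# The three interaction classes should partition the valid pairs.
classes_sum = cis_close + cis_far + trans

valid_rate = 100.0 * valid_pairs / total_pairs
fractions = {
    "cis_close": 100.0 * cis_close / valid_pairs,
    "cis_far": 100.0 * cis_far / valid_pairs,
    "trans": 100.0 * trans / valid_pairs,
}

print(f"valid pairs: {valid_pairs:,} ({valid_rate:.2f}% of processed)")
for label, pct in fractions.items():
    print(f"{label}: {pct:.2f}% of valid pairs")
```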

Estimation of the S. suberectus genome size

Quality-filtered reads from the Illumina platform were subjected to 17-mer frequency distribution analysis with SOAPdenovo¹⁷ to estimate the genome size and heterozygosity of S. suberectus (Fig. 2). The distribution of k-mer depth against frequency showed a main peak at a depth of 40 (Fig. 2). Based on the total number of k-mers (32,476,446,092), the S. suberectus genome size was calculated to be approximately 793.39 Mb using the following formulas: genome size = k-mer number/peak depth, and revised genome size = genome size × (1 − error rate). The heterozygosity of the S. suberectus genome was estimated to be 0.74%.

Fig. 2 — Estimation of *S. suberectus* genome size by k-mer analysis.
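The two formulas can be applied to the reported numbers as a quick check. In the sketch below, the k-mer count, peak depth, and final estimate come from the text; the error rate is not reported, so the value printed is only back-calculated from the reported 793.39 Mb and should be read as an inference, not as a published figure. The function name is illustrative.

```python
KMER_COUNT = 32_476_446_092   # total number of 17-mers (from the text)
PEAK_DEPTH = 40               # main peak of the k-mer depth distribution
REPORTED_SIZE_MB = 793.39     # revised genome size reported in the text


def kmer_genome_size(kmer_count: int, peak_depth: float,
                     error_rate: float = 0.0) -> float:
    """Genome size in bp: (k-mer count / peak depth) * (1 - error rate)."""
    return kmer_count / peak_depth * (1.0 - error_rate)


# Uncorrected estimate, before the error-rate revision.
raw_size_mb = kmer_genome_size(KMER_COUNT, PEAK_DEPTH) / 1e6

# ASSUMPTION: the error rate is inferred from the reported size,
# because the paper does not state it.
implied_error_rate = 1.0 - REPORTED_SIZE_MB / raw_size_mb

revised_size_mb = kmer_genome_size(
    KMER_COUNT, PEAK_DEPTH, implied_error_rate) / 1e6

print(f"raw estimate: {raw_size_mb:.2f} Mb")
print(f"implied error rate: {implied_error_rate:.2%}")
print(f"revised estimate: {revised_size_mb:.2f} Mb")
```

The uncorrected estimate comes out near 812 Mb, so the revision step accounts for a reduction of roughly 2%.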

Genome assembly

De novo assembly of the 63.27 Gb of PacBio single-molecule long reads from SMRT sequencing was performed using FALCON (https://github.com/PacificBiosciences/FALCON/)¹⁸. To obtain enough corrected reads, the longest 60 subreads were first selected as seed reads for error correction. The error-corrected reads were then aligned to each other and assembled into genomic contigs using FALCON with the parameters length_cutoff_pr = 5000, max_diff = 120, max_cov = 130. The draft assembly was polished using the Quiver algorithm. Pilon was then used to correct errors in the primary contigs (p-contigs) with 98.02X coverage of short paired-end reads generated on Illumina HiSeq platforms¹⁹. The assembly consisted of 1,954 contigs, with a contig N50 length of 2.06 Mb (total length = 794 Mb) (Table 4).

Table 4.

Summary of S. suberectus genome assembly using PacBio long reads.

Sample ID	Length	Number
Sample ID	Contig (bp)	Contig
Total	794,088,373	1,954
Max	8,229,915	—
Number >=2000	—	1,928
N50	2,057,658	114
N60	1,446,732	161
N70	1,036,389	226
N80	673,988	322
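For readers unfamiliar with the N50 to N80 rows, each "Length" entry is an Nx value and each "Number" entry is the matching count of sequences (often called Lx). The sketch below shows the standard definition on a small set of hypothetical contig lengths; it is not the script used to produce Table 4, and the function name `nx_lx` is made up for illustration.

```python
from typing import Iterable, Tuple


def nx_lx(lengths: Iterable[int], fraction: float = 0.5) -> Tuple[int, int]:
    """Return (Nx, Lx) for a set of sequence lengths.

    Nx is the length of the shortest sequence in the smallest set of
    longest sequences that together cover `fraction` of the total bases.
    Lx is the number of sequences in that set.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths supplied")
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    return ordered[-1], len(ordered)


# Hypothetical toy contig lengths (total = 305), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 15]

n50, l50 = nx_lx(toy_contigs, 0.5)
n80, l80 = nx_lx(toy_contigs, 0.8)
print(f"N50 = {n50} (L50 = {l50}), N80 = {n80} (L80 = {l80})")
```

Read this way, the N50 row of Table 4 says that the 114 longest contigs, each at least 2,057,658 bp long, contain half of the assembled bases.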

We used BWA-MEM²⁰ to align the 10X Genomics data to the assembly using default settings. Scaffolding was performed by fragScaff (in vitro, long-range sequence information for de novo genome assembly via transposase contiguity) with the barcoded sequencing reads.

The resulting assembly consisted of 1,146 scaffolds, with the scaffold N50 length improving to 6.9 Mb (total length = 798 Mb) and a contig N50 of 2.1 Mb (Table 5). The assembly size is similar to the genome size estimated by k-mer analysis.

Table 5.

Summary of S. suberectus genome assembly using PacBio long reads and 10X genomics data.

Sample ID	Length		Number
Sample ID	Contig (bp)	Scaffold(bp)	Contig	Scaffold
Total	794,088,373	798,435,360	1,954	1,146
Max	8,229,915	27,701,983	—	—
Number >=2000	—	—	1,928	1,120
N50	2,057,658	6,903,381	114	34
N60	1,446,732	5,179,305	161	47
N70	1,036,389	3,931,704	226	64
N80	673,988	2,630,391	322	89

The de novo assembly, shotgun reads, and Dovetail Hi-C library reads were used as input data for HiRise, a software pipeline designed specifically for using proximity ligation data to scaffold genome assemblies²¹. Shotgun and Dovetail Hi-C library sequences were aligned to the draft input assembly using a modified SNAP read mapper (http://snap.cs.berkeley.edu). The separations of Dovetail Hi-C read pairs mapped within draft scaffolds were analyzed by HiRise to produce a likelihood model for genomic distance between read pairs, and the model was used to identify and break putative misjoins, score prospective joins, and make joins above a threshold. After scaffolding, shotgun sequences were used to close gaps between contigs.

The S. suberectus assembly was further refined using 233.19 Gb of Hi-C data (Table 1): 748 Mb (93.73%) of the contig sequences were anchored onto nine chromosomes (Fig. 3). The scaffold N50 improved to 86.99 Mb, and the longest scaffold was 103.57 Mb.

Fig. 3 — Diagrammatic sketch of the annotation pipeline.

Identification of repetitive elements in S. suberectus

Tandem Repeats Finder²² was employed to identify tandem repeats in the S. suberectus genome. RepeatMasker (http://www.repeatmasker.org) and RepeatProteinMasker²³ were used against Repbase²⁴ to identify known transposable element repeats. In addition, RepeatModeler (http://www.repeatmasker.org/RepeatModeler.html), RepeatScout (http://www.repeatmasker.org/)²⁵, PILER (http://www.drive5.com/piler/)²⁶, and LTR_Finder (http://tlife.fudan.edu.cn/ltr_finder)²⁷ were used to identify repeats de novo (Fig. 3).

The combined results show that almost half of the S. suberectus genome (47.82%) is occupied by repetitive elements (Fig. 4b–e). Among these, long terminal repeat (LTR) retrotransposons are the largest class, accounting for 17.32% of the genome, a smaller proportion than in soybean (42%)¹⁰ and chickpea (46%)²⁸ but similar to that in Lotus japonicus (18%)⁹. LTR/Copia repeats were the most abundant, making up 10.06% of the genome (Fig. 4d), followed by LTR/Gypsy elements (6.61%; Fig. 4e).

Gene annotation

Genes in the S. suberectus genome were annotated using multiple methods, including homology-based predictions, de novo predictions and transcriptome-based predictions (Fig. 3). For de novo predictions, Augustus²⁹, GENSCAN³⁰, GlimmerHMM³¹, geneid³² and SNAP³³ were run on the repeat-masked genome, with parameters trained from Arabidopsis thaliana. Predicted protein sequences from Nelumbo nucifera (ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/plant/Nelumbo_nucifera/latest_assembly_versions/GCF_000365185.1_Chinese_Lotus_1.1 version 1.1), Arabidopsis thaliana (ftp://ftp.ensemblgenomes.org/pub/plants/release-32/gff3/arabidopsis_thaliana/, version 10.32), Glycine max (ftp://ftp.ensemblgenomes.org/pub/plants/release-32/fasta/glycine_max/dna/, version 1.0), Petunia axillaris (ftp://ftp.solgenomics.net/genomes/Petunia_axillaris/, version 1.6.2), Solanum lycopersicum (ftp://ftp.ensemblgenomes.org/pub/plants/release-32/fasta/solanum_lycopersicum/, release-32), and Oryza sativa (ftp://ftp.ensemblgenomes.org/pub/plants/release-32/fasta/oryza_sativa/, version 1.0) were used for homology-based predictions. First, query sequences were subjected to tblastn analysis with an Expect (E)-value cutoff of 1e-5. BLAST hits corresponding to reference proteins were concatenated by Solar software, and low-quality records were removed. The genomic region matching each reference protein was extended upstream and downstream by 2,000 bp to represent a protein-coding region. Gene structures contained in each protein region were predicted using GeneWise software³⁴. For transcriptome-based predictions, RNA from five organs (root, petiole, leaves, flowers, and stems) was isolated, and the RNA-seq data were processed with TopHat and Cufflinks³⁵ for gene annotation. RNA-seq data were also assembled by Trinity³⁶. PASA³⁷ software (http://pasapipeline.github.io/) was then used to generate a full transcriptome-based genome annotation.
The homology, de novo, and transcriptomic gene sets were merged to form a comprehensive and non-redundant reference gene set using EVidenceModeler³⁸ software. Next, PASA³⁷ was used to generate UTRs as suggested by the RNA-seq data.
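The flank-extension step in the homology-based prediction can be illustrated with a small helper. The sketch below assumes 1-based inclusive coordinates and clamps the extended region to the sequence boundaries; neither detail is specified in the text, and the function name `extend_region` is hypothetical. Only the 2,000 bp flank comes from the source.

```python
from typing import Tuple


def extend_region(start: int, end: int, contig_length: int,
                  flank: int = 2000) -> Tuple[int, int]:
    """Extend a hit region by `flank` bp on both sides.

    Coordinates are 1-based and inclusive (an assumption made for this
    sketch). The result is clamped so that it never runs off either end
    of the contig or scaffold.
    """
    if not 1 <= start <= end <= contig_length:
        raise ValueError("region must lie within the contig")
    new_start = max(1, start - flank)
    new_end = min(contig_length, end + flank)
    return new_start, new_end


# A hit in the middle of a 100 kb scaffold gains 2 kb on each side.
print(extend_region(5000, 9000, 100_000))
# A hit near both scaffold ends is clamped to the scaffold boundaries.
print(extend_region(500, 9000, 9500))
```

The extended window is what a spliced aligner such as GeneWise would then receive together with the reference protein, so that exons lying just outside the BLAST hits are not cut off.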

In total, 31,634 protein-coding genes with an average transcript size of 1,097.55 bp were predicted in the genome (Fig. 4a).

Functional annotation of the protein-coding genes was carried out using blastp (E-value cut-off 1e-05) against SwissProt³⁹ and NR databases. Protein domains were annotated by searching against InterPro⁴⁰ and Pfam database⁴¹, using InterProScan and HMMER (http://hmmer.janelia.org), respectively. The GO terms for genes were obtained from the corresponding InterPro or Pfam entry. The pathways in which the genes might be involved were assigned by BLAST against the KEGG database⁴² with the E-value cut-off of 1e-05.

Overall, 79% (24,976), 70.8% (22,394), and 82.5% (26,082) of genes were annotated in InterPro, KEGG, and GO, respectively. In total, 93.9% (29,688) of protein-coding genes were successfully annotated for conserved functional motifs or functional terms.

Non-coding RNA annotation

Annotation of tRNA was performed using tRNAscan-SE⁴³ software with default parameters. rRNA annotation was based on homology with rRNAs from several diverse higher plant species (not shown), using blastn with an E-value cutoff of 1e-5. miRNA and snRNA genes were predicted by INFERNAL software⁴⁴ using the Rfam database⁴⁵.

The final results included 820 miRNAs, 672 tRNAs, 261 rRNAs, and 550 snRNAs, with average lengths of 117.33, 75.32, 305.41, and 115.50 bp, respectively.

Data Records

This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession QUWT00000000⁴⁶. The version described in this paper is version QUWT01000000. Raw read files are available at NCBI Sequence Read Archive⁴⁷. All the annotation tables containing results of an analysis of the draft genome are available at figshare⁴⁸.

Technical Validation

Evaluation of the completeness of the S. suberectus genome assembly

To estimate the quality of the genome assembly, short reads were mapped back to the consensus genome using BWA⁴⁹, and an overall mapping rate of 97.29% was found, suggesting that our assembly contained comprehensive genomic information. Gene region completeness was evaluated using RNA-Seq data (Table S1): of the 53,538 transcripts assembled by Trinity³⁶, 99.62% could be mapped to our genome assembly, and 95.94% were considered complete (more than 90% of the transcript could be aligned to one continuous scaffold).
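The completeness criterion for transcripts (more than 90% of the transcript aligned to a single scaffold) can be written as a simple rule. The sketch below is one possible reading of that criterion, not the authors' script: the input format (aligned bases per scaffold), the function name, and the labels "partial" and "unmapped" are all assumptions introduced for illustration.

```python
from typing import Dict


def classify_transcript(transcript_length: int,
                        aligned_bases: Dict[str, int],
                        threshold: float = 0.90) -> str:
    """Classify one transcript by its best single-scaffold alignment.

    `aligned_bases` maps a scaffold name to the number of transcript
    bases aligned to that scaffold (hypothetical input format).
    """
    if transcript_length <= 0:
        raise ValueError("transcript length must be positive")
    best = max(aligned_bases.values(), default=0)
    if best == 0:
        return "unmapped"
    # "Complete" requires MORE than 90% on one continuous scaffold.
    if best / transcript_length > threshold:
        return "complete"
    return "partial"


examples = {
    "t1": classify_transcript(1000, {"scaffold_1": 950}),
    "t2": classify_transcript(1000, {"scaffold_1": 600, "scaffold_2": 400}),
    "t3": classify_transcript(1000, {}),
}
print(examples)
```

Under this reading, a transcript split across two scaffolds counts as mapped but not complete, which is why the mapped fraction (99.62%) is higher than the complete fraction (95.94%).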

The completeness of gene regions was further assessed using CEGMA (Core Eukaryotic Genes Mapping Approach)⁵⁰: 240 of 248 (96.77%) conserved core eukaryotic genes from CEGMA were captured in our assembly, and 206 (83.06%) of these were complete (Table S2). Furthermore, we performed BUSCO (Benchmarking Universal Single-Copy Orthologs)⁵¹ analysis based on a benchmark of 956 conserved plant genes, of which 96% had complete gene coverage (including 18% duplicated ones), 1% were fragmented, and only 2.6% were missing (Table S3). These data support a high-quality S. suberectus genome assembly that can be used for further investigation.

Acknowledgements

We thank Huizhen Lv for offering photos of S. suberectus. This study was supported by the Guangxi science and technology research project (AB16450012), the National Natural Science Foundation of China (81503179, 81473309), the National Public Welfare Special Project of China “Quality Guarantee system of Chinese herbal medicines” (201507002), the China Agriculture Research System (CARS-21), and the Guangxi science and technology research project (AA18242040).

Author Contributions

J.H.M. and Z.Y.Z. designed the project. S.S.Q. and L.Q.W. analyzed data and wrote the paper. S.S.Q., K.H.W. and Y.L. performed experiments. S.S.Q., Z.J.S., X.L.Z., S.W. and Q.H.W. contributed samples, materials, or data. M.J.L., K.J.Z., Y.Y.H. and S.Y.W. helped with the data analysis and examined the results.

Code Availability

The execution of this work involved many software tools, whose versions, settings and parameters are described below.

(1) SOAPdenovo: version 3.0, default parameters; (2) FALCON: version 3.1, length_cutoff_pr = 5000, max_diff = 120, max_cov = 130; (3) HiCUP: version 0.5.10; (4) HiRise: Dovetail Genomics LLC, Santa Cruz, CA, USA; (5) BWA: version 0.7.8, default parameters; (6) Tandem Repeat Finder: version 409, default parameters; (7) RepeatMasker: version 4.0.5, default parameters; (8) Repbase: version 15.02; (9) RepeatModeler: version 1.0.11, default parameters; (10) RepeatScout: version 1.0.5, default parameters; (11) PILER: version 1.06, default parameters; (12) LTR_FINDER: version 1.0.7, default parameters; (13) Augustus: version 3.0.2, default parameters; (14) GENSCAN: version 1.0, default parameters; (15) geneid: version 1.4, default parameters; (16) GlimmerHMM: version 3.0.2, default parameters; (17) SNAP: version 11-29-2013; (18) BLAST: version 2.2.26, default parameters; (19) GeneWise: version 2.2.0, default parameters; (20) TopHat: version 2.0.8, default parameters; (21) Cufflinks: version 2.1.1, default parameters; (22) Trinity: version 2.4.0, default parameters; (23) PASA: version 2.3.3, default parameters; (24) EVidenceModeler: version 1.1.1, default parameters; (25) InterPro: version 5.16, default parameters; (26) Pfam database: version 03-30-2016; (27) InterProScan: version 4.8, default parameters; (28) NR database: version 08-10-2015; (29) KEGG database: version 08-31-2015; (30) SwissProt database: version 05-24-2016; (31) HMMER: version 3.1b1, default parameters; (32) tRNAscan-SE: version 1.3.1, default parameters; (33) BUSCO: version 3.0.2, Embryophyta Version odb9; (34) CEGMA: version 2.5.

Competing Interests

The authors declare no competing interests.

Footnotes

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Shuangshuang Qin and Lingqing Wu.

Contributor Information

Jianhua Miao, Email: mjh1962@vip.163.com.

Zhongyi Zhang, Email: zyzhang@fafu.edu.cn.

ISA-Tab metadata

is available for this paper at 10.1038/s41597-019-0110-x.

Supplementary information

is available for this paper at 10.1038/s41597-019-0110-x.

References

1.Lee MH, Lin YP, Hsu FL, Zhan GR, Yen KY. Bioactive constituents of Spatholobus suberectus in regulating tyrosinase-related proteins and mRNA in HEMn cells. Phytochemistry. 2006;67:1262–1270. doi: 10.1016/j.phytochem.2006.05.008. [DOI] [PubMed] [Google Scholar]
2.Wu, Z. Y., Raven, P. H. & Hong, D. Y. Flora of China. (Beijing: Science Press & St. Louis: Missouri Botanical Garden Press, 2010).
3.Cui YJ, Liu P, Chen RY. Studies on the active constituents in vine stem of Spatholobus suberectus. Chin. J. Chin. Mater. Med. 2005;30:121–123. [PubMed] [Google Scholar]
4.Fu YF, et al. Immunomodulatory and antioxidant effects of total flavonoids of Spatholobus suberectus Dunn on PCV2 infected mice. Sci. Rep. 2017;7:8676. doi: 10.1038/s41598-017-09340-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Chen SR, et al. In Vitro Study on Anti-Hepatitis C Virus Activity of Spatholobus suberectus Dunn. Molecules. 2016;21:1367. doi: 10.3390/molecules21101367. [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Cho H, et al. Spatholobus suberectus Dunn. constituents inhibit sortase A and Staphylococcus aureus cell clumping to fibrinogen. Arch. Pharm. Res. 2017;40:518–523. doi: 10.1007/s12272-016-0884-8. [DOI] [PubMed] [Google Scholar]
7.Peng F, Meng C, Zhou Q, Chen J, Xiong L. Cytotoxic Evaluation against Breast Cancer Cells of Isoliquiritigenin Analogues from Spatholobus suberectus and Their Synthetic Derivatives. J. Nat. Prod. 2016;79:248–251. doi: 10.1021/acs.jnatprod.5b00774. [DOI] [PubMed] [Google Scholar]
8.Lee BJ, et al. Antiplatelet effects of Spatholobus suberectus via inhibition of the glycoprotein IIb/IIIa receptor. J. Ethnopharmacol. 2011;134:460–467. doi: 10.1016/j.jep.2010.12.039. [DOI] [PubMed] [Google Scholar]
9.Sato S, et al. Genome Structure of the Legume, Lotus japonicus. DNA Res. 2008;15:227–239. doi: 10.1093/dnares/dsn008. [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Schmutz J, et al. Genome sequence of the palaeopolyploid soybean. Nature. 2010;463:178–183. doi: 10.1038/nature08670. [DOI] [PubMed] [Google Scholar]
11.Young ND, et al. The Medicago genome provides insight into the evolution of rhizobial symbioses. Nature. 2011;480:520–524. doi: 10.1038/nature10625. [DOI] [PMC free article] [PubMed] [Google Scholar]
12.Mochida K, et al. Draft genome assembly and annotation of Glycyrrhiza uralensis, a medicinal legume. Plant J. 2017;89:181–194. doi: 10.1111/tpj.13385. [DOI] [PubMed] [Google Scholar]
13.Gupta S, et al. Draft genome sequence of Cicer reticulatum L., the wild progenitor of chickpea provides a resource for agronomic trait improvement. DNA Res. 2016;24:10. doi: 10.1093/dnares/dsw042. [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Varshney RK, et al. Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farmers. Nat. Biotechnol. 2012;30:83–89. doi: 10.1038/nbt.2022. [DOI] [PubMed] [Google Scholar]
15.Weisenfeld NI, Kumar V, Shah P, Church DM, Jaffe DB. Direct determination of diploid genome sequences. Genome Res. 2017;27:757–767. doi: 10.1101/gr.214874.116. [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Wingett S, et al. HiCUP: pipeline for mapping and processing Hi-C. data. F1000Res. 2015;4:1310. doi: 10.12688/f1000research.7334.1. [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Luo R, et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience. 2012;1:18–18. doi: 10.1186/2047-217X-1-18. [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Chin C, et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods. 2016;13:1050–1054. doi: 10.1038/nmeth.4035. [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Walker Bruce J., Abeel Thomas, Shea Terrance, Priest Margaret, Abouelliel Amr, Sakthikumar Sharadha, Cuomo Christina A., Zeng Qiandong, Wortman Jennifer, Young Sarah K., Earl Ashlee M. Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. PLoS ONE. 2014;9(11):e112963. doi: 10.1371/journal.pone.0112963. [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at, https://arxiv.org/abs/1303.3997 (2013).
21.Putnam NH, et al. Chromosome-scale shotgun assembly using an in vitro method for long-range linkage. Genome Res. 2016;26:342–350. doi: 10.1101/gr.193474.115. [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27:573–580. doi: 10.1093/nar/27.2.573. [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Tarailo-Graovac M, Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinf. 2009;4:Unit 4.10. doi: 10.1002/0471250953.bi0410s25. [DOI] [PubMed] [Google Scholar]
24.Jurka J, et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 2005;110:462–467. doi: 10.1159/000084979. [DOI] [PubMed] [Google Scholar]
25.Price AL, Jones NC, Pevzner PA. De novo identification of repeat families in large genomes. Bioinformatics. 2005;21(Suppl 1):i351–358. doi: 10.1093/bioinformatics/bti1018. [DOI] [PubMed] [Google Scholar]
26.Edgar RC, Myers EW. PILER: identification and classification of genomic repeats. Bioinformatics. 2005;21(Suppl 1):i152–158. doi: 10.1093/bioinformatics/bti1003. [DOI] [PubMed] [Google Scholar]
27.Xu Z, Wang H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 2007;35:W265–268. doi: 10.1093/nar/gkm286. [DOI] [PMC free article] [PubMed] [Google Scholar]
28.Varshney RK, et al. Draft genome sequence of chickpea (Cicer arietinum) provides a resource for trait improvement. Nat. Biotechnol. 2013;31:240–246. doi: 10.1038/nbt.2491. [DOI] [PubMed] [Google Scholar]
29.Stanke M, Steinkamp R, Waack S, Morgenstern B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res. 2004;32:W309–312. doi: 10.1093/nar/gkh379. [DOI] [PMC free article] [PubMed] [Google Scholar]
30.Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 1997;268:78–94. doi: 10.1006/jmbi.1997.0951. [DOI] [PubMed] [Google Scholar]
31. Majoros WH, Pertea M, Salzberg SL. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics. 2004;20:2878–2879. doi: 10.1093/bioinformatics/bth315.
32. Blanco E, Parra G, Guigó R. Using geneid to identify genes. Curr. Protoc. Bioinf. 2007;18:Unit 4.3. doi: 10.1002/0471250953.bi0403s18.
33. Korf I. Gene finding in novel genomes. BMC Bioinformatics. 2004;5:59. doi: 10.1186/1471-2105-5-59.
34. Birney E, Durbin R. Using GeneWise in the Drosophila annotation experiment. Genome Res. 2000;10:547–548. doi: 10.1101/gr.10.4.547.
35. Trapnell C, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 2012;7:562–578. doi: 10.1038/nprot.2012.016.
36. Haas BJ, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 2013;8:1494–1512. doi: 10.1038/nprot.2013.084.
37. Haas BJ, et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003;31:5654–5666. doi: 10.1093/nar/gkg770.
38. Haas BJ, et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 2008;9:R7. doi: 10.1186/gb-2008-9-1-r7.
39. Bairoch AM, Apweiler R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 2000;28:45–48. doi: 10.1093/nar/28.1.45.
40. Mulder N, Apweiler R. InterPro and InterProScan: tools for protein sequence classification and comparison. Methods Mol. Biol. 2007;396:59–70. doi: 10.1007/978-1-59745-515-2_5.
41. Punta M, et al. The Pfam protein families database. Nucleic Acids Res. 2004;28:263–266. doi: 10.1093/nar/28.1.263.
42. Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28:27–30. doi: 10.1093/nar/28.1.27.
43. Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25:955–964. doi: 10.1093/nar/25.5.955.
44. Nawrocki EP, Kolbe DL, Eddy SR. Infernal 1.0: inference of RNA alignments. Bioinformatics. 2009;25:1335–1337. doi: 10.1093/bioinformatics/btp157.
45. Nawrocki EP, Eddy SR. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013;29:2933–2935. doi: 10.1093/bioinformatics/btt509.
46. Qin SS. 2019. Draft genome of Spatholobus suberectus. GenBank. QUWT00000000
47. 2019. NCBI Sequence Read Archive. SRP157950
48. Qin SS. 2019. Draft genome of Spatholobus suberectus. figshare.
49. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
50. Parra G, Bradnam K, Korf I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 2007;23:1061–1067. doi: 10.1093/bioinformatics/btm071.
51. Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. doi: 10.1093/bioinformatics/btv351.
