Nelumbo genome database, an integrative resource for gene expression and variants of Nelumbo nucifera

Hui Li; Xingyu Yang; Yue Zhang; Zhiyan Gao; Yuting Liang; Jinming Chen; Tao Shi

doi:10.1038/s41597-021-00828-8

Scientific Data. 2021 Jan 29;8:38.


PMCID: PMC7846841 PMID: 33514746

Abstract

Sacred lotus (Nelumbo nucifera, or lotus) is one of the most widely grown aquatic plant species and has important uses in water gardening, as a vegetable and in herbal medicine. A public genomic database of lotus would facilitate studies of lotus and other aquatic plant species. Here, we constructed an integrative database, the Nelumbo Genome Database (NGD, http://nelumbo.biocloud.net). The NGD hosts the most up-to-date lotus genome assembly together with gene expression profiles across different tissues and gene coexpression networks. We also integrated genetic variants and key traits from 62 newly sequenced lotus cultivars and 26 previously reported cultivars, which are valuable for lotus germplasm studies. With deployed applications including BLAST, BLAT, Primer, Annotation Search, and Variant and Trait Search, users can perform sequence analyses and gene searches in the NGD. Overall, the genomic resources provided in the NGD will facilitate future studies on the population genetics and molecular breeding of lotus.

Subject terms: Plant molecular biology, Agricultural genetics

Measurement(s)	reference genome data • whole genome sequencing • transcriptome
Technology Type(s)	Hi-C • PacBio Sequel System • Illumina sequencing • RNA sequencing • DNA sequencing
Sample Characteristic - Organism	Nelumbo nucifera


Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.13487271

Background & Summary

Sacred lotus (Nelumbo nucifera, or lotus) is an early-diverging eudicot of considerable value for understanding the origin and evolution of eudicots^1,2. Lotus has a wide native distribution, ranging from Asia to northern Australia, and it is one of the most economically important aquatic plant species, used widely in water gardening, as a vegetable and in herbal medicine^3–5. In horticulture, lotus cultivars are classified into three types according to their use: seed lotus, flower lotus and rhizome lotus. The lotus genome (2n = 16, assembly size = 821.2 Mb) has been sequenced and assembled, providing an unprecedented opportunity for genetic studies and molecular breeding of lotus^6–8.

Since the first draft genome assembly of the lotus variety China Antique was released⁹, many genomic studies have been carried out on lotus, including whole-genome resequencing^7,10,11, transcriptomic^12–14, miRNA^15–17 and gene family studies^18,19. The recent chromosome-level assembly of the China Antique genome facilitates genome-wide studies of functional genes and of lotus evolution, but a public web-based database for lotus is still unavailable⁸. To meet the demand for an integrative genomic resource for lotus, we report the Nelumbo Genome Database (NGD, nelumbo.biocloud.net), which houses, processes and integrates the newest assembly of the lotus variety China Antique, expression profiles of various tissues, and genetic variants and phenotypes of 88 key lotus cultivars (Fig. 1).

Fig. 1 — A schematic of the data collection and utilities for the Nelumbo Genome Database (NGD).

Major datasets

The NGD contains the new lotus assembly and annotations⁸, in which 34,481 genes harbor complete ORFs (≥ 30 aa). A total of 28,676 genes were defined as high-confidence genes, and 14,991, 20,878, 15,276, 5,924, 29,095, 20,325 and 28,310 genes were annotated in the KOG, Pfam, GO, KEGG, Nr, SwissProt and TrEMBL databases, respectively (Table 1). The NGD also houses the sequences of 150,589 unique transcript isoforms obtained by RNA-seq and PacBio SMRT sequencing in our previous studies^14,17, as well as the sequences of 1,517 lotus transcription factors (TFs) classified into 56 TF (sub)families. Sequence data of lotus gene families, transposable elements (TEs) and other repeats are also included. Gene expression levels and highly coexpressed genes are provided from a weighted gene coexpression network (WGCNA) built from 69 RNA-seq samples covering 11 lotus tissues (seed coat, cotyledon, receptacle, carpel, stamen, petal, rhizome, leaf, root, petiole and apical bud). Finally, the NGD stores a total of 26,939,834 high-quality SNPs (single nucleotide polymorphisms), 4,177,974 InDels and key horticultural traits from 88 lotus cultivars.

Table 1.

Summary of functional annotations of lotus genes in different databases.

Database	Number of annotated genes
GO	15276
KEGG	5924
KOG	14991
Pfam	20878
SwissProt	20325
TrEMBL	28310
Nr	29095
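The ORF-length cut-off and the high-confidence rule behind these gene counts (described in the Methods) can be illustrated with a minimal Python sketch. It assumes a hypothetical `GeneModel` record, takes "longest" to mean transcript length, and treats RNA-seq support as FPKM > 0.1 in at least one sample; the text does not specify these details, so this is an illustration rather than the pipeline used to build the NGD.

```python
from dataclasses import dataclass


# Hypothetical record for one predicted gene model (transcript); field names are illustrative.
@dataclass
class GeneModel:
    gene_id: str
    transcript_id: str
    length_bp: int  # transcript length (assumed criterion for "longest")
    orf_aa: int     # predicted ORF length in amino acids


def representative_models(models, min_orf_aa=30):
    """Pick the longest model per gene, then discard genes whose ORF is shorter than min_orf_aa."""
    best = {}
    for model in models:
        current = best.get(model.gene_id)
        if current is None or model.length_bp > current.length_bp:
            best[model.gene_id] = model
    return {gene: m for gene, m in best.items() if m.orf_aa >= min_orf_aa}


def high_confidence(gene_ids, plaza_homologs, max_fpkm, min_fpkm=0.1):
    """Keep genes with a PLAZA homolog or RNA-seq support (max FPKM > min_fpkm is an assumption)."""
    return {g for g in gene_ids if g in plaza_homologs or max_fpkm.get(g, 0.0) > min_fpkm}


models = [
    GeneModel("g1", "g1.t1", 1200, 300),
    GeneModel("g1", "g1.t2", 1500, 320),
    GeneModel("g2", "g2.t1", 400, 25),  # ORF < 30 aa, discarded
]
reps = representative_models(models)
print({g: m.transcript_id for g, m in reps.items()})  # {'g1': 'g1.t2'}
print(high_confidence(reps, plaza_homologs=set(), max_fpkm={"g1": 0.8}))  # {'g1'}
```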


Uses

Through data collection and downstream processing, our platform provides the most complete lotus genome assembly for browsing via GBrowse or JBrowse^20,21. Genes, DNA sequences, protein sequences, SNPs and InDels can be viewed via GBrowse. Each gene information page includes gene structures (including splice isoforms), sequences, and functional annotations from the Pfam, KEGG, GO, KOG, SwissProt, TrEMBL and Nr databases. RNA-seq-based expression profiles across tissues can also be retrieved and visualized as a heatmap, and genes can be searched by keyword. Additionally, the coexpressed partners of a query gene in the WGCNA-derived network can be retrieved by setting a weight threshold; such coexpressed genes are likely to be involved in the same biological processes as the query gene. Coding and genomic sequences can be searched by sequence similarity via BLAST or BLAT, and PCR primers can be designed directly in the NGD.
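The weight-threshold lookup of coexpressed genes can be sketched in a few lines of Python. This assumes the network is available as an undirected edge list of `(gene_a, gene_b, weight)` tuples, similar to the edge file that WGCNA's `exportNetworkToCytoscape()` can write; the gene IDs and weights below are made up, and the NGD's own query code may work differently.

```python
def coexpressed_genes(edges, query, threshold):
    """Partners of `query` whose edge weight is >= threshold, strongest first.

    edges: iterable of (gene_a, gene_b, weight) tuples; edges are treated as undirected.
    """
    hits = []
    for gene_a, gene_b, weight in edges:
        if weight < threshold:
            continue
        if gene_a == query:
            hits.append((gene_b, weight))
        elif gene_b == query:
            hits.append((gene_a, weight))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


edges = [
    ("NNU_A", "NNU_B", 0.40),  # hypothetical gene IDs and weights
    ("NNU_C", "NNU_A", 0.25),
    ("NNU_A", "NNU_D", 0.05),
    ("NNU_B", "NNU_C", 0.90),
]
print(coexpressed_genes(edges, "NNU_A", threshold=0.2))  # [('NNU_B', 0.4), ('NNU_C', 0.25)]
```

Raising the threshold shrinks the partner list, which mirrors how a stricter weight cut-off returns fewer, more tightly coexpressed genes.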

Methods

Data processing

Gene predictions on our chromosome-level genome assembly were performed by combining transcriptome evidence, gene homology and ab initio prediction. Publicly available RNA-seq datasets, mainly from samples of the China Antique variety, were downloaded from the NCBI SRA database (Online-only Table 1). First, the corrected consensus PacBio full-length transcripts were mapped to the lotus reference genome using GMAP¹⁴, Illumina RNA-seq reads were mapped to the genome using the HISAT2-StringTie pipeline²², and all transcripts were merged using TACO²³. Coding DNA sequences (CDSs) of the transcripts were predicted using TransDecoder (https://github.com/TransDecoder). Second, homology-based gene prediction was performed using GeMoMa, with genome sequences and gene coordinates from Arabidopsis thaliana²⁴, Carica papaya²⁵, Vitis vinifera²⁶, Macadamia ternifolia (Proteales)²⁷ and Brachypodium distachyon²⁸ as inputs. Third, ab initio prediction was performed using BRAKER2, guided by transcript coordinates from RNA-seq²⁹. Finally, all predictions were merged; for each gene with more than one gene model (transcript), the longest model was chosen as the representative gene model, and genes with an ORF shorter than 30 aa were discarded. High-confidence genes were defined as those that either had homologs in other plant species in PLAZA 4.0 (https://bioinformatics.psb.ugent.be/plaza/versions/plaza_v4_dicots/) or were supported by RNA-seq (FPKM > 0.1).

To quantify the expression of each gene in different lotus tissue samples, FPKM values across RNA-seq samples were obtained via StringTie²². A gene coexpression network based on these expression profiles was constructed using the WGCNA (v1.0) package in R³⁰; only genes with an average FPKM > 0.1 and a coefficient of variation (CV) of expression (FPKM) > 2 were retained for the WGCNA.
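The FPKM definition and the WGCNA input filter can be sketched as follows. The `fpkm` function uses the classic count-based formula for illustration only (StringTie derives its FPKM values from read coverage), and computing the CV with the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics


def fpkm(fragments, length_bp, total_mapped_fragments):
    """Classic FPKM: fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def wgcna_input(expression, min_mean=0.1, min_cv=2.0):
    """Keep genes with mean FPKM > min_mean and coefficient of variation > min_cv.

    expression maps gene_id -> FPKM values across samples. The CV uses the
    population standard deviation (an assumption).
    """
    kept = {}
    for gene_id, values in expression.items():
        mean = statistics.fmean(values)
        if mean <= min_mean:
            continue  # too lowly expressed
        if statistics.pstdev(values) / mean > min_cv:
            kept[gene_id] = values
    return kept


print(fpkm(500, 2000, 10_000_000))  # 25.0
expression = {
    "tissue_specific": [0, 0, 0, 0, 0, 10],     # CV about 2.24, kept
    "housekeeping": [5, 5, 5, 5, 5, 5],         # CV 0, dropped
    "barely_expressed": [0.01, 0, 0, 0, 0, 0],  # mean below 0.1, dropped
}
print(sorted(wgcna_input(expression)))  # ['tissue_specific']
```

A CV threshold of 2 is strict: in practice it keeps mainly genes whose expression is concentrated in a few samples, such as tissue-specific genes.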
Genes were clustered hierarchically based on the topological overlap matrix (TOM)³¹ and assigned to nine modules (minimum module size of 600 and minimum module similarity of 0.5). The edge weights between genes in the network represent their connectivity.
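For readers unfamiliar with the topological overlap matrix, the pure-Python sketch below computes an unsigned soft-thresholded adjacency a_ij = |cor(x_i, x_j)|^β and the standard TOM, TOM_ij = (Σ_u a_iu·a_uj + a_ij) / (min(k_i, k_j) + 1 − a_ij), where k_i is the connectivity of gene i. The unsigned network type and the power β = 6 are illustrative assumptions, because the text does not report them; genes would then be clustered hierarchically on 1 − TOM.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency(profiles, beta=6):
    """Unsigned soft-threshold adjacency a_ij = |r_ij| ** beta, with a zero diagonal."""
    n = len(profiles)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            adj[i][j] = adj[j][i] = abs(pearson(profiles[i], profiles[j])) ** beta
    return adj


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij); TOM_ii = 1."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene (diagonal is zero)
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom[i][j] = tom[j][i] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom


profiles = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 3, 2, 5]]  # toy FPKM profiles for three genes
tom = topological_overlap(adjacency(profiles))
dissimilarity = [[1 - v for v in row] for row in tom]  # input for hierarchical clustering
print([round(v, 3) for v in tom[0]])
```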

Online-only Table 1.

Information and source of genomic data used in Nelumbo Genome Database.

Datatype	Platform	Cultivar Name	Utility	Tissue/developmental stage	Accession number (NCBI SRA)
DNA	PacBio SMRT	China Antique	Genome assembly	Young leaf	SRR7549129
DNA	PacBio SMRT	China Antique	Genome assembly	Young leaf	SRR7549130
DNA	Illumina & Hi-C	China Antique	Genome assembly	Young leaf	SRR7615553
DNA	Illumina & Hi-C	China Antique	Genome assembly	Young leaf	SRR7631523
RNA	PacBio SMRT	China Antique	Gene & isoform prediction	mixed tissue samples	SRR8182148
RNA	Illumina	China Antique	Gene expression	Seed-coat-DAP6	SRR7880135
RNA	Illumina	China Antique	Gene expression	Seed-coat-DAP6	SRR7880135
RNA	Illumina	China Antique	Gene expression	Seed-coat-DAP6	SRR7880135
RNA	Illumina	China Antique	Gene expression	Seed-coat-DAP12	SRR7880134
RNA	Illumina	China Antique	Gene expression	Seed-coat-DAP12	SRR7880134
RNA	Illumina	China Antique	Gene expression	Seed-coat-DAP12	SRR7880134
RNA	Illumina	China Antique	Gene expression	Seed-coat-DAP18	SRR7880133
RNA	Illumina	China Antique	Gene expression	Seed-coat-DAP18	SRR7880133
RNA	Illumina	China Antique	Gene expression	Seed-coat-DAP18	SRR7880133
RNA	Illumina	China Antique	Gene expression	Petal	SRR8169126
RNA	Illumina	China Antique	Gene expression	Petal	SRR8169126
RNA	Illumina	China Antique	Gene expression	Immature-stamen	SRR8169123
RNA	Illumina	China Antique	Gene expression	Immature-stamen	SRR8169123
RNA	Illumina	China Antique	Gene expression	unpollinated-carpel	SRR8169124
RNA	Illumina	China Antique	Gene expression	unpollinated-carpel	SRR8169124
RNA	Illumina	China Antique	Gene expression	Immature-receptacle	SRR8169125
RNA	Illumina	China Antique	Gene expression	Immature-receptacle	SRR8169125
RNA	Illumina	China Antique	Gene expression	Mature-stamen	SRR8169127
RNA	Illumina	China Antique	Gene expression	Mature-stamen	SRR8169127
RNA	Illumina	China Antique	Gene expression	Pollinated-carpel	SRR8169128
RNA	Illumina	China Antique	Gene expression	Pollinated-carpel	SRR8169128
RNA	Illumina	China Antique	Gene expression	Mature-receptacle	SRR8169129
RNA	Illumina	China Antique	Gene expression	Mature-receptacle	SRR8169129
RNA	Illumina	China Antique	Gene expression	Cotyledon-DAP9	SRR6432938
RNA	Illumina	China Antique	Gene expression	Cotyledon-DAP9	SRR6432938
RNA	Illumina	China Antique	Gene expression	Cotyledon-DAP9	SRR6432938
RNA	Illumina	China Antique	Gene expression	Cotyledon-DAP12	SRR6432941
RNA	Illumina	China Antique	Gene expression	Cotyledon-DAP12	SRR6432941
RNA	Illumina	China Antique	Gene expression	Cotyledon-DAP12	SRR6432941
RNA	Illumina	China Antique	Gene expression	Cotyledon-DAP15	SRR6432942
RNA	Illumina	China Antique	Gene expression	Cotyledon-DAP15	SRR6432942
RNA	Illumina	China Antique	Gene expression	Cotyledon-DAP15	SRR6432942
RNA	Illumina	China Antique	Gene expression	lotus root	SRR826691
RNA	Illumina	China Antique	Gene expression	Leaf	SRR830399
RNA	Illumina	China Antique	Gene expression	Leaf	SRR831069
RNA	Illumina	China Antique	Gene expression	Leaf	SRR831070
RNA	Illumina	China Antique	Gene expression	Leaf	SRR831071
RNA	Illumina	China Antique	Gene expression	Leaf	SRR831072
RNA	Illumina	China Antique	Gene expression	rhizome Internode	SRR831082
RNA	Illumina	China Antique	Gene expression	rhizome Internode	SRR831083
RNA	Illumina	China Antique	Gene expression	rhizome Internode	SRR831084
RNA	Illumina	China Antique	Gene expression	rhizome Internode	SRR831086
RNA	Illumina	China Antique	Gene expression	rhizome Internode	SRR831087
RNA	Illumina	China Antique	Gene expression	Petiole	SRR831088
RNA	Illumina	China Antique	Gene expression	Petiole	SRR831089
RNA	Illumina	China Antique	Gene expression	Petiole	SRR831090
RNA	Illumina	China Antique	Gene expression	Petiole	SRR831167
RNA	Illumina	China Antique	Gene expression	Petiole	SRR831168
RNA	Illumina	China Antique	Gene expression	Root	SRR831169
RNA	Illumina	China Antique	Gene expression	Root	SRR831170
RNA	Illumina	China Antique	Gene expression	Root	SRR831171
RNA	Illumina	China Antique	Gene expression	Root	SRR831172
RNA	Illumina	China Antique	Gene expression	Root	SRR831173
RNA	Illumina	China Antique	Gene expression	rhizome apical meristem	SRR831190
RNA	Illumina	China Antique	Gene expression	rhizome elongation Zone	SRR831693
RNA	Illumina	cultivated rhizome lotus (CRL) plants	Gene expression	Apical buds	SRR879365
RNA	Illumina	wild flower lotus (WFL) plants	Gene expression	Apical buds	SRR879367
RNA	Illumina	Zhouou	Gene expression	rhizome (stolon/internode)	SRR2052526
RNA	Illumina	Zhouou	Gene expression	rhizome (stolon/internode)	SRR2052539
RNA	Illumina	Zhouou	Gene expression	rhizome (later swelling stage/internode)	SRR2052541
RNA	Illumina	Fenhonglingxiao	Gene expression	rhizome (stolon/internode)	SRR2052538
RNA	Illumina	Fenhonglingxiao	Gene expression	rhizome (stolon/internode)	SRR2052542
RNA	Illumina	Zhouou	Gene expression	rhizome (later swelling stage/internode)	SRR2052551
RNA	Illumina	Zhouou	Gene expression	rhizome (middle swelling stage/internode)	SRR2052560
RNA	Illumina	Fenhonglingxiao	Gene expression	rhizome (middle swelling stage/internode)	SRR2052568
RNA	Illumina	Fenhonglingxiao	Gene expression	rhizome (middle swelling stage/internode)	SRR2052569
RNA	Illumina	Fenhonglingxiao	Gene expression	rhizome (later swelling stage/internode)	SRR2052570
RNA	Illumina	Zhouou	Gene expression	rhizome (later swelling stage/internode)	SRR2052571
RNA	Illumina	Fenhonglingxiao	Gene expression	rhizome (later swelling stage/internode)	SRR2052573
DNA	Illumina	shentong	Whole-genome resequencing	young leaf	SRR8325858
DNA	Illumina	fozuolian	Whole-genome resequencing	young leaf	SRR8325857
DNA	Illumina	yuloutai	Whole-genome resequencing	young leaf	SRR8325860
DNA	Illumina	jiaoyangzuiwu	Whole-genome resequencing	young leaf	SRR8325859
DNA	Illumina	xiaolianzuo	Whole-genome resequencing	young leaf	SRR8325862
DNA	Illumina	huoyan	Whole-genome resequencing	young leaf	SRR8325868
DNA	Illumina	yongshi	Whole-genome resequencing	young leaf	SRR8325864
DNA	Illumina	chongbanbayilian	Whole-genome resequencing	young leaf	SRR8325863
DNA	Illumina	ziyulian	Whole-genome resequencing	young leaf	SRR8325866
DNA	Illumina	hongwanlian	Whole-genome resequencing	young leaf	SRR8325865
DNA	Illumina	baixuegongzhu	Whole-genome resequencing	young leaf	SRR8325880
DNA	Illumina	baiwanwan	Whole-genome resequencing	young leaf	SRR8325879
DNA	Illumina	yudie	Whole-genome resequencing	young leaf	SRR8325878
DNA	Illumina	fenwanlian	Whole-genome resequencing	young leaf	SRR8325877
DNA	Illumina	taizhenchuyu	Whole-genome resequencing	young leaf	SRR8325884
DNA	Illumina	manaohong	Whole-genome resequencing	young leaf	SRR8325883
DNA	Illumina	luoxiayingxue	Whole-genome resequencing	young leaf	SRR8325882
DNA	Illumina	mudanxianzi	Whole-genome resequencing	young leaf	SRR8325881
DNA	Illumina	pinghuqiuyue	Whole-genome resequencing	young leaf	SRR8325872
DNA	Illumina	tianshanbitai	Whole-genome resequencing	young leaf	SRR8325871
DNA	Illumina	lihuabai	Whole-genome resequencing	young leaf	SRR8325836
DNA	Illumina	hongjuan	Whole-genome resequencing	young leaf	SRR8325837
DNA	Illumina	xinlingmei	Whole-genome resequencing	young leaf	SRR8325834
DNA	Illumina	shuangjie	Whole-genome resequencing	young leaf	SRR8325835
DNA	Illumina	fenghuazhengmao	Whole-genome resequencing	young leaf	SRR8325840
DNA	Illumina	juhong	Whole-genome resequencing	young leaf	SRR8325841
DNA	Illumina	cenglinjinran	Whole-genome resequencing	young leaf	SRR8325838
DNA	Illumina	fenmudan	Whole-genome resequencing	young leaf	SRR8325839
DNA	Illumina	chanjuan	Whole-genome resequencing	young leaf	SRR8325843
DNA	Illumina	cuitai	Whole-genome resequencing	young leaf	SRR8325844
DNA	Illumina	zixiabei	Whole-genome resequencing	young leaf	SRR8325870
DNA	Illumina	bixuedanxin	Whole-genome resequencing	young leaf	SRR8325869
DNA	Illumina	xiaofengliangyue	Whole-genome resequencing	young leaf	SRR8325853
DNA	Illumina	xiaotaihong	Whole-genome resequencing	young leaf	SRR8325847
DNA	Illumina	nanjinghong	Whole-genome resequencing	young leaf	SRR8325874
DNA	Illumina	shuimeiren	Whole-genome resequencing	young leaf	SRR8325873
DNA	Illumina	yubanbai	Whole-genome resequencing	young leaf	SRR8325876
DNA	Illumina	yuhua	Whole-genome resequencing	young leaf	SRR8325875
DNA	Illumina	liuhuo	Whole-genome resequencing	young leaf	SRR8325867
DNA	Illumina	xiaoxia	Whole-genome resequencing	young leaf	SRR8325861
DNA	Illumina	baiyun	Whole-genome resequencing	young leaf	SRR8325895
DNA	Illumina	baiqianye	Whole-genome resequencing	young leaf	SRR8325848
DNA	Illumina	jinzhuluoyupan	Whole-genome resequencing	young leaf	SRR8325849
DNA	Illumina	hanhong	Whole-genome resequencing	young leaf	SRR8325850
DNA	Illumina	fengximudan	Whole-genome resequencing	young leaf	SRR8325851
DNA	Illumina	suzhibingzi	Whole-genome resequencing	young leaf	SRR8325852
DNA	Illumina	chongbanyizhangqing	Whole-genome resequencing	young leaf	SRR8325842
DNA	Illumina	zhongshanhongtai	Whole-genome resequencing	young leaf	SRR8325854
DNA	Illumina	shennvzi	Whole-genome resequencing	young leaf	SRR8325855
DNA	Illumina	yanzhilu	Whole-genome resequencing	young leaf	SRR8325856
DNA	Illumina	xinjie	Whole-genome resequencing	young leaf	SRR8325892
DNA	Illumina	chaitoufeng	Whole-genome resequencing	young leaf	SRR8325891
DNA	Illumina	caiyunfeidu	Whole-genome resequencing	young leaf	SRR8325890
DNA	Illumina	lianxia	Whole-genome resequencing	young leaf	SRR8325889
DNA	Illumina	pizhenhong	Whole-genome resequencing	young leaf	SRR8325888
DNA	Illumina	haoyue	Whole-genome resequencing	young leaf	SRR8325887
DNA	Illumina	jiaoniang	Whole-genome resequencing	young leaf	SRR8325886
DNA	Illumina	danxiuqiu	Whole-genome resequencing	young leaf	SRR8325885
DNA	Illumina	hongdenggaozhao	Whole-genome resequencing	young leaf	SRR8325894
DNA	Illumina	fenxia	Whole-genome resequencing	young leaf	SRR8325893
DNA	Illumina	zhaoyun	Whole-genome resequencing	young leaf	SRR8325845
DNA	Illumina	baimeilian	Whole-genome resequencing	young leaf	SRR8325846
DNA	Illumina	hongtailian	Whole-genome resequencing	young leaf	SRR7159815
DNA	Illumina	shuguangxiu	Whole-genome resequencing	young leaf	SRR7159817
DNA	Illumina	donghuxinhong	Whole-genome resequencing	young leaf	SRR7159818
DNA	Illumina	xiangbihe	Whole-genome resequencing	young leaf	SRR7159820
DNA	Illumina	fengjuanhongqi	Whole-genome resequencing	young leaf	SRR7159821
DNA	Illumina	xingkongmudan	Whole-genome resequencing	young leaf	SRR7159822
DNA	Illumina	honghehuan	Whole-genome resequencing	young leaf	SRR7159825
DNA	Illumina	jinlinghuodu	Whole-genome resequencing	young leaf	SRR7159826
DNA	Illumina	xiaoqu	Whole-genome resequencing	young leaf	SRR7159827
DNA	Illumina	xiaojingling	Whole-genome resequencing	young leaf	SRR7159828
DNA	Illumina	zuidongfeng	Whole-genome resequencing	young leaf	SRR7159829
DNA	Illumina	hongyun	Whole-genome resequencing	young leaf	SRR7159830
DNA	Illumina	jinlingzhixing	Whole-genome resequencing	young leaf	SRR7159831
DNA	Illumina	danyue	Whole-genome resequencing	young leaf	SRR7159832
DNA	Illumina	qinhuaibaiyu	Whole-genome resequencing	young leaf	SRR7159833
DNA	Illumina	baige	Whole-genome resequencing	young leaf	SRR4308656
DNA	Illumina	baiyangdianhonglian	Whole-genome resequencing	young leaf	SRR4308659
DNA	Illumina	donghonghua	Whole-genome resequencing	young leaf	SRR4308651
DNA	Illumina	fenhonglingxiao	Whole-genome resequencing	young leaf	SRR4308650
DNA	Illumina	heilongjianghonglian	Whole-genome resequencing	young leaf	SRR4308657
DNA	Illumina	jianxuan17	Whole-genome resequencing	young leaf	SRR4308643
DNA	Illumina	puzheheibailian	Whole-genome resequencing	young leaf	SRR4308658
DNA	Illumina	qianbanlian	Whole-genome resequencing	young leaf	SRR4308655
DNA	Illumina	weishanhonglian	Whole-genome resequencing	young leaf	SRR4308645
DNA	Illumina	xihuhonglian	Whole-genome resequencing	young leaf	SRR4308660
DNA	Illumina	zhigaowushang	Whole-genome resequencing	young leaf	SRR4308649


Gene functions were annotated against the Gene Ontology (GO), KEGG, KOG, Pfam, SwissProt, TrEMBL and Nr databases using KOBAS 3.0, BlastKOALA, PfamScan and BLAST^32,33. Because protein domains are conserved units shared by related genes, we clustered genes into domain families (gene families) according to their Pfam HMM domain annotations³⁴. In addition, all transcription factors (TFs) were predicted and classified into TF families via PlantTFDB 4.0³⁵.
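One simple way to turn domain annotations into domain families is to group genes by each Pfam accession they carry, as in the Python sketch below. The input pairs would be extracted from PfamScan output; the grouping rule (one family per Pfam accession, with multi-domain genes placed in several families) and the minimum family size of two are assumptions, since the text does not spell out the clustering rule. The gene IDs are hypothetical.

```python
from collections import defaultdict


def domain_families(hits, min_members=2):
    """Group genes into families keyed by Pfam accession.

    hits: iterable of (gene_id, pfam_accession) pairs, e.g. parsed from PfamScan output.
    A gene carrying several domains is placed in several families.
    """
    families = defaultdict(set)
    for gene_id, pfam in hits:
        families[pfam].add(gene_id)
    return {pfam: sorted(genes) for pfam, genes in families.items() if len(genes) >= min_members}


hits = [
    ("g1", "PF00010"),  # hypothetical genes with illustrative Pfam accessions
    ("g2", "PF00010"),
    ("g2", "PF00249"),
    ("g3", "PF00249"),
    ("g4", "PF00931"),  # singleton, filtered out by min_members=2
]
print(domain_families(hits))  # {'PF00010': ['g1', 'g2'], 'PF00249': ['g2', 'g3']}
```

Grouping by full domain architecture (the ordered tuple of accessions per gene) is a stricter alternative; which rule the NGD used is not stated.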

We selected 88 recorded lotus cultivars representing diverse floral traits (color, shape, flowering time, etc.). Of these, 62 were sequenced in the current study, and the sequences of the remaining 26, which have detailed phenotypic records, were downloaded from the NCBI database (Online-only Table 2). Genomic DNA was extracted from young leaves using the CTAB method³⁶, fragmented to 250–280 bp, and used to construct libraries with a NEBNext® Ultra DNA Library Prep Kit for Illumina (NEB, USA) following the manufacturer’s recommendations. Paired-end 150 bp reads (PE150) were sequenced on an Illumina NovaSeq 6000 (San Diego, CA, USA), generating approximately 16× coverage for each cultivar. Clean reads were obtained with FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) by removing adapters and low-quality reads, including reads comprising > 10% N, reads with < 20% low-quality bases, reads with low-quality or ambiguous fragments at the read ends within a 5 bp window, and reads with a quality < 20. The clean reads were mapped to the reference genome using BWA-MEM³⁷. Variants were then called with the GATK 4.0 (Genome Analysis Toolkit) pipeline and hard-filtered using the SNP parameters “QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0” and the InDel parameters “QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0”³⁸. The phenotypes and some images of these cultivars were collected from reference books^39,40, and the phenotypes were further validated over two years of field investigations. Most of the floral trait images displayed in the NGD were taken during flowering at the Wuhan Institute of Landscape Architecture (Wuhan, China).
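The hard-filtering thresholds quoted above are normally applied with GATK's VariantFiltration tool; the Python sketch below only mirrors their logic on a VCF INFO string, so that the criteria can be read at a glance. A record fails if any listed condition holds, and a missing annotation does not cause a failure (which, to our understanding, matches VariantFiltration's default). The example INFO values are invented.

```python
# A record fails if ANY condition is true (the "||" in the GATK expressions).
SNP_FAILS = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}
INDEL_FAILS = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 200.0,
    "ReadPosRankSum": lambda v: v < -20.0,
}


def parse_info(info_field):
    """Parse numeric key=value pairs from a VCF INFO column; flags and text values are ignored."""
    annotations = {}
    for item in info_field.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            try:
                annotations[key] = float(value)
            except ValueError:
                pass  # non-numeric annotations are not used by these filters
    return annotations


def passes_hard_filter(info_field, is_snp):
    """True if no hard-filter condition fails; missing annotations do not fail the record."""
    annotations = parse_info(info_field)
    rules = SNP_FAILS if is_snp else INDEL_FAILS
    return not any(key in annotations and fails(annotations[key]) for key, fails in rules.items())


print(passes_hard_filter("QD=15.3;FS=1.2;MQ=60.0;MQRankSum=0.1;ReadPosRankSum=0.5", is_snp=True))  # True
print(passes_hard_filter("QD=10.0;FS=75.0", is_snp=True))   # False: FS > 60 for SNPs
print(passes_hard_filter("QD=10.0;FS=75.0", is_snp=False))  # True: InDel FS limit is 200
```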

Online-only Table 2.

Mapping rate and depth on reference genome for 88 lotus cultivars.

Sample ID	Sample Name	Mapping rate	Depth	NCBI SRR
63	shentong	99.01%	11.9566	SRR8325858
74	fozuolian	98.97%	12.4726	SRR8325857
75	yuloutai	98.89%	14.0595	SRR8325860
86	jiaoyangzuiwu	98.12%	13.0646	SRR8325859
102	xiaolianzuo	99.00%	13.4425	SRR8325862
107	huoyan	98.81%	12.3096	SRR8325868
116	yongshi	98.96%	11.8835	SRR8325864
146	chongbanbayilian	99.13%	11.7565	SRR8325863
160	ziyulian	99.07%	13.6243	SRR8325866
163	hongwanlian	98.68%	13.5703	SRR8325865
169	baixuegongzhu	99.07%	12.748	SRR8325880
218	baiwanwan	99.24%	16.2777	SRR8325879
221	yudie	98.71%	16.6949	SRR8325878
235	fenwanlian	99.32%	16.2768	SRR8325877
239	taizhenchuyu	99.28%	16.1965	SRR8325884
274	manaohong	99.38%	16.5347	SRR8325883
296	luoxiayingxue	99.40%	16.9159	SRR8325882
305	mudanxianzi	99.23%	16.5897	SRR8325881
317	pinghuqiuyue	99.40%	18.1952	SRR8325872
319	tianshanbitai	99.35%	17.3546	SRR8325871
322	lihuabai	99.30%	18.5732	SRR8325836
334	hongjuan	99.09%	17.2609	SRR8325837
338	xinlingmei	99.32%	16.7434	SRR8325834
365	shuangjie	99.06%	11.9673	SRR8325835
368	fenghuazhengmao	99.40%	18.83	SRR8325840
380	juhong	99.37%	17.0724	SRR8325841
381	cenglinjinran	99.29%	16.1654	SRR8325838
388	fenmudan	99.27%	15.0731	SRR8325839
419	chanjuan	99.34%	17.5767	SRR8325843
421	cuitai	99.36%	18.721	SRR8325844
440	zixiabei	99.29%	15.9078	SRR8325870
450	bixuedanxin	99.35%	17.1246	SRR8325869
451	xiaofengliangyue	99.41%	17.1499	SRR8325853
452	xiaotaihong	99.38%	17.3033	SRR8325847
453	nanjinghong	99.34%	16.9085	SRR8325874
461	shuimeiren	99.37%	16.9447	SRR8325873
467	yubanbai	99.30%	15.5537	SRR8325876
468	yuhua	99.35%	17.4529	SRR8325875
495	liuhuo	99.20%	15.6028	SRR8325867
566	xiaoxia	99.40%	17.5277	SRR8325861
591	baiyun	99.45%	16.3501	SRR8325895
600	baiqianye	99.32%	17.2098	SRR8325848
619	jinzhuluoyupan	99.43%	15.6951	SRR8325849
640	hanhong	97.50%	17.2591	SRR8325850
642	fengximudan	99.34%	17.7888	SRR8325851
643	suzhibingzi	99.48%	17.5635	SRR8325852
652	chongbanyizhangqing	99.50%	16.64	SRR8325842
653	zhongshanhongtai	99.40%	16.5431	SRR8325854
656	shennvzi	99.31%	16.3544	SRR8325855
674	yanzhilu	99.46%	17.481	SRR8325856
678	xinjie	97.81%	16.2709	SRR8325892
709	chaitoufeng	99.33%	15.3606	SRR8325891
770	caiyunfeidu	99.43%	16.9894	SRR8325890
782	lianxia	99.33%	18.4933	SRR8325889
807	pizhenhong	99.23%	16.1976	SRR8325888
832	haoyue	98.68%	16.1062	SRR8325887
846	jiaoniang	99.34%	16.6935	SRR8325886
847	danxiuqiu	99.37%	17.3691	SRR8325885
855	hongdenggaozhao	99.18%	15.4943	SRR8325894
896	fenxia	99.46%	15.3978	SRR8325893
908	zhaoyun	99.43%	16.3684	SRR8325845
911	baimeilian	99.45%	20.3636	SRR8325846
7159815	hongtailian	99.31%	11.5062	SRR7159815
7159817	shuguangxiu	99.18%	12.0447	SRR7159817
7159818	donghuxinhong	99.34%	11.7097	SRR7159818
7159820	xiangbihe	99.03%	11.582	SRR7159820
7159821	fengjuanhongqi	99.14%	11.1591	SRR7159821
7159822	xingkongmudan	98.93%	12.5915	SRR7159822
7159825	honghehuan	98.30%	11.5102	SRR7159825
7159826	jinlinghuodu	98.91%	14.5597	SRR7159826
7159827	xiaoqu	99.16%	12.7112	SRR7159827
7159828	xiaojingling	97.92%	11.7524	SRR7159828
7159829	zuidongfeng	99.33%	13.1111	SRR7159829
7159830	hongyun	99.24%	13.9575	SRR7159830
7159831	jinlingzhixing	99.27%	13.6104	SRR7159831
7159832	danyue	99.16%	12.4043	SRR7159832
7159833	qinhuaibaiyu	98.41%	11.6404	SRR7159833
BG	baige	99.36%	20.3992	SRR4308656
BYDHL	baiyangdianhonglian	99.11%	3.68911	SRR4308659
DH2	donghonghua	99.27%	22.267	SRR4308651
FH	fenhonglingxiao	99.04%	18.6101	SRR4308650
HLJHL	heilongjianghonglian	98.92%	3.93594	SRR4308657
JX	jianxuan17	99.27%	20.6081	SRR4308643
PZHBL	puzheheibailian	99.17%	5.14056	SRR4308658
QBL	qianbanlian	99.11%	3.83764	SRR4308655
WSHL	weishanhonglian	99.14%	7.54628	SRR4308645
XHHL	xihuhonglian	99.01%	4.30606	SRR4308660
ZGWS	zhigaowushang	99.07%	20.4065	SRR4308649
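As a rough cross-check of the Depth column, mean coverage can be estimated as mapped bases divided by assembly size. The sketch below assumes 2 × 150 bp read pairs, the 821.2 Mb assembly size given in the Background, and no duplicate removal; the read-pair count is hypothetical, and the values in the table were presumably computed from the alignments themselves.

```python
GENOME_SIZE_BP = 821.2e6  # lotus assembly size quoted in the Background


def mean_depth(read_pairs, read_len=150, mapping_rate=1.0, genome_size=GENOME_SIZE_BP):
    """Approximate mean depth = mapped bases / genome size (duplicates not removed)."""
    mapped_bases = 2 * read_pairs * read_len * mapping_rate
    return mapped_bases / genome_size


def group_mean(values):
    """Arithmetic mean, e.g. of per-cultivar depths within one data source."""
    return sum(values) / len(values)


# About 43.8 million PE150 read pairs correspond to roughly 16x coverage.
print(round(mean_depth(43_797_334), 2))  # 16.0
```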


Database construction

All genomic sequence, annotation, expression and genetic variation data are stored in MySQL on an Ubuntu server. The user-friendly website was developed using HTML5 and JavaScript and can be accessed through common browsers such as Google Chrome and Firefox. Gene models and transcript isoforms are displayed via GBrowse and JBrowse, gene expression heatmaps are plotted with R, and query searches are implemented in JavaScript and Java. Common utilities for genomic studies, such as BLAST, BLAT and Primer Design, are also deployed.
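The NGD keeps its data in MySQL, but its schema is not described. As a purely hypothetical illustration of how gene coordinates and variants might be related, the sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL, with made-up table names, columns, chromosome names and IDs, and retrieves the variants that fall within one gene.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative, not the NGD's.
SCHEMA = """
CREATE TABLE gene (
    gene_id   TEXT PRIMARY KEY,
    chrom     TEXT NOT NULL,
    start_pos INTEGER NOT NULL,
    end_pos   INTEGER NOT NULL
);
CREATE TABLE variant (
    chrom TEXT NOT NULL,
    pos   INTEGER NOT NULL,
    ref   TEXT NOT NULL,
    alt   TEXT NOT NULL,
    kind  TEXT NOT NULL CHECK (kind IN ('SNP', 'InDel'))
);
CREATE INDEX variant_by_position ON variant (chrom, pos);
"""


def variants_in_gene(conn, gene_id):
    """Return variants whose position lies within the gene's coordinates, ordered by position."""
    query = """
        SELECT v.chrom, v.pos, v.ref, v.alt, v.kind
        FROM gene AS g
        JOIN variant AS v
          ON v.chrom = g.chrom AND v.pos BETWEEN g.start_pos AND g.end_pos
        WHERE g.gene_id = ?
        ORDER BY v.pos
    """
    return conn.execute(query, (gene_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO gene VALUES (?, ?, ?, ?)", ("NNU_g1", "chr1", 100, 500))
conn.executemany(
    "INSERT INTO variant VALUES (?, ?, ?, ?, ?)",
    [
        ("chr1", 150, "A", "G", "SNP"),
        ("chr1", 300, "AT", "A", "InDel"),
        ("chr1", 600, "C", "T", "SNP"),  # outside the gene
        ("chr2", 200, "G", "A", "SNP"),  # different chromosome
    ],
)
print(variants_in_gene(conn, "NNU_g1"))  # [('chr1', 150, 'A', 'G', 'SNP'), ('chr1', 300, 'AT', 'A', 'InDel')]
```

The composite index on `(chrom, pos)` is what keeps such range lookups fast when the variant table holds tens of millions of rows.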

Data Records

The raw genomic PacBio sequencing data used for the genome assembly are available in the NCBI Sequence Read Archive (SRA) under accession numbers SRR7549129⁴¹ and SRR7549130⁴², and the Illumina and Hi-C sequencing data are deposited under SRR7615553⁴³ and SRR7631523⁴⁴ (Online-only Table 1). Raw whole-genome resequencing reads for the 62 newly sequenced cultivars can be downloaded from the NCBI SRA under accession SRP173547⁴⁵, and the resequencing data of the other 26 cultivars are also accessible via the NCBI SRA^46,47 (Online-only Table 2). The latest assembly and annotations of the lotus variety ‘China Antique’ are available from the NGD (download links: http://nelumbo.biocloud.net/downloadData/download?path=NNU.genomic.fa and http://nelumbo.biocloud.net/downloadData/download?path=NNU.gff3). This Whole Genome Shotgun project has also been deposited at DDBJ/ENA/GenBank under the accession DUZY00000000; the version described in this paper is DUZY01000000⁴⁸. Improved gene and repeat (including transposable element) predictions (GFF3), coding and peptide sequences (FASTA), gene and transcript functional annotations, gene expression and coexpression profiles, SNP and InDel variants, and phenotypic traits and images for the 88 lotus cultivars have been uploaded to Figshare⁴⁹ and deployed in the newly developed NGD (http://nelumbo.biocloud.net).
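Once the annotation file (NNU.gff3) has been downloaded from the NGD link given above, it can be summarized with standard GFF3 parsing. The sketch below counts feature types and groups mRNA IDs by parent gene. It assumes that transcripts are typed `mRNA` and linked to genes via `Parent=`, which is common GFF3 practice but not confirmed for this file; the sample lines use hypothetical IDs, and percent-encoded attribute values are not decoded.

```python
from collections import Counter


def parse_attributes(field):
    """Split a GFF3 column-9 string such as 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def summarize_gff3(lines):
    """Count feature types and collect mRNA IDs per parent gene."""
    counts = Counter()
    mrnas_by_gene = {}
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines
        feature_type = cols[2]
        counts[feature_type] += 1
        if feature_type == "mRNA":
            attrs = parse_attributes(cols[8])
            mrnas_by_gene.setdefault(attrs.get("Parent"), []).append(attrs.get("ID"))
    return counts, mrnas_by_gene


sample = [  # hypothetical records
    "##gff-version 3",
    "chr1\t.\tgene\t100\t900\t.\t+\t.\tID=NNU_g1",
    "chr1\t.\tmRNA\t100\t900\t.\t+\t.\tID=NNU_g1.t1;Parent=NNU_g1",
    "chr1\t.\tCDS\t150\t800\t.\t+\t0\tID=NNU_g1.t1.cds;Parent=NNU_g1.t1",
]
counts, mrnas = summarize_gff3(sample)
print(dict(counts))  # {'gene': 1, 'mRNA': 1, 'CDS': 1}
print(mrnas)         # {'NNU_g1': ['NNU_g1.t1']}

# For the downloaded file:
# with open("NNU.gff3") as handle:
#     counts, mrnas = summarize_gff3(handle)
```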

Technical Validation

Quality control of genome annotation, expression, and genome resequencing was performed during data processing for the NGD.

Genome annotation

We used a set of conserved single-copy plant genes from the BUSCO database to assess the completeness of the gene annotations⁵⁰. Compared with the previous annotation of the lotus variety China Antique (complete BUSCOs = 74.6%), our new annotation contains a much higher proportion of complete BUSCOs (97.5%), and 41,140 of 46,713 annotated genes with either complete or partial ORFs (88%) were supported by the 69 transcriptome datasets (Figure S1a). These results suggest relatively high quality and completeness of the genome assembly and gene annotations (Table 2).

Table 2.

BUSCO assessment of the completeness of gene annotation.

BUSCO category	Gene number^a	BUSCO ratio^a	Gene number^b	BUSCO ratio^b
Complete BUSCOs	1404	97.5%	1074	74.6%
Complete and single-copy BUSCOs	750	52.1%	947	65.8%
Complete and duplicated BUSCOs	654	45.4%	127	8.8%
Fragmented BUSCOs	16	1.1%	152	10.6%
Missing BUSCOs	20	1.4%	214	14.9%
Total BUSCO groups searched	1440		1440


^aThe current genome assembly and annotation of var. China Antique⁸.

^bEarly genome assembly and annotation of var. China Antique⁹.
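The percentages in Table 2 follow directly from the counts divided by the 1,440 BUSCO groups searched. The Python sketch below recomputes them and formats them in the one-line "C:[S,D],F,M,n" notation used in BUSCO short summaries; the same arithmetic gives the transcript-support figure quoted above (41,140 / 46,713 ≈ 88.1%).

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Format BUSCO counts as C:x%[S:x%,D:x%],F:x%,M:x%,n:N (percent of groups searched)."""
    total = single + duplicated + fragmented + missing

    def pct(count):
        return f"{100 * count / total:.1f}%"

    complete = single + duplicated
    return (f"C:{pct(complete)}[S:{pct(single)},D:{pct(duplicated)}],"
            f"F:{pct(fragmented)},M:{pct(missing)},n:{total}")


print(busco_summary(750, 654, 16, 20))    # C:97.5%[S:52.1%,D:45.4%],F:1.1%,M:1.4%,n:1440
print(busco_summary(947, 127, 152, 214))  # C:74.6%[S:65.8%,D:8.8%],F:10.6%,M:14.9%,n:1440
print(f"{41_140 / 46_713:.1%}")           # 88.1%
```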

Gene expression

To ensure that gene FPKMs accurately reflect gene expression in different tissues and at different developmental stages, hierarchical clustering of gene FPKMs across the RNA-seq samples of the variety China Antique was performed using Expander 6.0⁵¹. All biological replicates clustered together, and all developmental stages of the same tissue clustered together, except for one petiole sample. In addition, the relative expression of randomly selected genes in different tissues was validated by qRT-PCR (Fig. 2 and Figure S1b). These results support FPKM as an accurate indicator of lotus gene expression.

Fig. 2 — Hierarchical clustering of different RNA-seq samples based on a gene expression matrix (FPKM) from the lotus variety China Antique. Note that only a small portion of the genes are shown in the heatmap.
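The replicate check relies on hierarchical clustering of samples. Expander 6.0 was used in the study; as a stand-in, the sketch below implements plain average-linkage (UPGMA-style) agglomerative clustering on 1 − Pearson correlation of log2(FPKM + 1) profiles. The distance measure, linkage method and toy FPKM values are assumptions chosen for illustration; with well-behaved data, replicate samples are expected to merge first.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(samples):
    """Agglomerative clustering of samples with average linkage on 1 - Pearson r.

    samples maps a sample name to its FPKM values (same gene order for every sample).
    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    logged = {name: [math.log2(v + 1) for v in values] for name, values in samples.items()}
    dist = {
        frozenset((a, b)): 1 - pearson(logged[a], logged[b])
        for a, b in combinations(logged, 2)
    }

    def cluster_distance(c1, c2):
        # Average linkage: mean of all pairwise sample distances between the two clusters.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [frozenset([name]) for name in logged]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
    return merges


toy = {  # invented FPKM values for four genes in two tissues with two replicates each
    "leaf_1": [10, 200, 5, 0],
    "leaf_2": [12, 190, 6, 0],
    "root_1": [300, 2, 50, 80],
    "root_2": [280, 3, 55, 90],
}
merges = average_linkage(toy)
for c1, c2, d in merges:
    print(sorted(c1), "+", sorted(c2), f"distance={d:.3f}")
```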

Whole-genome resequencing

Before genome mapping, adapters and low-quality Illumina reads were removed (see the Methods). Base content, error rate, insert size distribution and log-transformed read coverage across the lotus chromosomes were checked, and all met the criteria for downstream analyses (Fig. 3). Mapping quality was also checked: the average mapping rate was 99.18% for the cultivars sequenced in this study, compared with 98.98% and 99.13% for the cultivars from the two previous studies (Online-only Table 2). The average depth was 16.1× for the cultivars from the current study and 12.4× and 11.8× for the cultivars from the two previous reports (Online-only Table 2). To ensure the quality of the SNPs and InDels called by the GATK pipeline, stringent hard-filtering parameters were applied (see the Methods). Because most alleles in the SNP dataset are expected to be shared by at least two individuals, we plotted the frequency of SNPs by minor allele count (MAC) across the 88 cultivars. Indeed, most SNPs had MACs ≥ 2, and the SNP density peaked around MACs of four or five (Fig. 4). SNP variants were further validated and visualized using IGVtools (http://software.broadinstitute.org/software/igv/igvtools) (Figure S1c).

Fig. 3 — Quality evaluation of the genome resequencing data of cultivars, including the base content (a), error rate (b), insert size distribution (c) and log-transformed read coverage across eight lotus chromosomes (d), as demonstrated by the example of lotus cultivar Xiaoxia. The resequencing quality met the criterion for downstream variant calling analyses.

Fig. 4 — Distribution of SNPs according to the minor allele count (MAC) across 88 lotus cultivars.
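
The MAC of a SNP is the number of copies of the less frequent allele among all called genotypes; for 88 diploid cultivars with no missing calls it can range from 0 to 88. The sketch below computes MAC from VCF-style genotype strings and tallies SNPs per MAC class, which is the kind of distribution shown in Fig. 4. It assumes biallelic SNPs and diploid calls and simply skips missing genotypes; the genotypes are hypothetical.

```python
from collections import Counter


def minor_allele_count(genotypes):
    """MAC of one biallelic SNP from diploid VCF-style genotypes.

    Genotypes look like "0/0", "0/1" or "1|1"; missing calls such as "./."
    are skipped. MAC is the number of copies of the less frequent allele
    among all called alleles.
    """
    ref = alt = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # missing genotype
        ref += alleles.count("0")
        alt += alleles.count("1")
    return min(ref, alt)


def mac_spectrum(snps):
    """Number of SNPs in each MAC class."""
    return Counter(minor_allele_count(gts) for gts in snps)


# Hypothetical genotypes of four SNPs in five cultivars.
snps = [
    ["0/0", "0/1", "0/0", "0/0", "0/0"],  # 9 ref vs 1 alt -> MAC 1 (singleton)
    ["0/0", "1/1", "0/1", "0/0", "./."],  # 5 ref vs 3 alt -> MAC 3
    ["1/1", "1/1", "0/0", "0/1", "1|1"],  # 3 ref vs 7 alt -> MAC 3
    ["0/1", "0/1", "0/0", "0/0", "0/0"],  # 8 ref vs 2 alt -> MAC 2
]
spectrum = mac_spectrum(snps)
print(sorted(spectrum.items()))  # [(1, 1), (2, 1), (3, 2)]
```

Singletons (MAC = 1) are the class most easily produced by a sequencing or calling error in a single sample, which is why a spectrum dominated by MACs ≥ 2 is a useful sanity check on the filtered call set.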

Supplementary information

Supplementary file (328.5 KB, docx)

Acknowledgements

This work was supported by grants from the Strategic Priority Research Program CAS (No. XDB31000000), the National Natural Science Foundation of China (No. 31570220, No. 31870208 and No. 31700197), the Youth Innovation Promotion Association of the Chinese Academy of Sciences (No. 2019335), the Bureau of Landscaping and Forestry of Wuhan Municipality (No. WHGF2019A10), the Hubei Provincial Natural Science Foundation of China (No. 2019CFB275), and the Hubei Chenguang Talented Youth Development Foundation. We thank Razgar Seyed Rahmani from Ghent University for the discussion.

Online-only Tables

Author contributions

Genome sequencing, assembly and annotation: T.S.; RNA-seq and gene expression: Y.Z., X.Y.; genome resequencing of cultivars: H.L., X.Y.; phenotype collection: X.Y., Z.G.; sample collection and experiments: C.W., Y.L.; communications, web design and conceptualization: J.C.; manuscript writing and revising: H.L., J.C. and T.S.

Code availability

The genomic and transcriptomic sequence data were generated with the software supplied by the manufacturers of the sequencing platforms. The software used for genome assembly, including versions, parameters and settings, is cited in the Methods section; default parameters were used wherever specific parameters are not mentioned. The Java code used to construct the NGD is available from Figshare⁴⁹.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Hui Li, Xingyu Yang.

Contributor Information

Jinming Chen, Email: jmchen@wbgcas.cn.

Tao Shi, Email: shitao323@wbgcas.cn.

Supplementary information

The online version contains supplementary material available at 10.1038/s41597-021-00828-8.

References

1. Gandolfo MA, Nixon KC, Crepet WL. Cretaceous flowers of Nymphaeaceae and implications for complex insect entrapment pollination mechanisms in early angiosperms. Proc. Natl. Acad. Sci. USA. 2004;101:8056–8060. doi: 10.1073/pnas.0402473101.
2. Zheng C, Sankoff D. Practical halving; the Nelumbo nucifera evidence on early eudicot evolution. Comput. Biol. Chem. 2014;50:75–81. doi: 10.1016/j.compbiolchem.2014.01.010.
3. Hayes V, Schneider EL, Carlquist S. Floral development of Nelumbo nucifera (Nelumbonaceae). Int. J. Plant Sci. 2000;161:S183–S191. doi: 10.1086/317577.
4. Slocum PD. Waterlilies and Lotuses: Species, Cultivars, and New Hybrids. Timber Press; 2005.
5. Zhou M, et al. Identification and comparison of anti-inflammatory ingredients from different organs of lotus nelumbo by UPLC/Q-TOF and PCA coupled with a NF-kappaB reporter gene assay. PLoS One. 2013;8:e81971. doi: 10.1371/journal.pone.0081971.
6. Cheng T, et al. Development and identification of three functional markers associated with starch content in lotus (Nelumbo nucifera). Sci. Rep. 2020;10:4242. doi: 10.1038/s41598-020-60736-6.
7. Li Y, et al. Comparative population genomics reveals genetic divergence and selection in lotus, Nelumbo nucifera. BMC Genom. 2020;21:146. doi: 10.1186/s12864-019-6376-8.
8. Shi T, et al. Distinct expression and methylation patterns for genes with different fates following a single whole-genome duplication in flowering plants. Mol. Biol. Evol. 2020;37:2394–2413. doi: 10.1093/molbev/msaa105.
9. Ming R, et al. Genome of the long-living sacred lotus (Nelumbo nucifera Gaertn.). Genome Biol. 2013;14:R41. doi: 10.1186/gb-2013-14-5-r41.
10. Huang L, et al. Whole genome re-sequencing reveals evolutionary patterns of sacred lotus (Nelumbo nucifera). J. Integr. Plant Biol. 2018;60:2–15. doi: 10.1111/jipb.12606.
11. Zhao M, et al. Detection of highly differentiated genomic regions between lotus (Nelumbo nucifera Gaertn.) with contrasting plant architecture and their functional relevance to plant architecture. Front. Plant Sci. 2018;9:1219. doi: 10.3389/fpls.2018.01219.
12. Yang M, et al. Transcriptomic analysis of the regulation of rhizome formation in temperate and tropical Lotus (Nelumbo nucifera). Sci. Rep. 2015;5:13059. doi: 10.1038/srep13059.
13. Li J, et al. Systematic transcriptomic analysis provides insights into lotus (Nelumbo nucifera) seed development. Plant Growth Regul. 2018;86:339–350. doi: 10.1007/s10725-018-0433-1.
14. Zhang Y, Nyong AT, Shi T, Yang P. The complexity of alternative splicing and landscape of tissue-specific expression in lotus (Nelumbo nucifera) unveiled by Illumina- and single-molecule real-time-based RNA-sequencing. DNA Res. 2019;26:301–311. doi: 10.1093/dnares/dsz010.
15. Zheng Y, et al. Genome-wide analysis of microRNAs in sacred lotus, Nelumbo nucifera (Gaertn). Tropical Plant Biol. 2013;6:117–130. doi: 10.1007/s12042-013-9127-z.
16. Shi T, Wang K, Yang P. The evolution of plant microRNAs: insights from a basal eudicot sacred lotus. Plant J. 2017;89:442–457. doi: 10.1111/tpj.13394.
17. Zhang Y, Rahmani RS, Yang X, Chen J, Shi T. Integrative expression network analysis of microRNA and gene isoforms in sacred lotus. BMC Genom. 2020;21:429. doi: 10.1186/s12864-020-06853-y.
18. Wang Y, et al. Genome-wide identification and characterization of GRAS transcription factors in sacred lotus (Nelumbo nucifera). PeerJ. 2016;4:e2388. doi: 10.7717/peerj.2388.
19. Li H, Yang X, Lu M, Chen J, Shi T. Gene expression and evolution of Family-1 UDP-glycosyltransferases—insights from an aquatic flowering plant (sacred lotus). Aquat. Bot. 2020;166:103270. doi: 10.1016/j.aquabot.2020.103270.
20. Chui R, Jaromczyk JW, Moore N, Schardl CL. FPD2GB2: automating a transition from a customized genome browser to GBrowse2. BMC Bioinform. 2013;14:A17. doi: 10.1186/1471-2105-14-S17-A17.
21. Buels R, et al. JBrowse: a dynamic web platform for genome visualization and analysis. Genome Biol. 2016;17:66. doi: 10.1186/s13059-016-0924-1.
22. Pertea M, Kim D, Pertea GM, Leek JT, Salzberg SL. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat. Protoc. 2016;11:1650–1667. doi: 10.1038/nprot.2016.095.
23. Niknafs YS, Pandian B, Iyer HK, Chinnaiyan AM, Iyer MK. TACO produces robust multisample transcriptome assemblies from RNA-seq. Nat. Methods. 2017;14:68–70. doi: 10.1038/nmeth.4078.
24. Michael TP, et al. High contiguity Arabidopsis thaliana genome assembly with a single nanopore flow cell. Nat. Commun. 2018;9:541. doi: 10.1038/s41467-018-03016-2.
25. Ming R, et al. The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus). Nature. 2008;452:991–996. doi: 10.1038/nature06856.
26. Jaillon O, et al. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature. 2007;449:463–467. doi: 10.1038/nature06148.
27. Nock CJ, Baten A, King GJ. Complete chloroplast genome of Macadamia integrifolia confirms the position of the Gondwanan early-diverging eudicot family Proteaceae. BMC Genom. 2014;15:S13. doi: 10.1186/1471-2164-15-S9-S13.
28. Fox SE, et al. Sequencing and de novo transcriptome assembly of Brachypodium sylvaticum (Poaceae). Appl. Plant Sci. 2013;1:apps.1200011. doi: 10.3732/apps.1200011.
29. Hoff KJ, Lomsadze A, Borodovsky M, Stanke M. Whole-genome annotation with BRAKER. Methods Mol. Biol. 2019;1962:65–95. doi: 10.1007/978-1-4939-9173-0_5.
30. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinform. 2008;9:559. doi: 10.1186/1471-2105-9-559.
31. Zhang B, Horvath S. A general framework for weighted gene co-expression network analysis. Stat. Appl. Genet. Mol. Biol. 2005;4:Article 17. doi: 10.2202/1544-6115.1128.
32. Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. The KEGG resource for deciphering the genome. Nucleic Acids Res. 2004;32:D277–D280. doi: 10.1093/nar/gkh063.
33. Ai C, Kong L. CGPS: a machine learning-based approach integrating multiple gene set analysis tools for better prioritization of biologically relevant pathways. J. Genet. Genomics. 2018;45:489–504. doi: 10.1016/j.jgg.2018.08.002.
34. El-Gebali S, et al. The Pfam protein families database in 2019. Nucleic Acids Res. 2019;47:D427–D432. doi: 10.1093/nar/gky995.
35. Jin J, et al. PlantTFDB 4.0: toward a central hub for transcription factors and regulatory interactions in plants. Nucleic Acids Res. 2017;45:D1040–D1045. doi: 10.1093/nar/gkw982.
36. Cota-Sánchez JH, Remarchuk K, Ubayasena K. Ready-to-use DNA extracted with a CTAB method adapted for herbarium specimens and mucilaginous plant tissue. Plant Mol. Biol. Rep. 2006;24:161–167. doi: 10.1007/BF02914055.
37. Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26:589–595. doi: 10.1093/bioinformatics/btp698.
38. McKenna A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. doi: 10.1101/gr.107524.110.
39. Zhang X, Wang Q. New Lotus Flower Cultivars in China. China Forestry Publishing House; 2011.
40. Zhang X, Wang Q. Lotus Flower Cultivars in China. China Forestry Publishing House; 2005.
41. NCBI Sequence Read Archive. SRR7549129 (2018).
42. NCBI Sequence Read Archive. SRR7549130 (2018).
43. NCBI Sequence Read Archive. SRR7615553 (2018).
44. NCBI Sequence Read Archive. SRR7631523 (2018).
45. NCBI Sequence Read Archive. SRP173547 (2018).
46. NCBI Sequence Read Archive. SRP145546 (2018).
47. NCBI Sequence Read Archive. SRP090666 (2016).
48. GenBank whole genome shotgun sequencing project. DUZY00000000 (2020).
49. Li H. Nelumbo genome database, an integrative resource for gene expression and variants of Nelumbo nucifera. figshare (2020).
50. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. doi: 10.1093/bioinformatics/btv351.
51. Sharan R, Maron-Katz A, Shamir R. CLICK and EXPANDER: a system for clustering and visualizing gene expression data. Bioinformatics. 2003;19:1787–1799. doi: 10.1093/bioinformatics/btg232.


