Scientific Data. 2023 Sep 9;10:605. doi: 10.1038/s41597-023-02516-1

Chromosome level genome assembly of colored calla lily (Zantedeschia elliottiana)

Yi Wang 1,#, Tuo Yang 2,#, Di Wang 1,3, Rongxin Gou 1,3, Yin Jiang 1,3, Guojun Zhang 3, Yuhong Zheng 4, Dan Gao 5, Liyang Chen 5, Xiuhai Zhang 1, Zunzheng Wei 1
PMCID: PMC10492805  PMID: 37689767

Abstract

The colored calla lily is an ornamental floral plant native to southern Africa, belonging to the Zantedeschia genus of the Araceae family. We generated a high-quality chromosome-level genome of the colored calla lily, with a size of 1,154 Mb and a contig N50 of 42 Mb. We anchored 98.5% of the contig sequence (1,137 Mb) into 16 pseudo-chromosomes, and identified 60.18% of the sequence (694 Mb) as repetitive. Functional annotations were assigned to 95.1% of the 36,165 predicted protein-coding genes. Additionally, we annotated 469 miRNAs, 1,652 tRNAs, 10,033 rRNAs, and 1,677 snRNAs. Furthermore, insertions of Gypsy-type LTR retrotransposons are the primary factor causing significant genome size variation among Araceae species. This high-quality genome assembly provides valuable resources for understanding genome size differences within the Araceae family and advancing genomic research on colored calla lily.

Subject terms: Plant evolution, Plant genetics

Background & Summary

Zantedeschia spp., commonly known as calla lilies, are perennial herbaceous flowering plants belonging to the genus Zantedeschia of the family Araceae. They are typically found in swampy and hilly regions of South Africa1,2. With its unique spathes and decorative foliage, calla lily has become a popular tuberous flowering plant worldwide. It is usually divided into two groups: white calla lily and colored calla lily3. Colored calla lily is an economically significant horticultural crop that has been among the top cut flower and tuber exports of New Zealand for the past three decades, while also contributing substantially to the horticultural export revenues of the Netherlands and the United States. Furthermore, the tubers of colored calla lilies have medicinal value and are effective in treating certain gastrointestinal and trauma-related illnesses.

K-mer and flow cytometry analyses estimated the genome size of Zantedeschia elliottiana cv. ‘Jingcai Yangguang’ at ~1.2 Gb, with a heterozygosity of 1.9% and a repeat content of 67.84% (Figs. 1, 2). The de-novo assembly of the genome used 84.30X Illumina paired-end short reads (100.31 Gb), 36.92X HiFi reads (43.93 Gb) and 141.45X Hi-C reads (168.18 Gb). We first assembled the HiFi reads into 1,154 Mb of contig sequence with a contig N50 of 42 Mb (Table 1). Using Hi-C reads, 98.50% of the contig sequence was anchored into 16 pseudo-chromosomes (Fig. 3, Table 1). Transposable elements make up 60.18% of the genome in the final annotation, with LTR retroelements accounting for the largest proportion (51.54%). By contrast, DNA transposons account for only 3.73% (Table 2). A total of 36,165 protein-coding genes were predicted, of which 95.1% could be functionally annotated through the InterPro4, Pfam5, Swiss-Prot6, NCBI Non-redundant protein (NR)7 and Kyoto Encyclopedia of Genes and Genomes (KEGG)8 databases (Table 3). In addition, non-coding RNA annotation identified 10,033 rRNAs, 1,677 snRNAs, 469 miRNAs and 1,652 tRNAs in the Zantedeschia elliottiana cv. ‘Jingcai Yangguang’ genome (Table 4). BUSCO evaluation identified 98% of the core genes as complete, comprising 95.7% single-copy genes and 2.3% duplicated genes (Table 1). Between 93.83% and 95.23% of RNA-seq reads from eight Zantedeschia elliottiana cv. ‘Jingcai Yangguang’ tissues (tuber, leaf, pistil, root, spathe, stamen, stem and style) could be mapped to the genome, as could 99.02% of Illumina reads and 98.42% of HiFi reads. The LTR Assembly Index (LAI) of the genome was 18.43, which indicates high continuity (Table 1). LTR insertion time analysis showed that Araceae species experienced LTR bursts at different times during genome evolution, and that the burst patterns differ between LTR types.
For Copia-type LTR retrotransposons, Pistia stratiotes and Zantedeschia elliottiana cv. ‘Jingcai Yangguang’ had the same insertion time. Interestingly, Amorphophallus konjac and Colocasia esculenta experienced two bursts of Copia and Gypsy elements. The two bursts were clearly separated in time in Colocasia esculenta, whereas in Amorphophallus konjac they were close together. The analysis also showed that Gypsy elements of Pistia stratiotes had recently experienced a burst (Fig. 4a). As a branch of the Araceae family, Lemnaceae plants have smaller genomes and fewer genes than True-Araceae plants. However, among True-Araceae plants, genome size is not related to the number of genes. Correlation analysis instead showed a high correlation between genome size and transposable element content, with Gypsy-type LTR retrotransposons showing the highest correlation with genome size (Fig. 4b).
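
The sequencing depths quoted above can be cross-checked against the data yields, since depth is yield divided by genome size. The following is a minimal sketch for illustration only, not part of the authors' workflow; the function and variable names are hypothetical, and the yields and depths are the values given in the text.

```python
# Illustrative consistency check; names are hypothetical, values are from the text.
# Each entry maps a library type to (yield in Gb, reported depth in X).
DATASETS = {
    "Illumina": (100.31, 84.30),
    "HiFi": (43.93, 36.92),
    "Hi-C": (168.18, 141.45),
}


def implied_genome_size_gb(yield_gb: float, depth_x: float) -> float:
    """Return the genome size (Gb) implied by depth = yield / genome size."""
    return yield_gb / depth_x


if __name__ == "__main__":
    for name, (yield_gb, depth_x) in DATASETS.items():
        size = implied_genome_size_gb(yield_gb, depth_x)
        print(f"{name}: implied genome size {size:.3f} Gb")
```

All three libraries imply a genome size close to 1.19 Gb, in line with the ~1.2 Gb estimate from k-mer and flow cytometry analysis.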

Fig. 1.

Genome size estimation of Zantedeschia elliottiana cv. ‘Jingcai Yangguang’ by flow cytometry. Tomato and maize were used as internal references for genome size estimation.

Fig. 2.

Genome size estimation of Zantedeschia elliottiana cv. ‘Jingcai Yangguang’ using Illumina reads.

Table 1.

Summary of the Z. elliottiana genome.

| Assembly characteristics | Z. elliottiana cv. ‘Jingcai Yangguang’ |
| --- | --- |
| Total length of contigs (bp) | 1,154,500,755 |
| N50 length of contigs (bp) | 42,376,536 |
| Total number of contigs | 375 |
| Longest contig (bp) | 80,375,493 |
| Total gap size (bp) | 6,700 |
| Total sequence anchored to the pseudo-chromosomes (bp) | 1,137,238,020 |
| Anchoring rate (%) | 98.50 |
| Number of annotated genes | 36,165 |
| Percentage of transposable element sequences (%) | 60.18 |
| Complete BUSCOs (%) | 98.00 |
| Fragmented BUSCOs (%) | 0.80 |
| Missing BUSCOs (%) | 1.20 |
| LAI | 18.43 |
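
The anchoring rate in Table 1 follows directly from the two sequence totals in the same table. A small sketch of that arithmetic, with hypothetical names and the table values as inputs:

```python
# Illustrative arithmetic only; constants are copied from Table 1.
TOTAL_CONTIG_BP = 1_154_500_755
ANCHORED_BP = 1_137_238_020


def anchoring_rate_percent(anchored_bp: int, total_bp: int) -> float:
    """Percentage of assembled sequence placed on pseudo-chromosomes."""
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    return 100.0 * anchored_bp / total_bp


if __name__ == "__main__":
    rate = anchoring_rate_percent(ANCHORED_BP, TOTAL_CONTIG_BP)
    print(f"Anchoring rate: {rate:.2f}%")
```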

Fig. 3.

Characteristics of the Zantedeschia elliottiana cv. ‘Jingcai Yangguang’ genome. (a) Hi-C heatmap of the Zantedeschia elliottiana cv. ‘Jingcai Yangguang’ genome. (b) Circos plot of the Zantedeschia elliottiana cv. ‘Jingcai Yangguang’ genome. Tracks in the Circos plot: (a) gene density, (b) TE density, (c) tandem repeat density, (d) GC content and syntenic blocks.

Table 2.

Classification of repetitive sequences in Z. elliottiana cv. ‘Jingcai Yangguang’ genome.

Z. elliottiana cv. ‘Jingcai Yangguang’; genome length without N gaps: 1,154,500,755 bp

| Repetitive sequences | Length (bp) | Ratio (%) in genome |
| --- | --- | --- |
| LTR Retroelement | 595,082,160 | 51.54 |
| Gypsy (LTR) | 387,224,277 | 33.54 |
| Copia (LTR) | 194,442,787 | 16.84 |
| LINE | 47,578,872 | 4.12 |
| SINE | 293,029 | 0.03 |
| DNA transposons | 43,021,358 | 3.73 |
| Other/Unspecified/Unknown | 29,613,899 | 2.57 |

Table 3.

Statistics of gene functional annotation.

Z. elliottiana cv. ‘Jingcai Yangguang’

| Database | Gene numbers | Ratio (%) |
| --- | --- | --- |
| NR | 23,081 | 63.80 |
| Swiss-Prot | 16,854 | 46.60 |
| KEGG | 16,690 | 46.10 |
| Pfam | 19,198 | 53.10 |
| GO | 15,276 | 42.20 |
| Annotated | 34,406 | 95.10 |
| Total | 36,165 | |

Table 4.

Classification of non-coding RNAs in Z. elliottiana cv. ‘Jingcai Yangguang’ genome.

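| Type | Subtype | Number | Average length (bp) | Total length (bp) | % of genome |
| --- | --- | --- | --- | --- | --- |
| miRNA | | 469 | 117.95 | 55,320 | 0.004792 |
| tRNA | | 1,652 | 75.21 | 124,254 | 0.010763 |
| rRNA | 18S | 1,571 | 1759.02 | 2,763,428 | 0.239360 |
| rRNA | 28S | 6,091 | 143.58 | 874,541 | 0.075750 |
| rRNA | 5.8S | 1,532 | 159.30 | 244,048 | 0.021139 |
| rRNA | 5S | 839 | 115.46 | 96,874 | 0.008391 |
| snRNA | CD-box | 1,408 | 106.49 | 149,933 | 0.012987 |
| snRNA | HACA-box | 69 | 147.61 | 10,185 | 0.000882 |
| snRNA | splicing | 200 | 134.86 | 26,971 | 0.002336 |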
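
The rRNA and snRNA totals quoted in the text (10,033 and 1,677) are the sums of the subtype rows in Table 4, and the last column is total length divided by the assembly length. A short sketch of both checks, using hypothetical names and the table values only:

```python
# Illustrative check; counts and lengths are copied from Tables 1 and 4.
GENOME_BP = 1_154_500_755

RRNA_COUNTS = {"18S": 1_571, "28S": 6_091, "5.8S": 1_532, "5S": 839}
SNRNA_COUNTS = {"CD-box": 1_408, "HACA-box": 69, "splicing": 200}
MIRNA_TOTAL_BP = 55_320


def class_total(counts: dict) -> int:
    """Sum the per-subtype counts of one ncRNA class."""
    return sum(counts.values())


def percent_of_genome(total_bp: int, genome_bp: int = GENOME_BP) -> float:
    """Share of the assembly covered by a feature class, in percent."""
    return 100.0 * total_bp / genome_bp


if __name__ == "__main__":
    print("rRNA total:", class_total(RRNA_COUNTS))
    print("snRNA total:", class_total(SNRNA_COUNTS))
    print(f"miRNA % of genome: {percent_of_genome(MIRNA_TOTAL_BP):.6f}")
```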

Fig. 4.

The influence of LTRs on genome size. (a) The insertion time of LTRs (Copia and Gypsy) was predicted by 4Dtv. Pstr, Pistia stratiotes; Akon, Amorphophallus konjac; Zell, Zantedeschia elliottiana cv. ‘Jingcai Yangguang’; Pped, Pinellia pedatisecta; Cesc, Colocasia esculenta. (b) Analysis of the correlation between the total length of LTRs and the genome size.

Here, a high-quality chromosome-level genome assembly of Zantedeschia elliottiana cv. ‘Jingcai Yangguang’ was generated, and it points to LTR retrotransposon insertions, particularly of the Gypsy type, as the main cause of genome size variation in the Araceae family.

Methods

Sample collection and sequencing

‘Jingcai Yangguang’ is a variant of Zantedeschia elliottiana cv. ‘Black Magic’ with a chromosome number of 2n = 2x = 32. It was initially cultivated in 2015 by Di Zhou, a former associate researcher in our team. Young leaves were collected for genome sequencing, and all sequencing material was taken from the same plant to ensure consistency. Eight tissues (tuber, leaf, pistil, root, spathe, stamen, stem and style) were sampled for transcriptome sequencing, and the sequencing results were used for gene structure annotation.

The FastPure Plant DNA Isolation Mini Kit (Vazyme, CHN) was employed for DNA extraction from leaf tissue. In liquid nitrogen, fresh leaves were pulverized into a fine powder, and genomic DNA was isolated according to the manufacturer’s guidelines. NanoDrop 2000 (Thermo Scientific, USA) and gel electrophoresis were utilized to evaluate the concentration and purity of the isolated DNA.

The high-quality DNA was used to construct a genomic library, and library construction and sequencing were completed at Novogene Co., Ltd. in Beijing. The library was size-selected using BluePippin (Sage Science, USA) to obtain fragments of the desired size range, which is typically ~15 kb for HiFi sequencing. The purified and size-selected library was then sequenced on the PacBio Sequel II system (Pacific Biosciences, USA). For Illumina sequencing, a short-read sequencing library was constructed with an insert size of ~250 bp and sequenced on an Illumina NovaSeq 6000 platform (Illumina, USA). The Hi-C library was constructed using the same leaf sample as described above. Briefly, nuclear DNA was fixed with formaldehyde and digested with the restriction enzyme DpnII (NEB, UK). Biotinylated nucleotides were added to the termini of the fragmented DNA, followed by enrichment and size selection to obtain fragments of approximately 500 bp. The library was sequenced on the Illumina NovaSeq 6000 platform (Illumina, USA).

The RNAprep Pure Plant Kit (TIANGEN, CHN) was used to extract RNA from 8 different tissues (tuber, leaf, pistil, root, spathe, stamen, stem and style). The tissue samples were ground in liquid nitrogen, lysis buffer was added, and RNA was isolated according to the manufacturer’s guidelines. RNA-seq libraries were generated and sequenced on a NovaSeq 6000 platform (Illumina, USA).

Genome size estimation

Two methods, k-mer and flow cytometry analysis, were employed to estimate the genome size of Zantedeschia elliottiana cv. ‘Jingcai Yangguang’. For flow cytometry analysis, the DNA content of Zantedeschia elliottiana cv. ‘Jingcai Yangguang’ was assessed using the BD Accuri C6 flow cytometer (BD Biosciences, USA), with tomato and maize as reference standards (Fig. 1). The k-mer frequency distribution was computed using Jellyfish (v1.0.0) (-C -m 21 -G 2)9, and GenomeScope (v2.0) (-p 2 -k 21)10 was then used to calculate the genome size and heterozygosity level with a k-mer size of 21 (Fig. 2).
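
The idea behind k-mer based genome size estimation can be sketched in a few lines. The study used GenomeScope, which fits a statistical model to the k-mer histogram. The code below is a much simpler approximation for illustration only: the total number of k-mer occurrences divided by the depth of the homozygous peak. The function name, the error cutoff and the toy histogram are all hypothetical.

```python
# Simplified sketch of k-mer based genome size estimation.
# This is NOT the GenomeScope model; it only illustrates the principle.


def estimate_haploid_genome_size(histogram: dict, homozygous_peak: int,
                                 min_depth: int = 5) -> float:
    """Estimate haploid genome size from a k-mer depth histogram.

    histogram maps k-mer depth -> number of distinct k-mers seen at that depth.
    homozygous_peak is the depth of the homozygous (2n) peak.
    K-mers below min_depth are treated as sequencing errors and ignored
    (the cutoff is an assumption of this sketch).
    """
    if homozygous_peak <= 0:
        raise ValueError("homozygous_peak must be positive")
    total_kmer_occurrences = sum(
        depth * count for depth, count in histogram.items() if depth >= min_depth
    )
    return total_kmer_occurrences / homozygous_peak


if __name__ == "__main__":
    # Made-up toy histogram: error k-mers at depth 1, a heterozygous peak
    # at depth 20 and a homozygous peak at depth 40.
    toy_histogram = {1: 1000, 20: 100, 40: 450}
    print(estimate_haploid_genome_size(toy_histogram, homozygous_peak=40))
```

In a heterozygous genome such as this one (1.9%), the histogram has two peaks, and the estimate depends on picking the homozygous peak rather than the heterozygous one at half its depth.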

De-novo genome assembly

First, contigs were assembled from HiFi reads using hifiasm (v0.19.5) (https://github.com/chhylp123/hifiasm) with default parameters. Subsequently, Hi-C reads were aligned to the contigs using HICUP (v0.7.3)11 to evaluate the quality of the Hi-C data. Following that, contigs were anchored into 16 pseudo-chromosomes using YaHS (v1.1) with default parameters (Fig. 3). Finally, the assembled genome was manually corrected with Juicebox (v1.11.08) (Table 1)12.
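
The contig N50 reported in Table 1 is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. A minimal sketch of that definition follows; the function name is hypothetical and the example lengths are made up.

```python
# Minimal N50 sketch; input lengths in the example are made up.


def n50(lengths: list) -> int:
    """Return the N50 of a list of contig lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the total
        # defines the N50.
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    print(n50([10, 20, 30, 40]))
```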

Completeness evaluation of the assembled genome

Benchmarking Universal Single-Copy Orthologs (BUSCO v5.4.5, embryophyta_odb10)13 and the LTR Assembly Index (LAI, LTR_retriever v2.9.0)14 were used to assess the completeness and continuity of the genome (Table 1).
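
The BUSCO categories reported for this assembly are related by simple sums: complete BUSCOs are single-copy plus duplicated, and complete, fragmented and missing together account for all searched orthologs. A small sketch of that bookkeeping, with hypothetical names and the percentages reported in the text and Table 1:

```python
# Illustrative bookkeeping; percentages are those reported in the article.
BUSCO_PERCENT = {
    "single_copy": 95.7,
    "duplicated": 2.3,
    "fragmented": 0.8,
    "missing": 1.2,
}


def complete_percent(busco: dict) -> float:
    """Complete BUSCOs = complete single-copy + complete duplicated."""
    return busco["single_copy"] + busco["duplicated"]


def total_percent(busco: dict) -> float:
    """All four categories together should account for 100%."""
    return complete_percent(busco) + busco["fragmented"] + busco["missing"]


if __name__ == "__main__":
    print(f"Complete: {complete_percent(BUSCO_PERCENT):.1f}%")
    print(f"Total: {total_percent(BUSCO_PERCENT):.1f}%")
```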

Genome prediction and annotation

The annotation pipeline employed for predicting repeat elements consisted of both homology-based and de-novo approaches. In the homology-based approach, the Repbase database (http://www.girinst.org/repbase)15 was searched for homologous repeats, which were then annotated using RepeatProteinMask (v4.1.0) (http://www.repeatmasker.org/). For de-novo annotation, a de-novo repeat library was constructed using LTR_FINDER (v1.07)16, RepeatScout (v1.0.6) (http://www.repeatmasker.org/)17, and RepeatModeler (v2.0.4) (http://www.repeatmasker.org/RepeatModeler.html)18. Annotation was then performed using RepeatMasker (v4.1.0) (http://repeatmasker.org/)19.

To annotate gene structure, a strategy combining de-novo prediction, protein-based homology and transcriptome evidence was employed. Protein sequences from Amorphophallus konjac, Colocasia esculenta, Lemna minuta, Spirodela polyrhiza, Pistia stratiotes and Pinellia pedatisecta were mapped to the genome using WUblast (v2.0)20. GeneWise (v2.4.1)21 was then used to predict gene structures in the genomic regions identified by WUblast, and the resulting gene structures were referred to as the Homo-set. Gene models produced by PASA (v2.5.2)22 served as training data for the de-novo gene prediction programs. Five de-novo gene prediction programs, namely AUGUSTUS (v2.5.5)23, Genscan (v1.0)24, Geneid (v1.4)25, GlimmerHMM (v3.0.1)26 and SNAP (v2013.11.29)27, were employed to predict coding regions within the repeat-masked genome. For transcript-based annotation, the clean RNA-seq data were aligned to the genome assembly using TopHat (v2.0)28, and transcripts were assembled using Cufflinks (v2.1.1)29. These results were combined by EVidenceModeler (v1.1.1)22 into a non-redundant set of gene annotations.

The predicted protein sequences were functionally annotated through searches in five databases: NR7, InterPro4, KEGG8, Pfam5 and Swiss-Prot6. Gene Ontology (GO)30 annotation was performed using InterProScan (v5.52–86.0)31 (Table 3). Blast (v2.2.26) with an E-value threshold of 1E-5 was used to align the protein sequences of Zantedeschia elliottiana to these databases for gene function annotation.

Noncoding RNA (ncRNA) annotation was conducted using tRNAscan-SE (v1.4)32 to predict tRNAs and blast (v2.2.26)33 to predict rRNAs. Furthermore, miRNAs and snRNAs were identified through alignment with the Rfam database34 using INFERNAL (v1.0)35.

Estimation of LTR retrotransposons insertion timing

The full-length LTR retrotransposons were aligned to the ClariTeRep36 datasets using blastn (blast, v2.2.26). To calculate the insertion time of each LTR retrotransposon, its 5’ and 3’ LTRs were aligned using MUSCLE (v5.1)37, and the EMBOSS software package (v6.6.0)38 was used to calculate the accumulated divergence between them39.
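
The two LTRs of an element are identical at the moment of insertion, so their divergence K can be converted to an insertion time as T = K / (2r), where r is the substitution rate per site per year. The article does not state the distance model or the rate that was used, so the sketch below makes two assumptions that should be read as placeholders: a Jukes-Cantor corrected distance, and r = 1.3e-8, the rate used for rice by Ma and Bennetzen39. All function names are hypothetical.

```python
import math

# Sketch of LTR insertion time estimation, T = K / (2r).
# Assumptions (not stated in the article): Jukes-Cantor distance and a
# default rate of 1.3e-8 substitutions per site per year (rice value).


def p_distance(ltr5: str, ltr3: str) -> float:
    """Proportion of mismatches between two aligned LTRs, ignoring gaps and Ns."""
    if len(ltr5) != len(ltr3):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return mismatches / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed mismatch proportion p."""
    if p >= 0.75:
        raise ValueError("p-distance too large for Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def insertion_time_years(ltr5: str, ltr3: str, rate: float = 1.3e-8) -> float:
    """Estimated years since insertion from the divergence of the two LTRs."""
    return jukes_cantor(p_distance(ltr5, ltr3)) / (2.0 * rate)


if __name__ == "__main__":
    five_prime = "ACGT" * 25
    three_prime = "T" + five_prime[1:]  # one substitution in 100 aligned sites
    print(f"{insertion_time_years(five_prime, three_prime):.0f} years")
```

The factor of 2 reflects that both LTR copies accumulate substitutions independently after insertion.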

Data Records

The raw data (PacBio HiFi reads, Illumina reads, and Hi-C sequencing reads) used for genome assembly were deposited in the SRA at NCBI under accessions SRR24273711-SRR24273714 (refs. 40–43).

The RNA-seq data were deposited in the SRA at NCBI under accessions SRR24273483-SRR24273490 (refs. 44–51). The genome assembly and annotation files are available in Figshare (10.6084/m9.figshare.22656112; ref. 52) and GenBank under the accession JARZZO000000000 (ref. 53).

Technical Validation

First, the Hi-C heatmap supports the accuracy of the genome assembly, with relatively independent Hi-C signals observed between the 16 pseudo-chromosomes (Fig. 3a). Moreover, we aligned RNA and DNA reads to the final genome to assess the accuracy of the assembly. For the alignment of DNA reads, Illumina reads were aligned using BWA (v0.7.17)54 with default parameters, while HiFi reads were aligned using minimap2 (v2.24-r1122)55 with default parameters. The mapping rate for Illumina reads was 99.02%, while the mapping rate for HiFi reads was 98.42%. For the alignment of RNA reads, transcriptomic data from the different tissues were individually mapped to the final genome using HISAT2 (v2.2.1)56 with default parameters. The mapping rates for the tissue-specific transcriptomic data ranged from 93.83% to 95.23%. Furthermore, we evaluated the completeness of the genome using BUSCO (v5.4.5, embryophyta_odb10)13 and LAI (LTR_retriever, v2.9.0)14 (Table 1). Taken together, these assessments support the accuracy and completeness of the genome assembly.

Acknowledgements

This work was supported by grants from the National Natural Science Foundation of China (32071812) and the Beijing Academy of Agriculture and Forestry Sciences Specific Projects for Building Technology Innovation Capacity (KJCX20230108; KJCX20230801; KJCX20230811).

Author contributions

Z.W. and X.Z. designed the study and led the research. Y.W. and T.Y. wrote the draft manuscript. Y.W., T.Y., D.G. and L.C. contributed to the genome assembly and annotation. Y.W., T.Y., D.W., R.G., Y.J., D.G. and L.C. participated in genome evolution analysis. Z.W., X.Z., G.Z. and Y.Z. contributed substantially to the revisions. The final manuscript has been read and approved by all authors.

Code availability

All data processing commands and pipelines were carried out in accordance with the instructions and guidelines provided by the relevant bioinformatic software. There were no custom scripts or code utilized in this study.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Yi Wang, Tuo Yang.

Contributor Information

Xiuhai Zhang, Email: zhangxiuhai@baafs.net.cn.

Zunzheng Wei, Email: weizunzheng@163.com.

References

  • 1. Letty, C. The Genus Zantedeschia. (1973).
  • 2. Yao JL, Rowland RE, Cohen D. Karyotype studies in the genus Zantedeschia (Araceae). S. Afr. J. Bot. 1994;60:4–7. doi: 10.1016/S0254-6299(16)30653-6.
  • 3. De Hertogh, A. & Le Nard, M. The physiology of flower bulbs. (1993).
  • 4. Finn RD, et al. InterPro in 2017—beyond protein family and domain annotations. Nucleic Acids Res. 2016;45:D190–D199. doi: 10.1093/nar/gkw1107.
  • 5. Finn RD, et al. Pfam: the protein families database. Nucleic Acids Res. 2013;42:D222–D230. doi: 10.1093/nar/gkt1223.
  • 6. Bairoch A. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 2000;28:45–48. doi: 10.1093/nar/28.1.45.
  • 7. O’Leary NA, et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 2016;44:D733–D745. doi: 10.1093/nar/gkv1189.
  • 8. Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 2015;44:D457–D462. doi: 10.1093/nar/gkv1070.
  • 9. Marcais G, Kingsford C. Jellyfish: A fast k-mer counter. Tutorialis e Manuais. 2012;1:1–8.
  • 10. Ranallo-Benavidez TR, Jaron KS, Schatz MC. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun. 2020;11:1432. doi: 10.1038/s41467-020-14998-3.
  • 11. Wingett, S. et al. HiCUP: pipeline for mapping and processing Hi-C data. F1000Research 4 (2015).
  • 12. Burton JN, et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat Biotechnol. 2013;31:1119–1125. doi: 10.1038/nbt.2727.
  • 13. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. doi: 10.1093/bioinformatics/btv351.
  • 14. Ou S, Chen J, Jiang N. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. 2018;46:e126. doi: 10.1093/nar/gky730.
  • 15. Jurka J, et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005;110:462–467. doi: 10.1159/000084979.
  • 16. Xu Z, Wang H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 2007;35:W265–W268. doi: 10.1093/nar/gkm286.
  • 17. Price AL, Jones NC, Pevzner PA. De novo identification of repeat families in large genomes. Bioinformatics. 2005;21:i351–i358. doi: 10.1093/bioinformatics/bti1018.
  • 18. Flynn JM, et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci USA. 2020;117:9451–9457. doi: 10.1073/pnas.1921046117.
  • 19. Tarailo‐Graovac, M. & Chen, N. Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences. Curr. Protoc. Bioinformatics 25 (2009).
  • 20. She R, Chu JS-C, Wang K, Pei J, Chen N. genBlastA: Enabling BLAST to identify homologous gene sequences. Genome Res. 2008;19:143–149. doi: 10.1101/gr.082081.108.
  • 21. Birney E, Clamp M, Durbin R. GeneWise and Genomewise. Genome Res. 2004;14:988–995. doi: 10.1101/gr.1865504.
  • 22. Haas BJ, et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 2008;9:R7. doi: 10.1186/gb-2008-9-1-r7.
  • 23. Stanke M, et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 2006;34:W435–W439. doi: 10.1093/nar/gkl200.
  • 24. Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 1997;268:78–94. doi: 10.1006/jmbi.1997.0951.
  • 25. Guigó R. Assembling Genes from Predicted Exons in Linear Time with Dynamic Programming. J. Comput. Biol. 1998;5:681–702. doi: 10.1089/cmb.1998.5.681.
  • 26. Majoros WH, Pertea M, Salzberg SL. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics. 2004;20:2878–2879. doi: 10.1093/bioinformatics/bth315.
  • 27. Korf I. Gene finding in novel genomes. BMC Bioinform. 2004;5:1–9. doi: 10.1186/1471-2105-5-59.
  • 28. Kim D, et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14:R36. doi: 10.1186/gb-2013-14-4-r36.
  • 29. Trapnell C, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012;7:562–578. doi: 10.1038/nprot.2012.016.
  • 30. Ashburner M, et al. Gene Ontology: tool for the unification of biology. Nat Genet. 2000;25:25–29. doi: 10.1038/75556.
  • 31. Mulder, N. & Apweiler, R. InterPro and InterProScan. Humana Press, 59–70 (2007).
  • 32. Lowe TM, Eddy SR. tRNAscan-SE: A Program for Improved Detection of Transfer RNA Genes in Genomic Sequence. Nucleic Acids Res. 1997;25:955–964. doi: 10.1093/nar/25.5.955.
  • 33. Mount, D. W. Using the Basic Local Alignment Search Tool (BLAST). Cold Spring Harb Protoc, 17 (2007).
  • 34. Griffiths-Jones S. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 2004;33:D121–D124. doi: 10.1093/nar/gki081.
  • 35. Nawrocki EP, Kolbe DL, Eddy SR. Infernal 1.0: inference of RNA alignments. Bioinformatics. 2009;25:1335–1337. doi: 10.1093/bioinformatics/btp157.
  • 36. Daron J, et al. Organization and evolution of transposable elements along the bread wheat chromosome 3B. Genome Biol. 2014;15:1–15. doi: 10.1186/s13059-014-0546-4.
  • 37. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–1797. doi: 10.1093/nar/gkh340.
  • 38. Rice P, Longden I, Bleasby A. EMBOSS: the European molecular biology open software suite. Trends Genet. 2000;16:276–277. doi: 10.1016/S0168-9525(00)02024-2.
  • 39. Ma J, Bennetzen JL. Rapid recent growth and divergence of rice nuclear genomes. Proc Natl Acad Sci USA. 2004;101:12404–12410. doi: 10.1073/pnas.0403715101.
  • 40. 2023. NCBI Sequence Read Archive. https://www.ncbi.nlm.nih.gov/sra/SRR24273711
  • 41. 2023. NCBI Sequence Read Archive. https://www.ncbi.nlm.nih.gov/sra/SRR24273712
  • 42. 2023. NCBI Sequence Read Archive. https://www.ncbi.nlm.nih.gov/sra/SRR24273713
  • 43. 2023. NCBI Sequence Read Archive. https://www.ncbi.nlm.nih.gov/sra/SRR24273714
  • 44. 2023. NCBI Sequence Read Archive. https://www.ncbi.nlm.nih.gov/sra/SRR24273483
  • 45. 2023. NCBI Sequence Read Archive. https://www.ncbi.nlm.nih.gov/sra/SRR24273484
  • 46. 2023. NCBI Sequence Read Archive. https://www.ncbi.nlm.nih.gov/sra/SRR24273485
  • 47. 2023. NCBI Sequence Read Archive. https://www.ncbi.nlm.nih.gov/sra/SRR24273486
  • 48. 2023. NCBI Sequence Read Archive. https://www.ncbi.nlm.nih.gov/sra/SRR24273487
  • 49. 2023. NCBI Sequence Read Archive. https://www.ncbi.nlm.nih.gov/sra/SRR24273488
  • 50. 2023. NCBI Sequence Read Archive. https://www.ncbi.nlm.nih.gov/sra/SRR24273489
  • 51. 2023. NCBI Sequence Read Archive. https://www.ncbi.nlm.nih.gov/sra/SRR24273490
  • 52. Yang T. 2023. Genome annotation files of Zantedeschia elliottiana ‘Jingcai Yangguang’. figshare. doi: 10.6084/m9.figshare.22656112.
  • 53. Wang Y. 2023. Zantedeschia hybrid cultivar cultivar Jingcaiyangguang, whole genome shotgun sequencing project. GenBank. JARZZO000000000
  • 54. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997 (2013).
  • 55. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–3100. doi: 10.1093/bioinformatics/bty191.
  • 56. Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 2019;37:907–915. doi: 10.1038/s41587-019-0201-4.
