Version Changes
Revised. Amendments from Version 1
We fixed some spelling mistakes and added information and links about the genome annotation of sugarcane cultivar SP80-3280.
Abstract
The commercial sugarcane cultivar SP80-3280 has been used as a model for genomic analyses in Brazil. Here we present a draft genome sequence obtained with Illumina TruSeq Synthetic Long Reads. The dataset is available from NCBI BioProject with accession PRJNA272769.
Keywords: sugarcane, long reads, polyploid, genomics
Introduction
Sugarcane is an economically important crop used as a source of sugar, ethanol and electricity 1. Its haploid genome is ~1 Gbp; however, modern sugarcane cultivars are polyploids derived from interspecific hybridization between S. officinarum L. and S. spontaneum L., with up to 130 chromosomes distributed among ~12 homo(eo)logous groups 2, 3 and a total genome size reaching 10 Gbp 4. This complex genome structure has hampered genome sequencing, assembly and annotation. Partial genomic sequences are available 5–8, as well as transcriptome sequences 9–11, but no whole-genome assembly is available to date. Here we used the Illumina TruSeq Synthetic Long Read sequencing technology to survey the genome of the polyploid cultivar SP80-3280. The generated long reads, their assembly and the genome annotation have been made public and will provide useful information for functional genomics studies.
Materials and methods
Leaf rolls of greenhouse-grown, two-month-old plants of sugarcane cultivar SP80-3280 (provided by Centro de Tecnologia Canavieira, Piracicaba, São Paulo) were collected and immediately frozen in liquid nitrogen. The tissue was ground to a fine powder, and high molecular weight DNA was extracted from 100 mg of fresh frozen tissue using CTAB (Sigma-Aldrich, USA) and chloroform:isoamyl alcohol (Sigma-Aldrich, USA) as previously described 12. Six micrograms of DNA were sent to Illumina (CA, USA) for sequencing with TruSeq Synthetic Long Read technology 13 through their FastTrack Sequencing Service.
Sequencing was performed on an Illumina HiSeq2000 system using paired-end chemistry. Nine long-read libraries, each yielding approximately 600 Mbp, were generated, giving an estimated coverage of between 4x and 5x of the monoploid genome. In total, 1,378,917 reads longer than 1.5 kbp, comprising 5,642,855,018 bases, were generated. The underlying 1,966,604,928 short reads amount to 393,320,985,600 bp, which corresponds to an estimated coverage of 393x of the haploid genome. The maximum read length was 20,918 bp, and 36% of the reads were longer than 4.5 kbp.
Possible contaminants were removed by comparing the long reads against the NCBI nucleotide database using BLAST 14 and keeping only those with best hits against Viridiplantae, which left 1,224,061 reads useful for assembly. Prior to assembly, long reads originating from the mitochondrion (NC_008360.1) and chloroplast (NC_005878.2) were excluded using mirabait (http://mira-assembler.sourceforge.net/). Reads longer than 1.5 kbp were assembled using Celera’s WGS Assembler v8.2 15 with parameters similar to those previously described 13 (‘unitiger=bogart, merSize=31, ovlMinLen=100’), except that some of the error parameters (ovlErrorRate, cnsErrorRate, cgwErrorRate, utgGraphErrorRate, utgGraphErrorLimit, utgMergeErrorRate and utgMergeErrorLimit) were left at their default settings.
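The short-read coverage estimate quoted above can be reproduced by dividing the number of sequenced bases by the genome size. The sketch below is only an arithmetic check, assuming the ~1 Gbp haploid genome size given in the Introduction; the function and variable names are illustrative and are not part of the original analysis.

```python
# Assumption: haploid (monoploid) genome size of ~1 Gbp, as stated in the Introduction.
HAPLOID_GENOME_BP = 1_000_000_000


def coverage(total_bases: int, genome_size_bp: int = HAPLOID_GENOME_BP) -> float:
    """Return fold coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size_bp


# Figures reported in the text.
short_read_count = 1_966_604_928
short_read_bases = 393_320_985_600
long_read_count = 1_378_917
long_read_bases = 5_642_855_018

# Fold coverage of the haploid genome by the underlying short reads.
short_read_coverage = coverage(short_read_bases)

# Mean read lengths implied by the reported counts and base totals.
mean_short_read_len = short_read_bases / short_read_count
mean_long_read_len = long_read_bases / long_read_count

print(f"short-read coverage: {short_read_coverage:.0f}x")
print(f"mean short-read length: {mean_short_read_len:.0f} bp")
print(f"mean long-read length: {mean_long_read_len:.0f} bp")
```

Under these assumptions the short reads give roughly 393x coverage, and the reported totals imply a mean short-read length of 200 bp and a mean long-read length of about 4.1 kbp.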
A non-redundant assembly was created using CD-HIT 16, merging 100% identical sequences and sub-sequences. RNA-Seq data previously generated in our group 17 for the same cultivar were used for gene prediction with BRAKER1 18 and PASA 19. Additional evidence came from sugarcane transcript data (ESTs) and from Sorghum bicolor proteins aligned using Exonerate 20. All gene evidence was integrated with EVidenceModeler 21 to generate a high-quality gene prediction set, yielding 153,078 predicted protein-coding genes.
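The idea behind the redundancy-removal step (dropping sequences that are 100% identical to, or fully contained in, another sequence) can be illustrated with a small sketch. This is not CD-HIT's algorithm, which uses a much faster word-based clustering heuristic; it is a naive quadratic version for toy inputs. The function names, the example contigs and the choice to also check the reverse strand are assumptions made for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def non_redundant(seqs: dict[str, str], both_strands: bool = True) -> dict[str, str]:
    """Keep only sequences not identical to, or contained in, a kept sequence.

    Sequences are visited longest first, so a sequence can only be contained
    in one that has already been kept. Naive O(n^2) substring search.
    """
    ordered = sorted(seqs.items(), key=lambda item: len(item[1]), reverse=True)
    kept: dict[str, str] = {}
    for name, seq in ordered:
        seq = seq.upper()
        candidates = [seq]
        if both_strands:  # assumption: also treat reverse-strand matches as redundant
            candidates.append(reverse_complement(seq))
        if any(c in rep for rep in kept.values() for c in candidates):
            continue  # exact duplicate or sub-sequence of a kept sequence
        kept[name] = seq
    return kept


# Hypothetical toy contigs, not taken from the SP80-3280 assembly.
contigs = {
    "c1": "ACGTACGTGG",
    "c2": "ACGTACGTGG",  # exact duplicate of c1
    "c3": "GTACG",       # sub-sequence of c1
    "c4": "CCACG",       # reverse complement (CGTGG) is a sub-sequence of c1
    "c5": "TTTTT",       # unrelated, kept
}
result = non_redundant(contigs)
print(list(result))
```

On real assemblies a dedicated tool is needed, since pairwise substring search does not scale to hundreds of thousands of contigs.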
Data availability
The data referenced by this article are under copyright with the following copyright statement: Copyright: © 2017 Riaño-Pachón DM and Mattiello L
Raw sequencing data are available at NCBI SRA; the long reads with accession number SRX845504, and the underlying short reads with accessions SRX853961 to SRX853969. The SP80-3280 assembly is available with accession number GCA_002018215.1. All data can be found under the BioProject PRJNA272769. Genome annotation is available from https://figshare.com/projects/Sugarcane_SP80-3280_draft_genome_annotation/22327
Acknowledgements
The authors are grateful to Larissa Prado da Cruz (CTBE/CNPEM) for assistance with molecular biology procedures.
Funding Statement
This work was supported by institutional funds from CTBE/CNPEM to DMRP and a Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) grant to LM (2012/23345-0). The research was developed with support from CENAPAD-SP (Centro Nacional de Processamento de Alto Desempenho em São Paulo), project UNICAMP/FINEP-MCT.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
[version 2; referees: 2 approved]
References
- 1. Long SP, Karp A, Buckeridge MS, et al.: Feedstocks for Biofuels and Bioenergy. In Bioenergy & Sustainability: bridging the gaps (eds. Souza GM, Victoria RL, Joly CA & Verdade LM), UNESCO. 2015;302–347.
- 2. Grivet L, Arruda P: Sugarcane genomics: depicting the complex genome of an important tropical crop. Curr Opin Plant Biol. 2002;5(2):122–127. doi: 10.1016/S1369-5266(02)00234-0
- 3. D’Hont A: Unraveling the genome structure of polyploids using FISH and GISH; examples of sugarcane and banana. Cytogenet Genome Res. 2005;109(1–3):27–33. doi: 10.1159/000082378
- 4. Le Cunff L, Garsmeur O, Raboin LM, et al.: Diploid/polyploid syntenic shuttle mapping and haplotype-specific chromosome walking toward a rust resistance gene (Bru1) in highly polyploid sugarcane (2n approximately 12x approximately 115). Genetics. 2008;180(1):649–660. doi: 10.1534/genetics.108.091355
- 5. Miller JR, Dilley KA, Harkins DM, et al.: Initial genome sequencing of the sugarcane CP 96-1252 complex hybrid [version 1; referees: 1 approved]. F1000Res. 2017;6:688. doi: 10.12688/f1000research.11629.1
- 6. Grativol C, Regulski M, Bertalan M, et al.: Sugarcane genome sequencing by methylation filtration provides tools for genomic research in the genus Saccharum. Plant J. 2014;79(1):162–172. doi: 10.1111/tpj.12539
- 7. Okura VK, de Souza RS, de Siqueira Tada SF, et al.: BAC-Pool Sequencing and Assembly of 19 Mb of the Complex Sugarcane Genome. Front Plant Sci. 2016;7:342. doi: 10.3389/fpls.2016.00342
- 8. de Setta N, Monteiro-Vitorello CB, Metcalfe CJ, et al.: Building the sugarcane genome for biotechnology and identifying evolutionary trends. BMC Genomics. 2014;15(1):540. doi: 10.1186/1471-2164-15-540
- 9. Mattiello L, Riaño-Pachón DM, Martins MC, et al.: Physiological and transcriptional analyses of developmental stages along sugarcane leaf. BMC Plant Biol. 2015;15:300. doi: 10.1186/s12870-015-0694-z
- 10. Hoang NV, Furtado A, Mason PJ, et al.: A survey of the complex transcriptome from the highly polyploid sugarcane genome using full-length isoform sequencing and de novo assembly from short read sequencing. BMC Genomics. 2017;18(1):395. doi: 10.1186/s12864-017-3757-8
- 11. Belesini AA, Carvalho FMS, Telles BR, et al.: De novo transcriptome assembly of sugarcane leaves submitted to prolonged water-deficit stress. Genet Mol Res. 2017;16(2). doi: 10.4238/gmr16028845
- 12. Porebski S, Bailey LG, Baum BR: Modification of a CTAB DNA extraction protocol for plants containing high polysaccharide and polyphenol components. Plant Mol Biol Rep. 1997;15(1):8–15. doi: 10.1007/BF02772108
- 13. McCoy RC, Taylor RW, Blauwkamp TA, et al.: Illumina TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly-repetitive transposable elements. PLoS One. 2014;9(9):e106689. doi: 10.1371/journal.pone.0106689
- 14. Altschul SF, Gish W, Miller W, et al.: Basic local alignment search tool. J Mol Biol. 1990;215(3):403–410. doi: 10.1016/S0022-2836(05)80360-2
- 15. Myers EW, Sutton GG, Delcher AL, et al.: A Whole-Genome Assembly of Drosophila. Science. 2000;287(5461):2196–2204. doi: 10.1126/science.287.5461.2196
- 16. Fu L, Niu B, Zhu Z, et al.: CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012;28(23):3150–3152. doi: 10.1093/bioinformatics/bts565
- 17. Riaño-Pachón DM, Mattiello L, Cruz LP: Surveying the complex polyploid sugarcane genome sequence using synthetic long reads. Technical Memorandum Centro Nacional de Pesquisa em Energia e Materiais. 2016. doi: 10.13140/RG.2.1.3468.0565
- 18. Hoff KJ, Lange S, Lomsadze A, et al.: BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS. Bioinformatics. 2016;32(5):767–769. doi: 10.1093/bioinformatics/btv661
- 19. Haas BJ, Delcher AL, Mount SM, et al.: Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003;31(19):5654–5666. doi: 10.1093/nar/gkg770
- 20. Slater GS, Birney E: Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics. 2005;6:31. doi: 10.1186/1471-2105-6-31
- 21. Haas BJ, Salzberg SL, Zhu W, et al.: Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 2008;9(1):R7. doi: 10.1186/gb-2008-9-1-r7