F1000Res. 2017 Jul 3;6:861. Originally published 2017 Jun 9. [Version 2] doi: 10.12688/f1000research.11859.2

Draft genome sequencing of the sugarcane hybrid SP80-3280

Diego Mauricio Riaño-Pachón 1,2,a, Lucia Mattiello 1,3
PMCID: PMC5499785  PMID: 28713559

Version Changes

Revised. Amendments from Version 1

We fixed some spelling mistakes and added information and links about the genome annotation of sugarcane cultivar SP80-3280.

Abstract

Sugarcane commercial cultivar SP80-3280 has been used as a model for genomic analyses in Brazil. Here we present a draft genome sequence employing Illumina TruSeq Synthetic Long reads. The dataset is available from NCBI BioProject with accession PRJNA272769.

Keywords: sugarcane, long reads, polyploid, genomics

Introduction

Sugarcane is an economically important crop used as a source of sugar, ethanol and electricity 1. Sugarcane has a haploid genome of ~1 Gbp; however, modern sugarcane cultivars are polyploids derived from interspecific hybridization between S. officinarum L. and S. spontaneum L., reaching up to 130 chromosomes distributed among ~12 homo(eo)logous groups 2, 3, with a total genome size reaching 10 Gbp 4. This complex genome structure has hampered genome sequencing, assembly and annotation. Partial genomic sequences are available 5–8, as well as transcriptome sequences 9–11, but there are no whole genome assemblies available to date. Here we used the Illumina TruSeq Synthetic Long Read sequencing technology to survey the genome of the polyploid cultivar SP80-3280. The generated long reads, their assembly and the genome annotation have been made public and will provide useful information for functional genomics studies.

Materials and methods

Leaf rolls of greenhouse-grown, two-month-old plants of sugarcane cultivar SP80-3280 (provided by Centro de Tecnologia Canavieira, Piracicaba, São Paulo) were collected and immediately frozen in liquid nitrogen. The plant tissue was ground to a fine powder, and high molecular weight DNA was extracted from 100 mg of fresh frozen tissue using CTAB (Sigma-Aldrich, USA) and chloroform:isoamyl alcohol (Sigma-Aldrich, USA) as previously described 12.

A total of 6 µg of DNA was sent to Illumina (CA, USA) for sequencing with TruSeq Synthetic Long Read technology 13, through their FastTrack Sequencing Service. Sequencing was performed on an Illumina HiSeq2000 system using paired-end chemistry. Nine long read libraries, each yielding approx. 600 Mbp, were generated, giving an estimated coverage between 4x and 5x of the monoploid genome. A total of 1,378,917 reads longer than 1.5 Kbp, or 5,642,855,018 bases, were generated. The underlying 1,966,604,928 short reads amount to 393,320,985,600 bp, which would translate to an estimated coverage of 393x of the haploid genome. The maximum read length was 20,918 bp, with 36% of the reads being longer than 4.5 Kbp.

Possible contaminants were removed by comparing the long reads against the NCBI nucleotide database using BLAST 14 and keeping only those with best hits against Viridiplantae, resulting in 1,224,061 long reads useful for assembly. Prior to assembly, long reads originating from the mitochondrion (NC_008360.1) and the chloroplast (NC_005878.2) were excluded using mirabait (http://mira-assembler.sourceforge.net/). Reads longer than 1.5 Kbp were assembled using Celera’s WGS Assembler v8.2 15 with parameters similar to those previously described 13 (‘unitiger=bogart, merSize=31, ovlMinLen=100’), except that the error parameters ovlErrorRate, cnsErrorRate, cgwErrorRate, utgGraphErrorRate, utgGraphErrorLimit, utgMergeErrorRate and utgMergeErrorLimit were left at their default settings.
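The short-read coverage figure quoted above follows from simple arithmetic. The sketch below reproduces it, assuming a monoploid genome size of 1 Gbp (the approximate value given in the Introduction). The function and variable names are illustrative only and are not part of the original analysis.

```python
# Illustrative arithmetic only; names are hypothetical, not from the original pipeline.
MONOPLOID_GENOME_BP = 1_000_000_000  # assumption: ~1 Gbp, as stated in the Introduction


def coverage(total_bases, genome_size=MONOPLOID_GENOME_BP):
    """Return sequencing depth as total bases divided by genome size."""
    return total_bases / genome_size


# Read counts and base totals as reported in the text.
short_reads = 1_966_604_928
short_read_bases = 393_320_985_600
long_reads = 1_378_917
long_read_bases = 5_642_855_018

short_read_coverage = coverage(short_read_bases)
bases_per_short_read = short_read_bases / short_reads
mean_long_read_length = long_read_bases / long_reads

print(f"Short-read coverage: {short_read_coverage:.0f}x")
print(f"Bases per short read: {bases_per_short_read:.0f}")
print(f"Mean long-read length: {mean_long_read_length:.0f} bp")
```

Any depth computed this way scales inversely with the assumed genome size, so a different monoploid size estimate would change the result proportionally.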
A non-redundant assembly was created using CD-HIT 16, merging 100% identical sequences and sub-sequences. RNA-Seq data previously generated in our group 17 for the same cultivar were exploited for gene prediction using BRAKER1 18 and PASA 19, as well as sugarcane transcript data (ESTs) and Sorghum bicolor proteins using Exonerate 20. All gene evidence was integrated with EVidenceModeler 21 to generate a high-quality gene prediction set, leading to 153,078 predicted protein-coding genes.
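To make the redundancy-removal step concrete, the following is a minimal sketch of what "merging 100% identical sequences and sub-sequences" means. It is a naive stand-in and not CD-HIT's algorithm: it compares every sequence against all longer kept sequences, ignores reverse-complement matches (a simplifying assumption), and is only practical for small inputs. The contig names and sequences are made up for illustration.

```python
# Naive illustration of removing exact duplicates and exact sub-sequences.
# This is NOT how CD-HIT works internally; it is quadratic and for small inputs only.
# Reverse-complement containment is ignored here (simplifying assumption).


def remove_contained(seqs):
    """Keep only sequences that are not exact sub-sequences of a kept sequence."""
    kept = {}
    # Process longest first so any container is seen before what it contains;
    # ties are broken by name so the result is deterministic.
    for name, seq in sorted(seqs.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        if not any(seq in other for other in kept.values()):
            kept[name] = seq
    return kept


# Hypothetical toy contigs.
contigs = {
    "ctg1": "ACGTACGTGGA",
    "ctg2": "ACGTACGTGGA",  # exact duplicate of ctg1
    "ctg3": "GTACGT",       # exact sub-sequence of ctg1
    "ctg4": "TTTTCCCC",     # unique
}
non_redundant = remove_contained(contigs)
print(sorted(non_redundant))
```

In this toy input only `ctg1` and `ctg4` survive, since `ctg2` duplicates `ctg1` and `ctg3` is fully contained in it.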

Data availability

The data referenced by this article are under copyright with the following copyright statement: Copyright: © 2017 Riaño-Pachón DM and Mattiello L

Raw sequencing data are available at NCBI SRA; the long reads with accession number SRX845504, and the underlying short reads with accessions SRX853961 to SRX853969. The SP80-3280 assembly is available with accession number GCA_002018215.1. All data can be found under the BioProject PRJNA272769. Genome annotation is available from https://figshare.com/projects/Sugarcane_SP80-3280_draft_genome_annotation/22327
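For readers who want to script retrieval, the short-read accession range above can be expanded into individual experiment accessions. This is a small sketch; the helper name `expand_accessions` is hypothetical, it assumes the accessions in the range are numbered consecutively as the text implies, and it does not contact NCBI.

```python
# Expand an inclusive accession range such as SRX853961..SRX853969.
# The helper name is hypothetical; no network access is performed.


def expand_accessions(first, last):
    """Return every accession from first to last, assuming consecutive numbering."""
    prefix = first.rstrip("0123456789")
    width = len(first) - len(prefix)
    start = int(first[len(prefix):])
    end = int(last[len(prefix):])
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


short_read_experiments = expand_accessions("SRX853961", "SRX853969")
for accession in short_read_experiments:
    print(accession)
```

The resulting list has nine entries, matching the nine libraries described in the Materials and methods.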

Acknowledgements

The authors are grateful to Larissa Prado da Cruz (CTBE/CNPEM) for assistance with molecular biology procedures.

Funding Statement

This work was supported by institutional funds from CTBE/CNPEM to DMRP and a Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) grant to LM (2012/23345-0). The research was developed with support from CENAPAD-SP (Centro Nacional de Processamento de Alto Desempenho em São Paulo), project UNICAMP/FINEP-MCT.

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

[version 2; referees: 2 approved]

References

  • 1. Long SP, Karp A, Buckeridge MS, et al.: Feedstocks for Biofuels and Bioenergy. In Bioenergy & Sustainability: bridging the gaps (eds. Souza GM, Victoria RL, Joly CA & Verdade LM), UNESCO. 2015;302–347.
  • 2. Grivet L, Arruda P: Sugarcane genomics: depicting the complex genome of an important tropical crop. Curr Opin Plant Biol. 2002;5(2):122–127. doi: 10.1016/S1369-5266(02)00234-0
  • 3. D’Hont A: Unraveling the genome structure of polyploids using FISH and GISH; examples of sugarcane and banana. Cytogenet Genome Res. 2005;109(1–3):27–33. doi: 10.1159/000082378
  • 4. Le Cunff L, Garsmeur O, Raboin LM, et al.: Diploid/polyploid syntenic shuttle mapping and haplotype-specific chromosome walking toward a rust resistance gene (Bru1) in highly polyploid sugarcane (2n approximately 12x approximately 115). Genetics. 2008;180(1):649–660. doi: 10.1534/genetics.108.091355
  • 5. Miller JR, Dilley KA, Harkins DM, et al.: Initial genome sequencing of the sugarcane CP 96-1252 complex hybrid [version 1; referees: 1 approved]. F1000Res. 2017;6:688. doi: 10.12688/f1000research.11629.1
  • 6. Grativol C, Regulski M, Bertalan M, et al.: Sugarcane genome sequencing by methylation filtration provides tools for genomic research in the genus Saccharum. Plant J. 2014;79(1):162–172. doi: 10.1111/tpj.12539
  • 7. Okura VK, de Souza RS, de Siqueira Tada SF, et al.: BAC-Pool Sequencing and Assembly of 19 Mb of the Complex Sugarcane Genome. Front Plant Sci. 2016;7:342. doi: 10.3389/fpls.2016.00342
  • 8. de Setta N, Monteiro-Vitorello CB, Metcalfe CJ, et al.: Building the sugarcane genome for biotechnology and identifying evolutionary trends. BMC Genomics. 2014;15(1):540. doi: 10.1186/1471-2164-15-540
  • 9. Mattiello L, Riaño-Pachón DM, Martins MC, et al.: Physiological and transcriptional analyses of developmental stages along sugarcane leaf. BMC Plant Biol. 2015;15:300. doi: 10.1186/s12870-015-0694-z
  • 10. Hoang NV, Furtado A, Mason PJ, et al.: A survey of the complex transcriptome from the highly polyploid sugarcane genome using full-length isoform sequencing and de novo assembly from short read sequencing. BMC Genomics. 2017;18(1):395. doi: 10.1186/s12864-017-3757-8
  • 11. Belesini AA, Carvalho FMS, Telles BR, et al.: De novo transcriptome assembly of sugarcane leaves submitted to prolonged water-deficit stress. Genet Mol Res. 2017;16(2). doi: 10.4238/gmr16028845
  • 12. Porebski S, Bailey LG, Baum BR: Modification of a CTAB DNA extraction protocol for plants containing high polysaccharide and polyphenol components. Plant Mol Biol Rep. 1997;15(1):8–15. doi: 10.1007/BF02772108
  • 13. McCoy RC, Taylor RW, Blauwkamp TA, et al.: Illumina TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly-repetitive transposable elements. PLoS One. 2014;9(9):e106689. doi: 10.1371/journal.pone.0106689
  • 14. Altschul SF, Gish W, Miller W, et al.: Basic local alignment search tool. J Mol Biol. 1990;215(3):403–410. doi: 10.1016/S0022-2836(05)80360-2
  • 15. Myers EW, Sutton GG, Delcher AL, et al.: A Whole-Genome Assembly of Drosophila. Science. 2000;287(5461):2196–2204. doi: 10.1126/science.287.5461.2196
  • 16. Fu L, Niu B, Zhu Z, et al.: CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012;28(23):3150–3152. doi: 10.1093/bioinformatics/bts565
  • 17. Riaño-Pachón DM, Mattiello L, Cruz LP: Surveying the complex polyploid sugarcane genome sequence using synthetic long reads. Technical Memorandum Centro Nacional de Pesquisa em Energia e Materiais. 2016. doi: 10.13140/RG.2.1.3468.0565
  • 18. Hoff KJ, Lange S, Lomsadze A, et al.: BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS. Bioinformatics. 2016;32(5):767–769. doi: 10.1093/bioinformatics/btv661
  • 19. Haas BJ, Delcher AL, Mount SM, et al.: Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003;31(19):5654–5666. doi: 10.1093/nar/gkg770
  • 20. Slater GS, Birney E: Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics. 2005;6:31. doi: 10.1186/1471-2105-6-31
  • 21. Haas BJ, Salzberg SL, Zhu W, et al.: Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 2008;9(1):R7. doi: 10.1186/gb-2008-9-1-r7
F1000Res. 2017 Jun 21. doi: 10.5256/f1000research.12814.r23667

Referee response for version 1

Chakravarthi Mohan 1

The data note entitled 'Draft genome sequencing of the sugarcane hybrid SP80-3280' is perhaps the first report describing the whole genome of sugarcane, a complex polyploid, and its availability in NCBI will be a boon to sugarcane researchers.

The study is well planned, executed and well drafted. The data presented here would be particularly useful for functional genomic studies in sugarcane.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

F1000Res. 2017 Jun 23.
Diego Mauricio Riaño-Pachón 1

Dear Dr. Mohan,

thank you for your review of our data note. In version 2 of the note we have added links for the genome annotation in addition to the genome assembly.

Best regards,

Diego

F1000Res. 2017 Jun 15. doi: 10.5256/f1000research.12814.r23398

Referee response for version 1

Jason Miller 1

Summary:

The Data Note, "Draft genome sequencing of the sugarcane hybrid SP80-3280", describes a sugarcane genome assembly that is available at NCBI. The TruSeq method was applied to a monoploid sugarcane cultivar to generate a 1.2 gigabase assembly with an 8433 contig N50 according to GenBank. This is the first sugarcane genome assembly, so it will be of interest to the field. This data note is especially useful because it describes the sequence filtering by size, BLAST, mirabait, and CD-HIT prior to release.

Suggestions:

The sentence, “there are not whole genome assemblies available”, probably should say “there are no whole genome assemblies available”. The text could be made clearer by presenting all the statistics for the underlying short reads before getting to the synthetic long read stats, and by specifying that the BLAST filter was applied to the long reads. I would appreciate a reference for Celera Assembler, but that is just me.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

F1000Res. 2017 Jun 23.
Diego Mauricio Riaño-Pachón 1

Dear Dr. Miller,

thank you very much for your review of our data note. We have followed your main suggestions, and they are available as version 2 of the data note.

Best regards,

Diego

Articles from F1000Research are provided here courtesy of F1000 Research Ltd