Data in Brief. 2018 Jan 8;17:256–260. doi: 10.1016/j.dib.2018.01.002

Draft genome assembly of Colletotrichum musae, the pathogen of banana fruit

Wilson José da Silva Junior a, Raul Maia Falcão b, Lucas Christian de Sousa-Paula b, Nicolau Sbaraini c, Willie Anderson dos Santos Vieira a, Waléria Guerreiro Lima d, Sérgio de Sá Leitão Paiva Junior b, Charley Christian Staats c, Augusto Schrank c, Ana Maria Benko-Iseppon b, Valdir de Queiroz Balbino b,1, Marcos Paz Saraiva Câmara a,1
PMCID: PMC5790810  PMID: 29387740

Abstract

Colletotrichum musae is an important cosmopolitan pathogenic fungus that causes anthracnose in banana fruit. The entire genome of C. musae isolate GM20 (CMM 4420), originally isolated from infected banana fruit from Alagoas State, Brazil, was sequenced and annotated. The pathogen's genomic DNA was sequenced on the Illumina HiSeq platform. The C. musae GM20 genome comprises 50,635,197 bp with a G + C content of 53.74%; the present assembly has 2763 scaffolds harboring 13,451 putative genes with an average length of 1626 bp. Gene prediction and annotation were performed with the Funannotate pipeline, using BUSCO-based training for gene identification.


Specifications Table

Subject area: Biology
More specific subject area: Microbiology, Agriculture, Genomics
Type of data: Genome sequence data
How data was acquired: Illumina HiSeq 2500 next-generation sequencing platform
Data format: Assembled genome sequence
Experimental factors: Genomic DNA was extracted from mycelium grown in culture medium.
Experimental features: The genome of Colletotrichum musae strain GM20 was sequenced and assembled.
Data source location: Colletotrichum musae strain GM20 was isolated from banana lesions in Maceió, Alagoas, Brazil.
Data accessibility: The Colletotrichum musae GM20 genome is available in DDBJ/ENA/GenBank under the accession number NWMS01000000.
Related research article:
Data accessibility: https://www.ncbi.nlm.nih.gov/nuccore/NWMS00000000

Value of the Data

  • Colletotrichum musae is the causal agent of anthracnose in banana fruits, the main post-harvest disease of banana worldwide.

  • This is the first genome sequence of Colletotrichum musae obtained by next-generation sequencing and made available in a public database.

  • The genome data published herein will facilitate studies of the biology, pathogenicity, evolution and pathogen-host interactions of Colletotrichum musae through comparative genomic studies of Colletotrichum spp. and related species.

1. Data

Fungal infection of plants is the most frequent cause of extensive losses in agriculture. The fact that many endophytic fungi can cause infection adds further complexity to the study of fungal plant pathogens. Banana (Musa sp.) is one of the world's most important food crops and a staple food for more than 400 million people [1]. Over 100 million tons are produced worldwide on some 5 million hectares, and the cultivated area is expected to increase in the future [2]. However, banana fruits are highly susceptible to pathogens, and anthracnose caused by fungi of the genus Colletotrichum is among the most frequent diseases. Colletotrichum comprises over 100 species that are able to infect and damage diverse crops around the world [3].

Due to their ubiquity, substantial destructive capacity and scientific importance as model pathosystems, Colletotrichum spp. rank among the top 10 most important fungal plant pathogens according to the international community of plant pathology researchers [4]. Colletotrichum musae (Berk. and M.A. Curtis), the causative agent of anthracnose, is a major post-harvest pathogen of banana fruits and causes severe global crop losses [5]. The disease develops from a latent fungal infection established before harvest, originating from spores that are present on immature fruits in the field. Symptoms, such as brown to black patches on the peel and depressed lesions, appear as the fruits ripen. Furthermore, under high humidity, the formation of salmon-colored acervuli can be observed [6]. The infection thus reduces fruit viability during maturation, transport and storage [7], leading to commercial depreciation and a shorter shelf life.

To reduce post-harvest losses, chemical fungicides are usually adopted, but alternative methods (e.g., radiation treatment, hot water treatment, refrigeration, induced resistance and biological control agents) have also been applied [8]. However, the use of chemical fungicides has been limited by their potentially harmful effects on human health and the environment. In addition, fungal pathogens are known to quickly develop resistance to chemical pesticides [9].

Furthermore, the absence of available genomic sequences from C. musae is one of the main limitations to better characterization of fungal virulence determinants and to the development of improved management strategies. Here we report, for the first time, the whole genome sequence of C. musae strain GM20 (CMM 4420), isolated from infected banana fruit from Alagoas State, Northeast Brazil.

In recent years, several phytopathogenic fungal genomes have been published, boosting the discovery of virulence determinants in these species. We expect that our analysis will encourage further studies of C. musae biology, which should provide more detail about the host-pathogen interaction and lead to new management measures.

2. Experimental design, materials, and methods

2.1. DNA extraction and genome sequence

The GM20 isolate of C. musae was cultured, and DNA was extracted as previously described [10]. The whole-genome shotgun sequence of C. musae GM20 was generated using the Illumina HiSeq 2500 platform (Illumina, San Diego, CA) at the Center for Functional Genomics, Universidade de São Paulo (Piracicaba, Brazil). The libraries were prepared with the Illumina Nextera XT DNA Library Prep Kit (Illumina, San Diego, CA), and sequencing was performed on a HiSeq Flow Cell v4 with the HiSeq SBS Kit v4 (Illumina, San Diego, CA), yielding 2 × 100 bp paired-end reads.

2.2. De novo assembly and genes annotation

The shotgun sequencing produced 13,273,851 paired reads. Initially, FastQC [11] was used to assess read quality, and adapters were trimmed using FASTX-Toolkit 0.0.13 (http://hannonlab.cshl.edu/fastx_toolkit). Three assemblers were then tested: ABySS 2.0.2 [12], SPAdes 1.10 [13] and Velvet 1.1 [14], with SPAdes showing the best results (12,435 contigs >500 bp). Redundans [15] was subsequently run to scaffold the assembly.
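As a rough illustration of how sequencing depth relates to the read count above, the following minimal Python sketch divides the total number of sequenced bases by the genome size. It assumes that the reported 13,273,851 figure counts read pairs and that every mate is a full 100 bp before trimming; neither assumption is stated explicitly in the text, and the function name `estimated_coverage` is hypothetical. Under these assumptions the estimate is about 52×; a different way of counting reads or bases would change it, so it need not match the coverage value reported in Table 1.

```python
def estimated_coverage(read_pairs: int, read_length: int, genome_size: int) -> float:
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    # Assumption: each pair contributes two mates of equal, untrimmed length.
    total_bases = read_pairs * 2 * read_length
    return total_bases / genome_size


if __name__ == "__main__":
    # Read count and assembly size are taken from the text; the read length
    # is the nominal 100 bp of the sequencing run.
    depth = estimated_coverage(13_273_851, 100, 50_635_197)
    print(f"Estimated raw depth: {depth:.1f}x")
```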

Assembly statistics were generated with QUAST 3.9 (Table 1) [16]. Gene prediction and annotation were carried out with the Funannotate pipeline [17], using BUSCO 2.0 [18] (parameters: Sordariomycetes database, with Verticillium longisporum selected as the closely related species) to generate the training files for two gene predictors: GeneMark-ES [19] and AUGUSTUS [20]. Moreover, BUSCO 2.0 was employed to evaluate genome completeness, based on the conservation of benchmarking universal single-copy orthologs (BUSCOs).

Table 1.

Genome assembly statistics for Colletotrichum musae GM20.

Feature: C. musae GM20
Assembly size: 50.7 Mb
Sequencing coverage: 100×
Sequencing technology: Illumina HiSeq 2500
Number of scaffolds: 2763
Scaffold N50 length: 32,818 bp
Number of contigs: 10,618
Number of predicted genes: 13,451
Overall GC content: 53.74%
Public access to genome: NWMS01000000
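For readers who want to recompute statistics of the kind listed in Table 1 from a scaffold FASTA file, the sketch below shows one way to derive total size, longest sequence, N50 and GC content in plain Python. It is not the QUAST implementation used by the authors; the function names are hypothetical, and GC content is computed over A/C/G/T bases only (gap characters such as N are ignored), which is an assumption that may differ from how QUAST reports it.

```python
from typing import Dict, Iterable, Iterator, List


def read_fasta_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield each sequence of a FASTA stream as one uppercase string."""
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if chunks:
                yield "".join(chunks).upper()
            chunks = []
        elif line:
            chunks.append(line)
    if chunks:
        yield "".join(chunks).upper()


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


def assembly_stats(sequences: Iterable[str]) -> Dict[str, float]:
    """Summarise an assembly: sequence count, size, longest sequence, N50, GC%."""
    seqs = list(sequences)
    lengths = [len(s) for s in seqs]
    gc = sum(s.count("G") + s.count("C") for s in seqs)
    acgt = sum(s.count(base) for s in seqs for base in "ACGT")
    return {
        "sequences": len(seqs),
        "total_bp": sum(lengths),
        "max_bp": max(lengths, default=0),
        "n50_bp": n50(lengths),
        "gc_percent": 100 * gc / acgt if acgt else 0.0,
    }


if __name__ == "__main__":
    # Toy input for illustration only; a real run would read the scaffold file.
    toy = ">s1\nGGCC\nAT\n>s2\nATAT\n>s3\nGC\n"
    print(assembly_stats(read_fasta_sequences(toy.splitlines())))
```

To apply this to a real assembly, pass an open file handle in place of the toy lines, for example `assembly_stats(read_fasta_sequences(open(path)))`, where `path` points to the scaffold FASTA.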

The final assembly of the C. musae GM20 genome was determined to be 50,635,197 bp with a G + C content of 53.74% in 2763 scaffolds (maximum 208,119 bp; N50 32,818 bp), and 13,451 genes were predicted. This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession number NWMS00000000. The version described in this paper is version NWMS01000000.
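The two accession numbers above differ only in the two digits after the four-letter prefix. In the classic Whole Genome Shotgun accession layout, those digits carry the assembly version, with 00 denoting the project master record. The following Python sketch splits such an accession into its parts. It handles only the four-letter-prefix layout seen here (newer projects may use longer prefixes), and the names `WgsAccession` and `parse_wgs_accession` are hypothetical.

```python
import re
from typing import NamedTuple


class WgsAccession(NamedTuple):
    prefix: str   # project prefix, e.g. "NWMS"
    version: int  # assembly version; 0 denotes the master record
    number: int   # sequence counter; 0 for project-level records


# Assumed layout: 4 letters, 2 version digits, then 6 to 8 counter digits.
_WGS_PATTERN = re.compile(r"^([A-Z]{4})(\d{2})(\d{6,8})$")


def parse_wgs_accession(accession: str) -> WgsAccession:
    """Split a four-letter-prefix WGS accession into prefix, version and counter."""
    match = _WGS_PATTERN.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised WGS accession: {accession!r}")
    prefix, version, number = match.groups()
    return WgsAccession(prefix, int(version), int(number))


if __name__ == "__main__":
    for acc in ("NWMS00000000", "NWMS01000000"):
        print(acc, parse_wgs_accession(acc))
```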

BUSCO analysis indicated a high degree of completeness, with a BUSCO score of 96.3%: of the 1315 BUSCO groups searched, 1263 were complete single-copy BUSCOs, four were complete duplicated BUSCOs, 23 were fragmented, and 25 were missing.
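The completeness score can be reproduced from the category counts. The minimal Python sketch below assumes that the 1263 complete BUSCOs are single-copy and are counted separately from the four duplicated ones, which is the reading under which the four categories sum to the 1315 groups searched. The function name `busco_percentages` is hypothetical and is not part of the BUSCO software.

```python
from typing import Dict


def busco_percentages(single: int, duplicated: int,
                      fragmented: int, missing: int) -> Dict[str, float]:
    """Convert BUSCO category counts into percentages of all groups searched."""
    total = single + duplicated + fragmented + missing
    if total == 0:
        raise ValueError("no BUSCO groups were searched")

    def pct(count: int) -> float:
        return round(100 * count / total, 1)

    return {
        "total_groups": total,
        # Completeness counts both single-copy and duplicated complete BUSCOs.
        "complete": pct(single + duplicated),
        "single_copy": pct(single),
        "duplicated": pct(duplicated),
        "fragmented": pct(fragmented),
        "missing": pct(missing),
    }


if __name__ == "__main__":
    # Counts as reported for the C. musae GM20 assembly.
    print(busco_percentages(single=1263, duplicated=4, fragmented=23, missing=25))
```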

Acknowledgments

The authors thank the CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, Brazil) Bio-Computational Program (23038.010050/2013-04) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico) (310871/2014-0) for financial support and fellowships.

Footnotes

Transparency document

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.dib.2018.01.002.

Contributor Information

Wilson José da Silva Junior, Email: wilson_jsjunior@hotmail.com.

Raul Maia Falcão, Email: rmf4@cin.ufpe.br.

Lucas Christian de Sousa-Paula, Email: lcsousapaula@gmail.com.

Nicolau Sbaraini, Email: nicolausbaraini@icloud.com.

Willie Anderson dos Santos Vieira, Email: andersonvieira12@gmail.com.

Waléria Guerreiro Lima, Email: wagueli@hotmail.com.

Sérgio de Sá Leitão Paiva Junior, Email: sslpaiva@gmail.com.

Charley Christian Staats, Email: staats@ufrgs.br.

Augusto Schrank, Email: aschrank@cbiot.ufrgs.br.

Ana Maria Benko-Iseppon, Email: ana.iseppon@gmail.com.

Valdir de Queiroz Balbino, Email: valdir@ufpe.br.

Marcos Paz Saraiva Câmara, Email: marcos.camara@ufrpe.br.

Transparency document. Supplementary material

mmc1.docx (13.1KB, docx)

References

  • 1. Holscher D., Dhakshinamoorthy S., Alexandrov T., Becker M., Bretschneider T., Buerkert A., Crecelius A.C., De Waele D., Elsen A., Heckel D.G., Heklau H., Hertweck C., Kai M., Knop K., Krafft C., Maddula R.K., Matthaus C., Popp J., Schneider B., Schubert U.S., Sikora R.A., Svato A., Swennen R.L. Phenalenone-type phytoalexins mediate resistance of banana plants (Musa spp.) to the burrowing nematode Radopholus similis. Proc. Natl. Acad. Sci. 2014;111:105–110. doi: 10.1073/pnas.1314168110.
  • 2. FAO, Food and agricultural organization. 〈http://www.fao.org/home/en/〉, 2017 (Accessed 01 Jan 2017).
  • 3. Cannon P.F., Damm U., Johnston P.R., Weir B.S. Colletotrichum - current status and future directions. Stud. Mycol. 2012;73:181–213. doi: 10.3114/sim0014.
  • 4. Dean R., Van Kan J.A.L., Pretorius Z.A., Hammond-Kosack K.E., Di Pietro A., Spanu P.D., Rudd J.J., Dickman M., Kahmann R., Ellis J., Foster G.D. The Top 10 fungal pathogens in molecular plant pathology. Mol. Plant Pathol. 2012;13 (804–804). doi: 10.1111/j.1364-3703.2011.00783.x.
  • 5. Maqbool M., Ali A., Ramachandran S., Smith D.R., Alderson P.G. Control of postharvest anthracnose of banana using a new edible composite coating. Crop Prot. 2010;29:1136–1141.
  • 6. Ranasinghe L.S., Jayawardena B., Abeywickrama K. Use of waste generated from cinnamon bark oil (Cinnamomum zeylanicum Blume) extraction as a post harvest treatment for Embul banana. J. Food Agric. Environ. 2003;1:340–344. 〈http://www.world-food.net〉
  • 7. Slabaugh W.R., Grove M.D. Postharvest diseases of bananas and their control. Plant Dis. 1982;66:746–750.
  • 8. Zhimo V.Y., Dilip D., Sten J., Ravat V.K., Bhutia D.D., Panja B., Saha J. Antagonistic yeasts for biocontrol of the banana postharvest anthracnose pathogen Colletotrichum musae. J. Phytopathol. 2017;165:35–43.
  • 9. Sonah H., Deshmukh R.K., Bélanger R.R. Computational prediction of effector proteins in fungi: opportunities and challenges. Front. Plant Sci. 2016;7:1–14. doi: 10.3389/fpls.2016.00126.
  • 10. Doyle J.J., Doyle J.L. Isolation of plant DNA from fresh tissue. Focus (Madison) 1990;12:13–15.
  • 11. Andrews S. FastQC: a quality control tool for high throughput sequence data. Babraham Bioinform. 2010. http://www.bioinformatics.babraham.ac.uk/projects/
  • 12. Simpson J.T., Wong K., Jackman S.D., Schein J.E., Jones S.J.M., Birol I. ABySS: a parallel assembler for short read sequence data. Genome Res. 2009;19:1117–1123. doi: 10.1101/gr.089532.108.
  • 13. Bankevich A., Nurk S., Antipov D., Gurevich A.A., Dvorkin M., Kulikov A.S., Lesin V.M., Nikolenko S.I., Pham S., Prjibelski A.D., Pyshkin A.V., Sirotkin A.V., Vyahhi N., Tesler G., Alekseyev M.A., Pevzner P.A. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 2012;19:455–477. doi: 10.1089/cmb.2012.0021.
  • 14. Zerbino D.R. Using the Velvet de novo assembler for short-read sequencing technologies. Curr. Protoc. Bioinform. 2010. doi: 10.1002/0471250953.bi1105s31.
  • 15. Pryszcz L.P., Gabaldón T. Redundans: an assembly pipeline for highly heterozygous genomes. Nucleic Acids Res. 2016;44:e113. doi: 10.1093/nar/gkw294.
  • 16. Gurevich A., Saveliev V., Vyahhi N., Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29:1072–1075. doi: 10.1093/bioinformatics/btt086.
  • 17. Palmer J.M. Funannotate: a Fungal Genome Annotation and Comparative Genomics Pipeline. 〈https://github.com/nextgenusfs/funannotate〉, 2016.
  • 18. Simão F.A., Waterhouse R.M., Ioannidis P., Kriventseva E.V., Zdobnov E.M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. doi: 10.1093/bioinformatics/btv351.
  • 19. Besemer J. GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res. 2001;29:2607–2618. doi: 10.1093/nar/29.12.2607.
  • 20. Stanke M., Steinkamp R., Waack S., Morgenstern B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res. 2004;32. doi: 10.1093/nar/gkh379.
