ABSTRACT
Here, we report a hybrid genome assembly of Marasmius tenuissimus strain MS-2, an isolate that causes thread blight disease of cacao and was collected from cacao leaves in Tafo, Eastern Region, Ghana. The final assembly consists of 2,083 scaffolds spanning 69,843,039 bp, with 49.21% GC content, 92.6% BUSCO completeness, and a scaffold N50 of 186,871 bp.
KEYWORDS: plant pathology, cacao, mycology, Marasmiaceae, thread blight, foliar disease, Theobroma, Marasmius
ANNOUNCEMENT
Thread blight disease (TBD) is a foliar disease affecting cacao-growing regions in Africa, South America, and Asia (1–3). TBD is caused by basidiomycetous fungi in the suborder Marasmiineae and is associated with hyphal aggregates called rhizomorphs (4, 5). Surveys have shown that Marasmius tenuissimus was the fungus most frequently isolated from diseased cacao in Ghana and Peru (2, 5). Here, we sequenced the genome of M. tenuissimus strain MS-2, isolated from TBD-infected cacao in Tafo, Eastern Region, Ghana. We used a hybrid approach combining Illumina HiSeq paired-end sequencing with Oxford Nanopore Technologies (ONT) MinION sequencing, an in-house alternative to the SMRT sequencing we previously used to assemble M. tenuissimus strain GH-37 (6). Short reads were supplemented with long reads to improve the contiguity of the assembly (7).
Mycelia were cultured in V8 following established methods (4). Mycelial tissue filtered from the liquid was frozen in liquid nitrogen and lyophilized for 3 days before isolating DNA using the CTAB method as previously described (5). Genomic paired-end libraries for MS-2 were generated using internal methods (electrophoresis-based fragment selection) by BGI Genomics (BGI-Shenzhen, Shenzhen, China) and sequenced via Illumina HiSeq 2000. CTAB-extracted DNA was sequenced at USDA-BARC-SPCL using an ONT MinION R10.4.1 flow cell following ONT Ligation Sequencing Kit V14 library preparation.
Illumina adaptors were trimmed from the short reads using Trimmomatic v0.39 (ILLUMINACLIP:2:30:10:2:True LEADING:20 TRAILING:20 SLIDINGWINDOW:5:20) (8). Genome size was predicted using GenomeScope (9) with Jellyfish v2.2.9 (10) at k = 21. Short reads were assembled using ABySS v2.3.5 (k = 110, kc = 2) (11), and scaffold quality across assembly parameters was evaluated using QUAST v5.2.0 (12) and BUSCO v5.4.5 (13) with the agaricales_odb10 lineage data set. Haplotigs were removed using PurgeHaplotigs v1.1.2 (a = 50) (14). The assembly was polished using Pilon v1.23 (8 rounds) (15).
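To illustrate what the k-mer counting step provides to genome size estimation, the following is a minimal Python sketch of a canonical k-mer spectrum, assuming reads are already available as plain strings. It is not the Jellyfish or GenomeScope implementation; the function names `canonical` and `kmer_histogram` are hypothetical and chosen only for this example.

```python
from collections import Counter

# Translation table used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID_BASES = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_histogram(reads, k=21):
    """Count canonical k-mers in the reads, then tabulate how many distinct
    k-mers occur at each multiplicity (the k-mer spectrum)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing N or any other ambiguous symbol.
            if _VALID_BASES.issuperset(kmer):
                counts[canonical(kmer)] += 1
    # Keys are multiplicities, values are numbers of distinct k-mers.
    return Counter(counts.values())


if __name__ == "__main__":
    # Toy input with k = 3 so the result is easy to follow by hand.
    print(kmer_histogram(["ACGTAC"], k=3))
```

A histogram of this kind (multiplicity versus number of distinct k-mers) is the type of input that k-mer profiling tools model; the default `k=21` mirrors the value reported above.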
ONT reads were assembled using SMARTdenovo (SD) v1.0.0 (16) and Flye v2.9.1 (--nano-hq) (17), followed by purging with PurgeHaplotigs and polishing with Pilon. The SD and Flye assemblies were polished for 9 and 11 rounds, respectively. The two assemblies were merged using Quickmerge v0.3 (18) with the SD assembly as the backbone. Quality was assessed with QUAST and BUSCO as described above.
The polished ABySS assembly and the merged SD + Flye assembly were concatenated to generate a single assembly. Duplicate sequence information was purged using PurgeHaplotigs, and the final assembly was polished with Pilon (6 rounds). RepeatModeler v2.0.4 (19) was used to discover novel repetitive elements, which were masked, along with RepBase v21.09 repeats (20), by RepeatMasker v4.1.5 (21).
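The concatenation step can be sketched as follows. This is a minimal Python illustration, not the procedure actually used for MS-2: the function name `concatenate_assemblies`, the labels, and the file names in the usage note are hypothetical. Prefixing each header with a label is an assumption made here so that sequence names stay unique when two assemblies are combined.

```python
def concatenate_assemblies(inputs, output):
    """Write all records from several FASTA files into one file.

    `inputs` maps a short label to a FASTA path. The label is prepended to
    every header so that sequence names remain unique after concatenation.
    Returns the number of records written.
    """
    n_records = 0
    with open(output, "w") as out:
        for label, path in inputs.items():
            with open(path) as handle:
                for line in handle:
                    line = line.rstrip("\n")
                    if not line:
                        continue  # drop blank lines
                    if line.startswith(">"):
                        out.write(f">{label}_{line[1:]}\n")
                        n_records += 1
                    else:
                        out.write(line + "\n")
    return n_records
```

For example, `concatenate_assemblies({"short": "abyss_polished.fasta", "long": "sd_flye_merged.fasta"}, "combined.fasta")` would write both sets of sequences to one file, which could then be passed to a redundancy-purging step.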
The Funannotate v1.8.16 (22) predict pipeline, which utilized AUGUSTUS v3.5.0, CodingQuarry v2.0, GeneMark-ES v4.71, GlimmerHMM v3.0.4, SNAP v2006-07-28, tRNAscan-SE v2.0.9, and EVidenceModeler v1.1.1 (23–29), was used with default parameters to predict genes from the final assembly. InterProScan v5.60-92.0 and EggNOG-mapper v2.1.12 were used for functional annotation (30, 31).
Sequencing yielded 1,594,611,459 bases from ONT (22.8× coverage) and 11,212,144,134 bases from Illumina (160.5× coverage). The final assembly had a total length of 69,843,039 bp in 2,083 scaffolds, with 49.21% GC content, a scaffold N50 of 186,871 bp, and 92.6% BUSCO completeness (single 89.8%, duplicate 2.8%). Summary statistics are shown in Table 1.
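The reported coverage values can be checked with simple arithmetic. The sketch below assumes that coverage was computed as total sequenced bases divided by the final assembly length; the text does not state the denominator, but this assumption reproduces the reported figures. The variable and function names are illustrative only.

```python
# Values taken from the text above.
ASSEMBLY_SIZE_BP = 69_843_039
ONT_BASES = 1_594_611_459
ILLUMINA_BASES = 11_212_144_134


def coverage(total_bases: int, genome_size: int) -> float:
    """Mean depth of coverage: sequenced bases divided by genome size.

    Assumption: the final assembly length is used as the genome size.
    """
    return total_bases / genome_size


ont_cov = round(coverage(ONT_BASES, ASSEMBLY_SIZE_BP), 1)
illumina_cov = round(coverage(ILLUMINA_BASES, ASSEMBLY_SIZE_BP), 1)

# BUSCO completeness is the sum of single-copy and duplicated complete BUSCOs.
busco_complete = round(89.8 + 2.8, 1)

print(f"ONT coverage: {ont_cov}x")            # 22.8x
print(f"Illumina coverage: {illumina_cov}x")  # 160.5x
print(f"BUSCO complete: {busco_complete}%")   # 92.6%
```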
TABLE 1. Statistics for the final hybrid genome assembly of strain MS-2
| Feature | Value |
|---|---|
| Long read bases sequenced (coverage) | 1,594,611,459 (22.8×) |
| Long read N50 (bp) | 5,091 |
| Short read bases sequenced (coverage) | 11,212,144,134 (160.5×) |
| Assembly size (bp) | 69,843,039 |
| Scaffold number | 2,083 |
| Largest scaffold (bp) | 1,805,509 |
| Scaffold N50 (bp) | 186,871 |
| GC content (%) | 49.21 |
| BUSCO completeness (%) | 92.6 |
| Repetitive content (%) | 16.6 |
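
For readers unfamiliar with the N50 and GC content statistics in Table 1, the following Python sketch shows how they are conventionally defined. It is a simplified illustration under the usual definitions, not the QUAST implementation, and the function names `n50` and `gc_percent` are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; N50 is the length of the
    sequence at which the running total first reaches half of all bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def gc_percent(sequences):
    """Return GC content as a percentage of unambiguous (A, C, G, T) bases."""
    gc = 0
    acgt = 0
    for seq in sequences:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        acgt += sum(seq.count(base) for base in "ACGT")
    return 100.0 * gc / acgt if acgt else 0.0


if __name__ == "__main__":
    # Toy example: total length is 20, and 6 + 5 = 11 reaches half of it.
    print(n50([2, 3, 4, 5, 6]))
    print(gc_percent(["ACGT", "GGCC"]))
```

Applied to the lengths and sequences of the 2,083 scaffolds, functions of this kind would be expected to yield values comparable to the scaffold N50 and GC content in Table 1, although tools may differ in how they treat ambiguous bases.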
ACKNOWLEDGMENTS
This research was supported by the USDA ARS In-House Project No. 8042-21000-303-000-D. I.K.B. was supported by an appointment to the Research Participation Program at USDA ARS administered by the Oak Ridge Institute for Science and Education (ORISE) through an interagency agreement between the U.S. Department of Energy (DOE) and ARS and managed by Oak Ridge Associated Universities under DOE contract number DE-SC0014664. This research used resources provided by the SCINet project and/or the AI Center of Excellence of the USDA ARS, ARS project numbers 0201-88888-003-000D and 0201-88888-002-000D. Mention of trade names or commercial products in this report is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture (USDA). USDA is an equal opportunity lender, provider, and employer.
Contributor Information
Stephen P. Cohen, Email: Stephen.Cohen@usda.gov.
Jennifer Geddes-McAlister, University of Guelph, Guelph, Ontario, Canada.
DATA AVAILABILITY
Unprocessed sequence reads, the assembly, and annotations have been deposited to NCBI under BioProject accession PRJNA1077360. This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession JBBXMP000000000. The version described in this paper is version JBBXMP010000000.
REFERENCES
1. Amoako-Attah I, Akrofi AY, Rashid BH, Mercy A, Kumi-Asare E. 2016. White thread blight disease caused by Marasmiellus scandens (Massee) Dennis & Reid on cocoa and its control in Ghana. Afr J Agric Res 11:5064–5070. doi: 10.5897/AJAR2016.11681
2. Huamán-Pilco AF, Cruz M, Aime MC, Leiva-Espinoza ST, Oliva-Cruz SM, Díaz-Valderrama JR. 2023. First report of thread blight caused by Marasmius tenuissimus on cacao (Theobroma cacao) in Peru. Plant Dis 107:219. doi: 10.1094/PDIS-02-22-0420-PDN
3. Wannathes N, Desjardin DE, Hyde KD, Perry BA, Lumyong S. 2009. A monograph of Marasmius (Basidiomycota) from northern Thailand based on morphological and molecular (ITS sequences) data. Fungal Divers 37:209–306.
4. Ali SS, Amoako-Attah I, Shao J, Kumi-Asare E, Meinhardt LW, Bailey BA. 2021. Mitochondrial genomics of six cacao pathogens from the basidiomycete family Marasmiaceae. Front Microbiol 12:752094. doi: 10.3389/fmicb.2021.752094
5. Amoako-Attah I, Shahin AS, Aime MC, Odamtten GT, Cornelius E, Nyaku ST, Kumi-Asare E, Yahaya HB, Bailey BA. 2020. Identification and characterization of fungi causing thread blight diseases on cacao in Ghana. Plant Dis 104:3033–3042. doi: 10.1094/PDIS-03-20-0565-RE
6. Leung J, Cohen SP, Baruah IK, Ali SS, Shao J, Bukari Y, Amoako-Attah I, Meinhardt LW, Bailey BA. 2023. A draft genome resource for Marasmius tenuissimus, an emerging causal agent of thread blight disease in cacao. PhytoFrontiers 3:863–866. doi: 10.1094/PHYTOFR-03-23-0027-A
7. Saud Z, Kortsinoglou AM, Kouvelis VN, Butt TM. 2021. Telomere length de novo assembly of all 7 chromosomes and mitogenome sequencing of the model entomopathogenic fungus, Metarhizium brunneum, by means of a novel assembly pipeline. BMC Genomics 22:87. doi: 10.1186/s12864-021-07390-y
8. Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120. doi: 10.1093/bioinformatics/btu170
9. Vurture GW, Sedlazeck FJ, Nattestad M, Underwood CJ, Fang H, Gurtowski J, Schatz MC. 2017. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33:2202–2204. doi: 10.1093/bioinformatics/btx153
10. Marçais G, Kingsford C. 2011. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27:764–770. doi: 10.1093/bioinformatics/btr011
11. Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJM, Birol I. 2009. ABySS: a parallel assembler for short read sequence data. Genome Res 19:1117–1123. doi: 10.1101/gr.089532.108
12. Gurevich A, Saveliev V, Vyahhi N, Tesler G. 2013. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29:1072–1075. doi: 10.1093/bioinformatics/btt086
13. Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. 2021. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol 38:4647–4654. doi: 10.1093/molbev/msab199
14. Roach MJ, Schmidt SA, Borneman AR. 2018. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics 19:460. doi: 10.1186/s12859-018-2485-7
15. Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman J, Young SK, Earl AM. 2014. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9:e112963. doi: 10.1371/journal.pone.0112963
16. Liu H, Wu S, Li A, Ruan J. 2021. SMARTdenovo: a de novo assembler using long noisy reads. GigaByte 2021:1–9. doi: 10.46471/gigabyte.15
17. Kolmogorov M, Yuan J, Lin Y, Pevzner PA. 2019. Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol 37:540–546. doi: 10.1038/s41587-019-0072-8
18. Chakraborty M, Baldwin-Brown JG, Long AD, Emerson JJ. 2016. Contiguous and accurate de novo assembly of metazoan genomes with modest long read coverage. Nucleic Acids Res 44:e147. doi: 10.1093/nar/gkw654
19. Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, Smit AF. 2020. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci USA 117:9451–9457. doi: 10.1073/pnas.1921046117
20. Bao W, Kojima KK, Kohany O. 2015. Repbase update, a database of repetitive elements in eukaryotic genomes. Mob DNA 6:11. doi: 10.1186/s13100-015-0041-9
21. Smit AFA, Hubley R, Green P. 2013. RepeatMasker open-4.0, v4.1.5. Available from: http://www.repeatmasker.org
22. Palmer JM, Stajich J. 2020. Funannotate v1.8.1: eukaryotic genome annotation (v1.8.1), v1.8.16. Available from: https://zenodo.org/records/4054262
23. Stanke M, Diekhans M, Baertsch R, Haussler D. 2008. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24:637–644. doi: 10.1093/bioinformatics/btn013
24. Testa AC, Hane JK, Ellwood SR, Oliver RP. 2015. CodingQuarry: highly accurate hidden Markov model gene prediction in fungal genomes using RNA-seq transcripts. BMC Genomics 16:170. doi: 10.1186/s12864-015-1344-4
25. Ter-Hovhannisyan V, Lomsadze A, Chernoff YO, Borodovsky M. 2008. Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Res 18:1979–1990. doi: 10.1101/gr.081612.108
26. Majoros WH, Pertea M, Salzberg SL. 2004. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20:2878–2879. doi: 10.1093/bioinformatics/bth315
27. Korf I. 2004. Gene finding in novel genomes. BMC Bioinformatics 5:59. doi: 10.1186/1471-2105-5-59
28. Chan PP, Lin BY, Mak AJ, Lowe TM. 2021. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res 49:9077–9096. doi: 10.1093/nar/gkab688
29. Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, White O, Buell CR, Wortman JR. 2008. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol 9:R7. doi: 10.1186/gb-2008-9-1-r7
30. Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, Pesseat S, Quinn AF, Sangrador-Vegas A, Scheremetjew M, Yong SY, Lopez R, Hunter S. 2014. InterProScan 5: genome-scale protein function classification. Bioinformatics 30:1236–1240. doi: 10.1093/bioinformatics/btu031
31. Cantalapiedra CP, Hernández-Plaza A, Letunic I, Bork P, Huerta-Cepas J. 2021. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol Biol Evol 38:5825–5829. doi: 10.1093/molbev/msab293
