ABSTRACT
Here, we report the complete genome of the non-aflatoxigenic Aspergillus flavus isolate La3279, which is an active ingredient of the aflatoxin biocontrol product Aflasafe. The chromosome-scale assembly clarifies the deletion pattern in the aflatoxin biosynthesis gene cluster and corrects a misidentified assembly previously published for this isolate.
KEYWORDS: Aspergillus flavus, aflatoxin biocontrol, agriculture, nanopore
ANNOUNCEMENT
Aspergillus flavus isolate La3279 (hereafter “La3279”) is a non-aflatoxigenic active ingredient of the aflatoxin biocontrol product Aflasafe (1). Previous work sequenced the La3279 genome and found 25 genes encoding aflatoxin biosynthesis (2). Another study analyzed the published La3279 genome and found that half of those genes were deleted (3). To resolve this inconsistency, we generated a new genome for La3279.
La3279 was grown from silica stock, and DNA was isolated and sequenced on Oxford Nanopore Technologies and Illumina platforms, as described in reference (4). Quality filtering (Q score > 8) yielded 2,097,937 nanopore reads (~7 gigabases) with a read N50 of 8.2 kilobases, calculated using BBTools v38.79 (sourceforge.net/projects/bbmap/). Nanopore reads were aligned to the final assembly using Minimap2 v2.24 (5), and the depth was estimated using SAMtools v1.9 (6). After library preparation and barcoding using the NEBNext Ultra II DNA Library Prep Kit for Illumina, paired-end (150 nt) reads were sequenced on an Illumina NextSeq 550 at the Arizona Genetics Core, generating 27,771,766 reads (4.2 gigabases). Adapter contamination was masked using the Illumina software bcl2fastq. Quality assessment using FastQC v0.11.9 (7) revealed high-quality reads without adapter contamination; all read positions had median Phred scores > 26. Jellyfish v2.2.9 (8) generated a histogram of 21-mer counts, which was visualized using GenomeScope (9) (Fig. 1A).
Canu v2.2 (10) assembly of nanopore reads generated 12 unitigs, which were polished with three rounds of Pilon v1.23 (11), realigning Illumina reads each time. One ~4 kilobase unitig was discarded because it matched bacterial sequences. Two unitigs <4 kilobases and one unitig ~112 kilobases were discarded because they matched Aspergillus mitogenomes in BLASTN searches (12). The eight remaining unitigs, each longer than three megabases (Table 1), were numbered according to their homology with A. flavus NRRL3357 chromosomes (Fig. 1B; GenBank accession: GCA_014117465.1). The NCBI contamination screen was negative.
An idiogram was generated using RIdeogram (13, 14). Synteny between La3279 and A. flavus NRRL3357 was visualized using D-GENIES (15). Chromosome ends were identified by manually searching for consecutive repeats of the dodecanucleotide “TTAGGGTCAACA” (16).
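The telomere search was performed manually. Purely as an illustration, a comparable check could be scripted as in the following Python sketch, which is not the procedure used for this assembly. The function names, the 500-nt end window, and the two-copy minimum are assumptions introduced here; only the repeat unit comes from the text. Because a telomere at the opposite chromosome end appears as the reverse complement of the unit, both orientations are checked.

```python
import re

# Dodecanucleotide telomeric repeat named in the text (reference 16).
TELOMERE_UNIT = "TTAGGGTCAACA"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_tandem_run(seq: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `seq`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(seq)]
    return max(runs, default=0)


def telomere_at_ends(chrom: str, unit: str = TELOMERE_UNIT,
                     window: int = 500, min_copies: int = 2) -> dict:
    """Report whether each chromosome end carries tandem telomeric repeats.

    `window` and `min_copies` are hypothetical thresholds, not values
    taken from the article.
    """
    chrom = chrom.upper()
    rc_unit = reverse_complement(unit)
    result = {}
    for name, end_seq in (("left", chrom[:window]), ("right", chrom[-window:])):
        copies = max(longest_tandem_run(end_seq, unit),
                     longest_tandem_run(end_seq, rc_unit))
        result[name] = copies >= min_copies
    return result


# Toy sequence: three repeat copies at the left end, none at the right end.
toy = TELOMERE_UNIT * 3 + "ACGT" * 50
print(telomere_at_ends(toy, window=50))
```

Applied to each of the eight chromosome sequences, a check of this kind would give a per-end presence call that can be compared with the manual inspection.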
GeneMark v4.59 was used with options “--ES” and “--fungus” to generate ab initio gene predictions (17). Funannotate v1.7.4 (18) predicted protein-encoding genes using options “--genemark_gtf” with GeneMark predictions and “--protein_evidence” with predicted proteins from A. flavus NRRL3357. The gene model for aflQ (ordA) required manual correction based on alignment with A. flavus NRRL3357. Annotation completeness was estimated using BUSCO (19). The La3279 assembly was aligned to the previously published La3279 assembly (GenBank accession GCA_008694415.1) using GSAlign v1.0.22 (20).
Fig 1.
Genome structure and comparison of A. flavus isolate La3279. (A) Blue bars under the curves represent the observed counts of k-mers of length 21 in Illumina sequencing reads. The orange line represents the modeled distribution of low-coverage k-mers arising from sequencing errors. The black line represents the modeled distribution of k-mers across the La3279 genome. (B) D-GENIES whole-genome dot plot portraying synteny between A. flavus La3279 and A. flavus NRRL3357. Dark green alignments correspond to identity between 0.75 and 1, and the light green alignment near the end of chromosome 7 corresponds to identity between 0.5 and 0.75. (C) Idiogram showing the eight chromosomes of La3279 with a heatmap of gene density per 25 kb window.
TABLE 1.
Summary statistics of the Aspergillus flavus isolate La3279 genome assembly and annotations
| Statistic | A. flavus isolate La3279 |
|---|---|
| Size (Mb) | 37.684 |
| 21-mer coverage | 44.5× |
| Average depth of coverage, ONT long reads | ~171× |
| Average depth of coverage, Illumina short reads | ~107× |
| No. of chromosomes | 8 |
| N50 (Mb) | 4.838 |
| L50 | 4 |
| Largest scaffold (Mb) | 6.56 |
| GC content (% ±SD) | 47.44 ± 0.2 |
| Protein-encoding genes | 12,727 |
| BUSCO ascomycota | 98.80% |
| BUSCO eurotiomycetes | 98.80% |
| BUSCO eurotiales | 98.40% |
| Chromosome 1 length (bp) | 6,559,568 |
| Chromosome 2 length (bp) | 6,263,821 |
| Chromosome 3 length (bp) | 5,174,379 |
| Chromosome 4 length (bp) | 4,837,845 |
| Chromosome 5 length (bp) | 4,537,696 |
| Chromosome 6 length (bp) | 4,008,147 |
| Chromosome 7 length (bp) | 3,106,468 |
| Chromosome 8 length (bp) | 3,196,205 |
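The assembly size, N50, and L50 in Table 1 follow directly from the eight chromosome lengths. The short Python sketch below recomputes them from the tabulated values as a consistency check; the function name `n50_l50` is introduced here for illustration and is not part of any tool used for this assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of sequence lengths.

    N50 is the length of the sequence at which the running total of
    lengths, taken from longest to shortest, first reaches half of the
    assembly size; L50 is the number of sequences needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must not be empty")


# Chromosome lengths (bp) for chromosomes 1 to 8, copied from Table 1.
chromosome_lengths = [
    6_559_568, 6_263_821, 5_174_379, 4_837_845,
    4_537_696, 4_008_147, 3_106_468, 3_196_205,
]

total_bp = sum(chromosome_lengths)
n50, l50 = n50_l50(chromosome_lengths)
print(f"size = {total_bp / 1e6:.3f} Mb, N50 = {n50 / 1e6:.3f} Mb, L50 = {l50}")
```

The recomputed values (37.684 Mb, 4.838 Mb, and 4) agree with the size, N50, and L50 rows of the table.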
Telomeric repeats were found at 15 of the 16 chromosome ends. Chromosomes of La3279 and A. flavus NRRL3357 showed 1:1 homology (Fig. 1B). Gene annotation predicted 12,727 protein-encoding genes (21) (Fig. 1C). BUSCO completeness was 98.4% or higher for all three lineage data sets (Table 1). At the end of chromosome 3, the aflatoxin biosynthesis gene cluster contained the 25 genes initially reported for La3279 (2). Our La3279 assembly shared 99.31% average nucleotide identity with the previously published assembly, suggesting that the previously published assembly belongs to a closely related A. flavus isolate and was uploaded in error.
ACKNOWLEDGMENTS
Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture. The USDA is an equal opportunity employer and provider.
The authors thank the Arizona Genetics Core (AZGC, Facility RRID: SCR_012429) for assistance with Illumina sequencing.
This work was funded by the United States Department of Agriculture project number 2020-42000-023-00D. This research used resources provided by the SCINet project and the AI Center of Excellence of the USDA Agricultural Research Service, ARS project number 0500000093-001-00-D.
K.A.C., H.L.M., B.A., and A.W.L. conceptualized the experiment. M.W. isolated genomic DNA. A.W.L. performed nanopore sequencing, assembled and annotated the genome, and wrote the initial manuscript. All authors helped edit the manuscript.
Contributor Information
Kenneth A. Callicott, Email: ken.callicott@usda.gov.
Jason E. Stajich, University of California Riverside, California, United States
DATA AVAILABILITY
The genome and sequencing reads were deposited at NCBI under BioProject accession PRJNA992514. The BioSample accession is SAMN36357330. The SRA record for Illumina reads is SRX20939733. The SRA record for nanopore reads is SRX20939734. Genome and gene annotation files are available on figshare (21) (https://doi.org/10.6084/m9.figshare.23632116). This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession ASM3051527v1. The submitted GenBank assembly accession is GCA_030515275.1.
REFERENCES
- 1. Atehnkeng J, Ojiambo PS, Ikotun T, Sikora RA, Cotty PJ, Bandyopadhyay R. 2008. Evaluation of atoxigenic isolates of Aspergillus flavus as potential biocontrol agents for aflatoxin in maize. 25:1264–1271. doi: 10.1080/02652030802112635
- 2. Adhikari BN, Bandyopadhyay R, Cotty PJ. 2016. Degeneration of aflatoxin gene clusters in Aspergillus flavus from Africa and North America. AMB Express 6:62. doi: 10.1186/s13568-016-0228-6
- 3. Chang PK. 2022. Aspergillus flavus La3279, a component strain of the Aflasafe biocontrol product, contains a partial aflatoxin biosynthesis gene cluster followed by a genomic region highly variable among A. flavus isolates. Int J Food Microbiol 366:109559. doi: 10.1016/j.ijfoodmicro.2022.109559
- 4. Legan AW, Mack BM, Mehl HL, Wissotski M, Ching’anda C, Maxwell LA, Callicott KA. 2023. Complete genome of the toxic mold Aspergillus pseudotamarii isolate NRRL 25517 reveals genomic instability of the aflatoxin biosynthesis cluster. G3 (Bethesda) 13:jkad150. doi: 10.1093/g3journal/jkad150
- 5. Li H. 2018. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34:3094–3100. doi: 10.1093/bioinformatics/bty191
- 6. Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. 2021. Twelve years of SAMtools and BCFtools. Gigascience 10:giab008. doi: 10.1093/gigascience/giab008
- 7. Andrews S. 2010. A quality control tool for high throughput sequence data. Available from: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
- 8. Marçais G, Kingsford C. 2011. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27:764–770. doi: 10.1093/bioinformatics/btr011
- 9. Vurture GW, Sedlazeck FJ, Nattestad M, Underwood CJ, Fang H, Gurtowski J, Schatz MC. 2017. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33:2202–2204. doi: 10.1093/bioinformatics/btx153
- 10. Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. 2017. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res 27:722–736. doi: 10.1101/gr.215087.116
- 11. Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman J, Young SK, Earl AM. 2014. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE 9:e112963. doi: 10.1371/journal.pone.0112963
- 12. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215:403–410. doi: 10.1016/S0022-2836(05)80360-2
- 13. R Core Team. 2018. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Available from: https://www.R-project.org/
- 14. Hao Z, Lv D, Ge Y, Shi J, Weijers D, Yu G, Chen J. 2020. RIdeogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms. PeerJ Comput Sci 6:e251. doi: 10.7717/peerj-cs.251
- 15. Cabanettes F, Klopp C. 2018. D-GENIES: dot plot large genomes in an interactive, efficient and simple way. PeerJ. doi: 10.7717/peerj.4958
- 16. Kusumoto K-I, Suzuki S, Kashiwagi Y. 2003. Telomeric repeat sequence of Aspergillus oryzae consists of dodeca-nucleotides. Appl Microbiol Biotechnol 61:247–251. doi: 10.1007/s00253-002-1193-3
- 17. Ter-Hovhannisyan V, Lomsadze A, Chernoff YO, Borodovsky M. 2008. Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Res 18:1979–1990. doi: 10.1101/gr.081612.108
- 18. Palmer JM, Stajich J. 2020. Funannotate V1.8.1: eukaryotic genome annotation (V1.8.1). Zenodo. doi: 10.5281/zenodo.4054262
- 19. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. 2015. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31:3210–3212. doi: 10.1093/bioinformatics/btv351
- 20. Lin H-N, Hsu W-L. 2020. GSAlign: an efficient sequence alignment tool for intra-species genomes. BMC Genom 21:182. doi: 10.1186/s12864-020-6569-1
- 21. Legan AW, Mehl HL, Wissotski M, Adhikari BN, Callicott KA. 2023. Sequence and annotation files from “Telomere-to-Telomere genome assembly of the aflatoxin biocontrol agent Aspergillus flavus isolate La3279 isolated from maize in Nigeria.” figshare. Dataset. doi: 10.6084/m9.figshare.23632116