ABSTRACT
Toxic molds in the genus Aspergillus synthesize carcinogenic aflatoxins, which contaminate crops. The widely applied biocontrol isolate Aspergillus flavus AF36 (NRRL 18543) has a high-quality public genome assembly but lacks corresponding gene annotations. We generated high-quality gene predictions for this isolate using long-read Nanopore PCR-cDNA sequencing.
KEYWORDS: Aspergillus, transcription, aflatoxin, annotation, Nanopore
ANNOUNCEMENT
Aspergillus flavus AF36 (NRRL 18543), isolated from cottonseed collected in Yuma, Arizona, USA, is the active ingredient in the first non-aflatoxigenic A. flavus-based biocontrol product to be developed for the mitigation of aflatoxins (1–3). It is now sold commercially as AF36 Prevail (Arizona Cotton Research and Protection Council, Phoenix, AZ, USA). A high-quality AF36 genome assembly was previously reported without gene annotations (4). We annotated the AF36 genome incorporating long-read evidence from Nanopore PCR-cDNA sequencing.
Because gene expression might vary between chemical environments and throughout development, we sampled AF36 from high- and low-aflatoxin environments at two time points, resulting in four tissue samples. Fungi were grown in 30 mL modified Czapek’s broth (5) with sucrose replaced by 66 mM glucose, supplemented with 0 µg/mL or 4 µg/mL aflatoxin B1 (Sigma-Aldrich, St. Louis, MO, USA). After inoculation with 10⁵ AF36 spores, flasks were incubated in the dark at 31°C and 150 rpm. After 3 and 7 days of incubation, samples of fungal tissue were collected using vacuum filtration and immediately ground in liquid nitrogen using a sterile mortar and pestle. Total RNA was extracted using the RNeasy Plant Mini Kit (QIAGEN, Germantown, MD, USA), and cDNA was amplified from it and sequenced as one library using a PCR-cDNA Sequencing Kit (SQK-PCS109, Oxford Nanopore Technologies [ONT], Oxford, UK). Sixteen cycles of PCR were run using a 250-s extension to sequence up to 5 kb. Sequencing was performed using a MinION sequencer with a FLO-MIN106D Spot-ON Flow Cell (ONT). MinKNOW and Guppy (MinION Release 19.12.2) were used with the FAST basecalling model and default settings for demultiplexing reads, quality filtering, and adapter sequence trimming. This resulted in 2,450,075 reads with N50 = 1 kb, constituting ~2.1 Gb.
Reads were mapped to the AF36 genome (GenBank GCA_012897275.1) (4) using Minimap2 v2.24-r1122 with “-ax splice” (6), resulting in 2,357,784 mapped reads (96.2% of reads). The alignment was converted from SAM to BAM format, sorted by coordinate using samtools v1.9 (7), and input to StringTie2 v2.1.1 with option “-L” for long-read assembly (8). GeneMark v4.59 with options “--ES --fungus” generated initial ab initio gene predictions (9). The genome was cleaned, sorted, soft-masked, and annotated using Funannotate v1.7.4 (10) with the following evidence options: “--protein_evidence” (predicted proteins from A. flavus NRRL 3357), “--genemark_gtf” (the GeneMark GTF output), “--rna_bam” (the sorted BAM alignment of PCR-cDNA reads), and “--transcript_evidence” (the predicted transcripts from StringTie2). For comparison, Funannotate was also run without evidence from PCR-cDNA sequencing.
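The mapping, assembly, and annotation steps described above can be sketched as a set of command lines. The Python below only builds the argument lists and does not run anything. It is a minimal sketch under stated assumptions, not the authors' actual script: all file names, the output directory, the function name `build_commands`, and the species string passed to `-s` are hypothetical, and the text does not say which version of the genome file each step received.

```python
import shlex
from typing import Dict, List


def build_commands(
    genome: str,
    masked_genome: str,
    reads: str,
    proteins: str,
    transcripts: str,
    genemark_gtf: str,
    out_dir: str = "af36_annotation",  # hypothetical output directory
) -> Dict[str, List[str]]:
    """Return one argument list per pipeline step; nothing is executed."""
    # Intermediate file names are placeholders chosen for illustration.
    sam = "af36_cdna.sam"
    bam = "af36_cdna.bam"
    sorted_bam = "af36_cdna.sorted.bam"
    stringtie_gtf = "af36_stringtie.gtf"
    return {
        # Spliced long-read alignment with SAM output.
        "map": ["minimap2", "-ax", "splice", "-o", sam, genome, reads],
        # SAM to BAM conversion, then coordinate sorting.
        "to_bam": ["samtools", "view", "-b", "-o", bam, sam],
        "sort": ["samtools", "sort", "-o", sorted_bam, bam],
        # Long-read transcript assembly.
        "assemble": ["stringtie", "-L", "-o", stringtie_gtf, sorted_bam],
        # Ab initio, self-trained gene prediction for a fungal genome.
        "genemark": ["gmes_petap.pl", "--ES", "--fungus", "--sequence", genome],
        # Evidence-informed annotation on the soft-masked genome.
        "predict": [
            "funannotate", "predict",
            "-i", masked_genome,
            "-o", out_dir,
            "-s", "Aspergillus flavus",  # assumed species label
            "--protein_evidence", proteins,
            "--genemark_gtf", genemark_gtf,
            "--rna_bam", sorted_bam,
            "--transcript_evidence", transcripts,
        ],
    }


if __name__ == "__main__":
    commands = build_commands(
        genome="AF36.fasta",
        masked_genome="AF36.masked.fasta",
        reads="af36_pcr_cdna.fastq",
        proteins="NRRL3357_proteins.fasta",
        transcripts="af36_stringtie_transcripts.fasta",
        genemark_gtf="genemark.gtf",
    )
    for step, argv in commands.items():
        print(f"{step}: {shlex.join(argv)}")
```

Keeping each step as an argument list makes it easy to print the commands for review before passing them to `subprocess.run`. Under the same assumptions, the comparison run without PCR-cDNA evidence would drop the `--rna_bam` and `--transcript_evidence` arguments.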
This pipeline predicted 15,382 transcripts (StringTie2) and 12,894 protein-encoding genes (Funannotate); the ~20% excess of transcripts over genes suggests alternative splicing (Table 1) (11). BUSCO completeness was approximately 99% for all three lineage datasets tested, indicating highly complete gene predictions (Table 1) (12). Without evidence from PCR-cDNA sequencing data, Funannotate generated 12,608 protein-encoding gene predictions, with a corresponding decrease in ortholog recovery (Table 1).
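The summary percentages quoted in the text can be rechecked with simple arithmetic. The short Python sketch below recomputes the read mapping rate and the excess of predicted transcripts over predicted genes from the counts reported here. The variable and function names are illustrative only, and reading the transcript excess as a measure of alternative splicing follows the authors' interpretation rather than anything the code establishes.

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Counts taken from the text.
total_reads = 2_450_075
mapped_reads = 2_357_784
transcripts = 15_382          # StringTie2
genes_informed = 12_894       # Funannotate with PCR-cDNA evidence
genes_uninformed = 12_608     # Funannotate without PCR-cDNA evidence

mapping_rate = percent(mapped_reads, total_reads)
transcript_excess = percent(transcripts - genes_informed, genes_informed)
extra_genes = genes_informed - genes_uninformed

print(f"Mapped reads: {mapping_rate:.1f}%")                       # 96.2%
print(f"Transcripts in excess of genes: {transcript_excess:.1f}%")  # 19.3%
print(f"Additional genes with PCR-cDNA evidence: {extra_genes}")    # 286
```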
TABLE 1.
Summary of the Aspergillus flavus AF36 gene annotation
| Metric | Informed by PCR-cDNA seq | Uninformed by PCR-cDNA seq |
|---|---|---|
| Predicted transcripts | 15,382 | NA |
| Protein-encoding genes | 12,894 | 12,608 |
| BUSCO db = ascomycota | 1,696/1,706 (99.4%) | 1,689/1,706 (99%) |
| BUSCO db = eurotiomycetes | 3,513/3,546 (99%) | 3,491/3,546 (98.4%) |
| BUSCO db = eurotiales | 4,148/4,191 (99%) | 4,108/4,191 (98.1%) |
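To make the difference in ortholog recovery between the two annotation runs explicit, the BUSCO counts in Table 1 can be compared directly. The Python sketch below is only an illustration: the dictionary layout and the function name `busco_summary` are hypothetical, and percentages recomputed from the counts may differ in the last digit from the rounded values shown in the table.

```python
from typing import Dict, Tuple

# (orthologs found with PCR-cDNA evidence, found without, total in lineage set),
# copied from Table 1.
BUSCO_COUNTS: Dict[str, Tuple[int, int, int]] = {
    "ascomycota": (1_696, 1_689, 1_706),
    "eurotiomycetes": (3_513, 3_491, 3_546),
    "eurotiales": (4_148, 4_108, 4_191),
}


def busco_summary(counts: Dict[str, Tuple[int, int, int]]) -> Dict[str, Dict[str, float]]:
    """Compute completeness for both runs and the number of orthologs gained."""
    summary = {}
    for lineage, (informed, uninformed, total) in counts.items():
        summary[lineage] = {
            "informed_pct": 100.0 * informed / total,
            "uninformed_pct": 100.0 * uninformed / total,
            "gained": informed - uninformed,
        }
    return summary


if __name__ == "__main__":
    for lineage, row in busco_summary(BUSCO_COUNTS).items():
        print(
            f"{lineage}: {row['informed_pct']:.2f}% vs "
            f"{row['uninformed_pct']:.2f}% (+{row['gained']} orthologs)"
        )
```

In every lineage set, the run informed by PCR-cDNA evidence recovers more orthologs (7, 22, and 40 more, respectively).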
ACKNOWLEDGMENTS
This work was funded by the United States Department of Agriculture project number 2020-42000-023-000D. This research used resources provided by the SCINet project and the AI Center of Excellence of the USDA Agricultural Research Service, ARS project number 0500-00093-001-00-D. Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture. The USDA is an equal opportunity employer and provider.
K.A.C., H.L.M., and A.W.L. conceptualized the experiment. A.V. cultured the fungus and isolated the RNA. A.W.L. performed library preparation, sequencing, and analyses. A.W.L. wrote the initial manuscript, and all the authors helped to edit.
Contributor Information
Kenneth A. Callicott, Email: ken.callicott@usda.gov.
Jennifer Geddes-McAlister, University of Guelph, Guelph, Ontario, Canada.
DATA AVAILABILITY
All PCR-cDNA Nanopore reads are available under BioProject ID PRJNA984741 (BioSamples SAMN35777990-SAMN35777993). Gene annotation files are available on figshare (11).
REFERENCES
1. Cotty PJ. 1989. Virulence and cultural characteristics of two Aspergillus flavus strains pathogenic on cotton. Phytopathology 79:808. doi: 10.1094/Phyto-79-808
2. Brown RL, Cotty PJ, Cleveland TE. 1991. Reduction in aflatoxin content of maize by atoxigenic strains of Aspergillus flavus. J Food Prot 54:623–626. doi: 10.4315/0362-028X-54.8.623
3. Cotty PJ, Bayman P. 1993. Competitive exclusion of a toxigenic strain of Aspergillus flavus by an atoxigenic strain. Phytopathology 83:1283. doi: 10.1094/Phyto-83-1283
4. Fountain JC, Clevenger JP, Nadon B, Wang H, Abbas HK, Kemerait RC, Scully BT, Vaughn JN, Guo B. 2020. Draft genome sequences of one Aspergillus parasiticus isolate and nine Aspergillus flavus isolates with varying stress tolerance and aflatoxin production. Microbiol Resour Announc 9:e00478-20. doi: 10.1128/MRA.00478-20
5. Dox AW. 1910. The intracellular enzyms of Penicillium and Aspergillus volume 111-121; with special reference to those of Penicillium camemberti. United States Department of Agriculture, Bureau of Animal Industry, Washington, D.C., U.S.A.
6. Li H. 2018. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34:3094–3100. doi: 10.1093/bioinformatics/bty191
7. Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. 2021. Twelve years of SAMtools and BCFtools. Gigascience 10:giab008. doi: 10.1093/gigascience/giab008
8. Kovaka S, Zimin AV, Pertea GM, Razaghi R, Salzberg SL, Pertea M. 2019. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol 20:278. doi: 10.1186/s13059-019-1910-1
9. Ter-Hovhannisyan V, Lomsadze A, Chernoff YO, Borodovsky M. 2008. Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Res 18:1979–1990. doi: 10.1101/gr.081612.108
10. Palmer JM, Stajich J. 2020. Funannotate V1.8.1: eukaryotic genome annotation (V1.8.1). Zenodo. doi: 10.5281/zenodo.4054262
11. Legan AW, Mehl HL, Varaksa A, Callicott KA. 2023. Annotation files from "Nanopore PCR-cDNA sequencing of the biocontrol isolate Aspergillus flavus AF36." Figshare. Dataset. doi: 10.6084/m9.figshare.23535519.v1
12. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. 2015. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31:3210–3212. doi: 10.1093/bioinformatics/btv351
