Abstract
Fungi are the causal agents of many of the world's most serious plant diseases, with disastrous consequences for large-scale agricultural production. As multicellular eukaryotic pathogens, fungi have a complex genomic basis of pathogenicity. The fungus Cercospora sojina is a plant pathogen that threatens global soybean supplies. Here, we report the genome sequence of C. sojina strain S9 and describe its genome features and predicted genomic elements. The genome sequence is a valuable resource for studying fungal pathogenicity and soybean host resistance to frogeye leaf spot (FLS), which is caused by C. sojina. The C. sojina genome sequence has been deposited and is available at DDBJ/EMBL/GenBank under the project accession number AHPQ00000000.
Specifications
| Organism/cell line/tissue | Cercospora sojina |
|---|---|
| Sex | – |
| Sequencer or array type | Illumina GA IIx |
| Data format | Processed |
| Experimental factors | DNA extracted from a field strain, no treatment |
| Experimental features | Whole genome sequence |
| Consent | n/a |
| Sample source location | Soybean field from Georgia, USA |
1. Direct link to deposited data
2. Materials and methods
2.1. C. sojina whole-genome sequencing and assembly
The genome of C. sojina strain S9 [1], isolated from a soybean field in Georgia, was sequenced on the Illumina GA IIx next-generation platform using paired-end sequencing to a depth of 239× at the Keck Center at the University of Illinois Urbana-Champaign. The reads were 124 base pairs (bp) long. A total of 29,619,123 reads were produced from each end, giving 59,238,246 reads from one lane. The C. sojina genome was assembled with the Velvet algorithm [2] to obtain an optimized, high-quality assembly. The assembly contained 30,797,991 bases in 1804 scaffolds (N50 = 37,690 bp), and the G + C content was 53.80%.
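The reported depth can be checked against the read count, read length, and assembly size given above. The following is a minimal sketch, assuming depth is estimated as total sequenced bases divided by assembly size and that every read contributes its full 124 bp; the function name `mean_depth` is hypothetical and not part of the original analysis.

```python
def mean_depth(total_reads: int, read_length_bp: int, assembly_size_bp: int) -> float:
    """Estimate average depth as total sequenced bases / assembly size.

    Assumption: all reads are used at full length (no trimming or filtering).
    """
    return total_reads * read_length_bp / assembly_size_bp


# Values quoted in the text.
reads_per_end = 29_619_123
total_reads = 2 * reads_per_end      # paired-end: both ends of each fragment
read_length_bp = 124
assembly_size_bp = 30_797_991

depth = mean_depth(total_reads, read_length_bp, assembly_size_bp)
print(f"total reads: {total_reads:,}")
print(f"estimated depth: {depth:.1f}x")
```

Under these assumptions the estimate comes out close to the 239× stated in the text; a depth computed from mapped reads only could differ slightly.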
2.2. C. sojina genome annotation
C. sojina genes were predicted with ab initio gene finders (FGENESH, FGENESH+, and GENEWISE). Gene models from Zymoseptoria tritici (Mycosphaerella graminicola), the species most closely related to C. sojina, were used to train the gene-finding programs. Predicted complete coding regions of the gene models were validated and curated with BlastX against the publicly available non-redundant protein database and BlastN against EST databases. The entire DNA sequence was also compared against the non-redundant protein databases in all six reading frames, using BlastX with a threshold of E < 1e−5, to identify any possible coding sequences previously missed; ARTEMIS was used to collate data and facilitate annotation [3]. Finally, a non-redundant set of gene models was produced by selecting a single best gene model per locus, preferring the candidate annotation that had supporting evidence from homologous protein/EST sequences in public databases and a complete coding sequence. rRNA genes were identified with RNAmmer [4], and tRNA and ncRNA genes were identified with Rfam and tRNAscan-SE [5], [6] in the JGI annotation system. The complete genome sequence contained 9099 protein-coding genes, 72 rRNAs, 63 tRNAs, and 9 ncRNAs. The average gene size was 2742 bp and the average CDS size was 1764 bp. The C. sojina sequencing, assembly, and annotation statistics are summarized in Table 1.
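The selection of a single best gene model per locus can be illustrated with a short sketch. It assumes that each candidate model carries its best Blast E-value (if any) and a flag for a complete coding sequence, and that homology support outranks completeness; the `GeneModel` fields, the CDS-length tie-break, and the example records are hypothetical and are not taken from the original pipeline.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

E_VALUE_CUTOFF = 1e-5  # threshold quoted in the text (E < 1e-5)


@dataclass(frozen=True)
class GeneModel:
    locus: str                    # hypothetical locus identifier
    predictor: str                # program that produced the model
    best_evalue: Optional[float]  # best protein/EST hit; None if no hit
    complete_cds: bool            # has both start and stop codon
    cds_length: int               # CDS length in bp


def has_homology_support(model: GeneModel) -> bool:
    """True if the model has a hit below the E-value cutoff."""
    return model.best_evalue is not None and model.best_evalue < E_VALUE_CUTOFF


def rank(model: GeneModel) -> Tuple[bool, bool, int]:
    """Sort key: homology support, then complete CDS, then CDS length.

    The CDS-length tie-break is an assumption made for this sketch.
    """
    return (has_homology_support(model), model.complete_cds, model.cds_length)


def select_best_models(models: Iterable[GeneModel]) -> Dict[str, GeneModel]:
    """Keep the highest-ranked candidate for each locus."""
    best: Dict[str, GeneModel] = {}
    for model in models:
        current = best.get(model.locus)
        if current is None or rank(model) > rank(current):
            best[model.locus] = model
    return best


# Invented example candidates, for illustration only.
candidates = [
    GeneModel("locus_1", "FGENESH", 1e-30, True, 1500),
    GeneModel("locus_1", "GENEWISE", None, True, 1800),
    GeneModel("locus_2", "FGENESH", 1e-3, False, 900),
    GeneModel("locus_2", "FGENESH+", None, True, 1200),
]
chosen = select_best_models(candidates)
for locus, model in sorted(chosen.items()):
    print(locus, model.predictor)
```

In this toy input, the supported model wins at `locus_1` even though it is shorter, and at `locus_2`, where neither candidate passes the cutoff, the complete CDS is preferred.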
Table 1.
C. sojina genome sequencing assembly and annotation statistics.
| Assembly and annotation statistic | Value |
|---|---|
| Total no. paired-end reads | 59,238,246 |
| Average depth of coverage of mapped reads (×) | 239 |
| Average read length (bp) | 124 |
| Assembly size (bp) | 30,797,991 |
| Total contigs (> 500 bp) | 1804 |
| N50 (bp) | 37,690 |
| Protein-coding genes | 9099 |
| Protein coding genes > 100 amino acids | 8868 |
| Average CDS size (bp) | 1764 |
| Average gene length (bp) | 2742 |
| % coding | 64 |
| Genes with known function | 7542 |
| Genes with unknown function | 1557 |
| rRNA | 72 |
| ncRNA | 9 |
| tRNA | 63 |
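For readers unfamiliar with the N50 statistic in Table 1, the sketch below shows the standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The function name `n50` and the toy lengths are illustrative assumptions; the actual scaffold lengths are not reproduced here.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy lengths (bp), invented for illustration only.
toy_lengths = [80, 70, 50, 40, 30, 20]
print(n50(toy_lengths))  # 80 + 70 = 150 covers half of the 290 bp total
```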
2.3. Genomics functional annotation
All predicted genes were annotated with Gene Ontology (GO) terms using the Blast2GO functional annotation system [7], [8], based on sequence comparison with BlastP and domain/motif identification with InterProScan [7] and Pfam [9]. The C. sojina genome predominantly encoded genomic elements potentially involved in mycelium-related development; it is through the mycelium that a fungus absorbs nutrients from its environment. The major set of genes involved in the C. sojina life cycle suggests the fundamental systems needed to sustain life. Additionally, the C. sojina genome contains extensive genetic factors involved in the auxin biosynthetic process, Ser/Thr protein kinase signal transduction, and transcription factor regulation. This implies that considerable auxin regulation and signal transduction are involved in physiological processes and pathogenesis. ABC (ATP-binding cassette) transporter genes were highly redundant and formed the largest molecular function category in the C. sojina genome. These results accord with the report that ABC transporters act as essential virulence factors by mediating secretion of host-specific toxin compounds during pathogenesis [10].
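Identifying the largest molecular function category amounts to counting how many genes carry each GO term. The sketch below assumes the annotations have already been exported as a mapping from gene identifier to molecular-function term names; the function `rank_terms`, the gene identifiers, and the toy annotations are all hypothetical and do not reproduce Blast2GO output or results from the C. sojina genome.

```python
from collections import Counter
from typing import Dict, List, Tuple


def rank_terms(annotations: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Count genes per GO term and return terms sorted by gene count."""
    counts: Counter = Counter()
    for terms in annotations.values():
        # A gene counts once per term, even if the term is listed twice.
        counts.update(set(terms))
    return counts.most_common()


# Invented toy annotations, for illustration only.
toy_annotations = {
    "gene_a": ["ABC-type transporter activity"],
    "gene_b": ["ABC-type transporter activity",
               "protein serine/threonine kinase activity"],
    "gene_c": ["ABC-type transporter activity"],
    "gene_d": ["protein serine/threonine kinase activity"],
    "gene_e": ["DNA-binding transcription factor activity"],
}

ranked = rank_terms(toy_annotations)
for term, gene_count in ranked:
    print(f"{gene_count}\t{term}")
```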
Acknowledgments
This project was supported by a grant from the USDA-CSREES as part of the Soybean Disease Biotechnology Center at the University of Illinois. This work was also supported by NSFC 31401428, the National Key R&D Program for Crop Breeding 2016YFD0100306, the Fok Ying-Tong Foundation 151024, and the Taishan Scholar Talent Project from the PRC.
References
- 1. Mian M.A.R., Missaoui A.M., Walker D.R., Phillips D.V., Boerma H.R. Frogeye leaf spot of soybean: a review and proposed race designations for isolates of Cercospora sojina Hara. Crop Sci. 2008;48:14–24.
- 2. Zerbino D.R., Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008;18:821–829. doi: 10.1101/gr.074492.107.
- 3. Rutherford K., Parkhill J., Crook J., Horsnell T., Rice P., Rajandream M.A., Barrell B. Artemis: sequence visualization and annotation. Bioinformatics. 2000;16:944–945. doi: 10.1093/bioinformatics/16.10.944.
- 4. Lagesen K., Hallin P., Rodland E.A., Staerfeldt H.H., Rognes T., Ussery D.W. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 2007;35:3100–3108. doi: 10.1093/nar/gkm160.
- 5. Griffiths-Jones S., Moxon S., Marshall M., Khanna A., Eddy S.R., Bateman A. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 2005;33:D121–D124. doi: 10.1093/nar/gki081.
- 6. Lowe T.M., Eddy S.R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25:955–964. doi: 10.1093/nar/25.5.955.
- 7. Götz S., García-Gómez J.M., Terol J., Williams T.D., Nagaraj S.H., Nueda M.J., Robles M., Talón M., Dopazo J., Conesa A. High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res. 2008;36:3420–3435. doi: 10.1093/nar/gkn176.
- 8. Conesa A., Götz S., García-Gómez J.M., Terol J., Talón M., Robles M. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005;21:3674–3676. doi: 10.1093/bioinformatics/bti610.
- 9. Bateman A., Birney E., Cerruti L., Durbin R., Etwiller L., Eddy S.R., Griffiths-Jones S., Howe K.L., Marshall M., Sonnhammer E.L. The Pfam protein families database. Nucleic Acids Res. 2002;30:276–280. doi: 10.1093/nar/30.1.276.
- 10. Zwiers L.H., Stergiopoulos I., Van Nistelrooy J.G., De Waard M.A. ABC transporters and azole susceptibility in laboratory strains of the wheat pathogen Mycosphaerella graminicola. Antimicrob. Agents Chemother. 2002;46:3900–3906. doi: 10.1128/AAC.46.12.3900-3906.2002.
