Highlights
• We sequenced the downy mildew pathogen, which is one of the most important production constraints for pearl millet.
• In a first attempt, the whole genome of Sclerospora graminicola pathotype 1 from India was sequenced and annotated.
• The overall genome coverage achieved was 40×.
• The estimated genome size of S. graminicola was 299.9 Mb.
• Out of 65,404 genes that were predicted, a total of 38,120 genes were annotated.
Keywords: Sclerospora graminicola, Pathotype 1, Pearl millet, Downy mildew, Whole genome sequence
Abstract
Sclerospora graminicola is one of the most important biotic production constraints of pearl millet in India, Africa and other parts of the world. We report a de novo whole genome assembly and analysis of pathotype 1, one of the most virulent pathotypes of S. graminicola from India. The draft genome assembly contained 299,901,251 bp with 65,404 genes. This study may help in understanding the evolutionary pattern of the pathogen and in elucidating effector evolution, so that effective and durable resistance breeding strategies can be devised for pearl millet.
Pearl millet [Pennisetum glaucum (L.) R. Br.] is an important crop of the semi-arid and arid regions of the world. It can grow in harsh and marginal environments and has the highest degree of tolerance to drought and heat stress among cereals [1].
Downy mildew, caused by Sclerospora graminicola (Sacc.) Schroet., is the most devastating disease of pearl millet, particularly on genetically uniform hybrids. The estimated annual grain yield loss due to downy mildew is approximately 10–80% [2], [3], [4], [5], [6], [7].
Pathotype 1 has been reported to be a highly virulent pathotype of Sclerospora graminicola in India [8]. We report a de novo whole genome assembly and analysis of S. graminicola pathotype 1 from India.
A susceptible pearl millet genotype, Tift 23D2B1P1-P5, was used to obtain single-zoospore isolates from the original oosporic sample. The library for whole genome sequencing was prepared according to the instructions for the NEB ultra DNA library kit for Illumina (New England Biolabs, USA). The libraries were normalized, pooled and sequenced on the Illumina HiSeq 2500 (Illumina Inc., San Diego, CA, USA) platform at 2 × 100 bp length. Mate pair (MP) libraries were prepared using the Nextera mate pair library preparation kit (Illumina Inc., USA). These libraries were normalized, pooled and sequenced on the Illumina MiSeq (Illumina Inc., USA) platform at 2 × 300 bp length. One SMRTbell library with a 20 kb insert size was prepared and sequenced on the PacBio RSII platform.
Whole genome sequencing generated 7.38 Gb from 73,889,924 paired-end reads (paired-end library, Illumina HiSeq 2500) and 1.15 Gb from 3,851,788 reads (mate pair library, Illumina MiSeq). Illumina reads were filtered to a quality score of at least 20, and duplicate reads were removed before assembly. A total of 597,293 filtered subreads with an average read length of 6.39 kb were generated on the PacBio RSII with P6-C4 chemistry. Approximately 51% of the data came from reads longer than 10 kb, with a maximum read length of 49,261 bp.
The sequences were assembled using several genome assemblers, including ABySS, MaSuRCA, Velvet, SOAPdenovo2, and ALLPATHS-LG. The hybrid assembly generated by MaSuRCA [9] was found to be superior to those from the other assemblers (Table 1). The assembled draft genome sequence of S. graminicola pathotype 1 was 299,901,251 bp in length and consisted of 26,786 scaffolds, with an N50 of 17,909 bp, a minimum scaffold size of 1 kb and a longest scaffold of 238,843 bp. The GC content was 47.2%. The overall coverage was 40×.
Gene prediction on the draft genome sequence with AUGUSTUS, using Saccharomyces cerevisiae as a model, resulted in 65,404 genes. The completeness of the assembly was assessed with CEGMA, which revealed 92.7% of proteins completely present and 95.6% partially present, while the BUSCO v3 fungal dataset indicated 64.9% complete, 12.4% fragmented and 22.7% missing out of 290 BUSCO groups. A total of 52,285 predicted genes showed homology by BLASTX against the nr database, and 38,120 genes had a significant BLASTX match at an E-value cutoff of 1e-5 and 40% identity. Of the 38,120 annotated genes, 11,873 had UniProt entries, 7248 had GO terms and 9686 had KEGG IDs. Of the 7248 GO terms, 2724 were associated with biological processes. Some important GO terms are listed in Table 2.
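The BLASTX thresholds described above (E-value of at most 1e-5 and at least 40% identity) can be illustrated with a minimal sketch. It assumes the hits are available in the standard 12-column BLAST tabular layout (`-outfmt 6`), which the paper does not state; the function name and the gene identifiers in the example are hypothetical, and this is not the authors' actual pipeline.

```python
def annotated_queries(lines, max_evalue=1e-5, min_identity=40.0):
    """Return the set of query IDs having at least one hit that passes
    both thresholds.

    Assumes default BLAST tabular columns: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    passing = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid = fields[0]
        pident = float(fields[2])   # percent identity
        evalue = float(fields[10])  # expectation value
        if evalue <= max_evalue and pident >= min_identity:
            passing.add(qseqid)
    return passing


# Hypothetical example rows (not real data from the study).
example = [
    "g1\tsubjA\t62.5\t200\t75\t0\t1\t600\t1\t200\t1e-50\t180",
    "g2\tsubjB\t35.0\t150\t97\t1\t1\t450\t1\t150\t1e-20\t90",
    "g3\tsubjC\t80.0\t40\t8\t0\t1\t120\t1\t40\t0.002\t30",
]
print(sorted(annotated_queries(example)))
```

In this toy input only `g1` passes: `g2` falls below the identity threshold and `g3` exceeds the E-value cutoff.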
During annotation, we observed many proteins with known roles in pathogenicity. These include Crinkler (CRN) family protein, Glucanase inhibitor, Serine protease inhibitor, Cysteine inhibitor, INF1 Elicitin-like protein, SWI4 1, Peter Pan-like protein suppressor, Sterol binding protein, PexRD2, Glyceraldehyde-3-phosphate dehydrogenase, Ribonuclease, HECT E3 ubiquitin ligase, Alpha-1,2-Mannosidase, Endo-1,3(4)-beta-glucanase putative, Palmitoyltransferase, Serine/threonine-protein phosphatase 2A activator, Protein kinase, putative, NAD-dependent histone deacetylase sir2-like protein, rpp 13-like proteins, rpm, Glycoside hydrolase, Pre-mRNA-splicing factor SF2, NADH dehydrogenase flavoprotein 1, Mitochondrial Aldehyde dehydrogenase, Deoxyhypusine hydroxylase, DEAD/DEAH box RNA helicase, putative, CAMK protein kinase, Ornithine aminotransferase, mitochondrial Phosphatidate cytidylyltransferase, Acetolactate synthase, Inositol hexakisphosphate and diphosphoinositol-pentakisphosphate kinase.
Table 1.
Comparative statistics of the promising genome assemblers.
Assembler | Minimum (bp) | Maximum (bp) | Mean (bp) | N50 (bp) | No. of contigs | Sum of contigs (bp) | CEGMA complete (%) | CEGMA partial (%) |
---|---|---|---|---|---|---|---|---|
Abyss_DBG2OLC Scaffold | 2730 | 235,195 | 27,432 | 22,557 | 5126 | 140,615,056 | 56.45 | 62.5 |
SOAP_DBG2OLC Scaffolds | 2079 | 194,864 | 28,748 | 26,386 | 5404 | 155,354,949 | 65.73 | 71.37 |
MaSuRCA_Scaffolds | 1000 | 238,843 | 11,196 | 17,909 | 26,786 | 299,901,251 | 89.52 | 93.95 |
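For readers unfamiliar with the N50 statistic reported in Table 1, the following minimal sketch shows the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly. The function name is illustrative, and the toy lengths are not taken from the assemblies in this study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must contain at least one positive value")


# Toy example: total is 15, and 5 + 4 = 9 first reaches half of it.
print(n50([5, 4, 3, 2, 1]))
```

Because N50 depends on the length distribution rather than on the total size, an assembly with many short scaffolds can have a larger total and a lower N50 at the same time, as the MaSuRCA row in Table 1 shows.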
Table 2.
Important biological processes identified using GO annotation.
Biological process | Number of GO terms |
---|---|
DNA integration [GO:0015074] | 699 |
NADH dehydrogenase (ubiquinone) activity [GO:0008137] | 95 |
Cytochrome-c oxidase activity [GO:0004129] | 81 |
ATP binding [GO:0005524] | 79 |
Heme binding [GO:0020037] | 68 |
Cytochrome-c oxidase activity [GO:0004129] | 68 |
Hydrogen ion transmembrane transporter activity [GO:0015078] | 61 |
Proton-transporting ATP synthase complex, coupling factor F(o) [GO:0045263] | 58 |
Microtubule motor activity [GO:0003777] | 35 |
Intracellular signal transduction [GO:0035556] | 34 |
Small-subunit processome [GO:0032040] | 29 |
Unfolded protein binding [GO:0051082] | 24 |
Magnesium ion binding [GO:0000287] | 20 |
Intracellular protein transport [GO:0006886] | 20 |
Repetitive element analysis with Repbase revealed 115 Ty1/Copia, 50 Gypsy, 419 small RNA, 23,618 simple repeats and 3365 low-complexity repeats. Microsatellite analysis with the MISA tool revealed 8179 mononucleotide repeats, 1082 low-complexity repeats and 5562 dinucleotide to hexanucleotide repeats. S. graminicola pathotype 1 genome characteristics and resources are listed in Table 3.
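The idea behind counting microsatellites by motif length can be sketched with a simple regular-expression scan. This is not MISA's implementation, and the paper does not report the repeat thresholds used; the minimum repeat counts below (10 for mononucleotides, 6 for dinucleotides, 5 for tri- to hexanucleotides) are assumptions chosen only for illustration, as are the function names.

```python
import re
from collections import Counter

# Assumed thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit
    (so 'AA' or 'ACAC' are rejected, 'AC' is accepted)."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def count_ssrs(seq, min_repeats=MIN_REPEATS):
    """Count perfect tandem repeats in seq, keyed by motif length."""
    seq = seq.upper()
    counts = Counter()
    for k, n in min_repeats.items():
        # A k-mer followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for match in pattern.finditer(seq):
            if is_primitive(match.group(1)):
                counts[k] += 1
    return counts


# Toy sequence: one A-run, one AC-type repeat, one ATG repeat.
toy = "A" * 12 + "C" + "AC" * 7 + "T" + "ATG" * 5
print(dict(count_ssrs(toy)))
```

The primitive-motif check prevents a long mononucleotide run from also being counted as a dinucleotide or trinucleotide repeat.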
Table 3.
Sclerospora graminicola pathotype 1 genome characteristics and resources.
Name | Genome characteristic/resource |
---|---|
NCBI bioproject ID | PRJNA325098 |
NCBI biosample ID | SAMN05219233 |
NCBI SRA accession No. | SRP076363 with accession numbers SRR3658180 and SRR3658181 |
Sequence type | Illumina HiSeq2500 and Illumina MiSeq, PacBio RSII |
Total number of reads | 73,889,924 from PE Library, 3,851,788 from MP Library |
Read length | 2 × 100 bp for PE and 2 × 300 bp for MP |
Overall coverage | 40× |
Estimated genome size | 299.9 Mb |
Predicted protein coding genes | 65,404 |
Annotated Genes | 38,120 |
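The paper does not state how the 40× coverage figure was derived. As a rough sanity check only, fold coverage can be approximated as total sequenced bases divided by genome size. The sketch below uses the read totals reported in the text and the assembly length as the genome size; it assumes the PacBio yield is the subread count multiplied by the mean subread length, which is an approximation, and the variable and function names are illustrative.

```python
def fold_coverage(total_bases, genome_size):
    """Approximate fold coverage as sequenced bases per genome base."""
    return total_bases / genome_size


ASSEMBLY_BP = 299_901_251       # draft assembly length (from the text)
ILLUMINA_PE_BP = 7.38e9         # paired-end yield (from the text)
ILLUMINA_MP_BP = 1.15e9         # mate pair yield (from the text)
PACBIO_BP = 597_293 * 6_390     # assumption: subreads x mean subread length

illumina_only = fold_coverage(ILLUMINA_PE_BP + ILLUMINA_MP_BP, ASSEMBLY_BP)
combined = fold_coverage(
    ILLUMINA_PE_BP + ILLUMINA_MP_BP + PACBIO_BP, ASSEMBLY_BP
)
print(f"Illumina only: {illumina_only:.1f}x, all platforms: {combined:.1f}x")
```

Under these assumptions the Illumina data alone give roughly 28× and all three data sets together give roughly 41×, which is of the same order as the reported 40×. This is not necessarily the calculation the authors used.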
The S. graminicola pathotype 1 sample has been deposited at the National fungal herbarium facility with accession number 52052 at the Herbarium Cryptogamae Indiae Orientalis (HCIO), Division of Plant Pathology, Indian Agricultural Research Institute (IARI), New Delhi, India.
Information on deposited data
The genome information of the downy mildew pathogen is available in the NCBI GenBank database. The Sclerospora graminicola whole genome shotgun (WGS) project has the project accession MIQA00000000. This version of the project (02) has the accession number MIQA02000000 and consists of sequences MIQA02000001-MIQA02026786, with BioProject ID PRJNA325098 and BioSample ID SAMN05219233. It can be accessed at https://www.ncbi.nlm.nih.gov/bioproject/PRJNA325098/.
Author contributions
RKS, HSS, CNS designed the experiment. RKS, CNS, SSH, CTS, RSY, VBRL, ATA, MKM, BK, NC, MKAVSK performed research. RKS, CNS, VBRL, SSH, RSY, CTS, PPS, SP, PK, OVS wrote the manuscript.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Acknowledgments
The authors gratefully acknowledge the support received from the All India Coordinated Research Project on Pearl Millet (AICRP-PM), Mandor; and the Indian Institute of Millets Research (IIMR), Hyderabad in collecting and maintaining downy mildew inoculum. The financial assistance received from ICAR and AgriGenome Labs is gratefully acknowledged. The authors also acknowledge the Herbarium Cryptogamae Indiae Orientalis (HCIO), Division of Plant Pathology, Indian Agricultural Research Institute (IARI), for undertaking conservation of the deposited fungal specimen.
References
- 1. Supriya A., Senapathy S., Hash C.T., Thirunavukkarasu N., Vengaldas Kankanti R., Sharma R., Thakur R.P., Veeranki P.R., Yadav R.C., Srivastava R.K. QTL mapping of pearl millet rust resistance using an integrated DArT- and SSR-based linkage map. Euphytica. 2016;209(2):461–476.
- 2. Singh S.D., King S.B., Werder J. International Crops Research Institute for the Semi-Arid Tropics; India: 1993. Downy Mildew Disease of Pearl Millet. Information Bulletin No. 37, Patancheru, Andhra Pradesh 502324; p. 36.
- 3. Singh S.D. Downy mildew of pearl millet. Plant Dis. 1995;79:545–550.
- 4. Hash C.T., Singh S.D., Thakur R.P., Talukdar B.S. Breeding for disease resistance. In: Khairwal I.S., Rai K.N., Andrews D.J., Harinarayana G., editors. Pearl Millet Breeding. Oxford & IBH; New Delhi, India: 1999. pp. 337–379.
- 5. Hess D.E., Thakur R.P., Hash C.T., Sérémé P., Magill C.W. Pearl millet downy mildew: problems and control strategies for a new millennium. In: Leslie J.F., editor. Sorghum and Millets Diseases. Iowa State Press; Ames, Iowa, USA: 2002. pp. 37–42.
- 6. Yadav O.P., Rai K.N. Genetic improvement of pearl millet in India. Agric. Res. 2013;2:275–292.
- 7. Anup C.P., Kini K.R. Analysis of dynamics of proteome in resistant cultivar of pearl millet seedlings during Sclerospora graminicola infection. J. App. Biol. Biotechnol. 2016;4:067–071.
- 8. Sudisha J., Ananda K.S., Shetty H.S. Characterization of downy mildew isolates of Sclerospora graminicola by using differential cultivars and molecular markers. J. Cell Mol. Biol. 2008;7:41–55.
- 9. Zimin A.V., Marçais G., Puiu D., Roberts M., Salzberg S.L., Yorke J.A. The MaSuRCA genome assembler. Bioinformatics. 2013;29:2669–2677. doi: 10.1093/bioinformatics/btt476.