3 Biotech. 2023 Jan 19;13(2):55. doi: 10.1007/s13205-023-03468-4

Genome sequencing and annotation of Cercospora sesami, a fungal pathogen causing leaf spot to Sesamum indicum

Shagun Sinha 1,2, Sudhir Navathe 3, Sakshi Singh 4, Deepak K Gupta 5, Ravindra Nath Kharwar 2, Ramesh Chand 1
PMCID: PMC9852405  PMID: 36685323

Abstract

Cercospora sesami is a plant pathogen that causes leaf spot disease of sesame worldwide. In this study, a genome assembly of C. sesami isolate Cers 52–10 (MCC 9069) was generated from paired-end and mate-pair DNA sequencing on the Illumina HiSeq 2500 platform. The C. sesami genome assembly is 34.3 Mb in size, with an N50 of 26,222 bp and an average GC content of 53.02%. A total of 10,872 genes were predicted, of which 9,712 were functionally annotated. Genes encoding carbohydrate-active enzymes were also identified. A total of 80 putative effector candidates were predicted and functionally annotated. The C. sesami genome sequence is available at DDBJ/ENA/GenBank, and other associated information has been deposited in Mendeley Data.

Supplementary Information

The online version contains supplementary material available at 10.1007/s13205-023-03468-4.

Keywords: Cercospora leaf spot, Effectors, Gene annotation, Genome assembly, Plant pathogens, Sesame

Introduction

Sesame (Sesamum indicum) is one of India’s oldest oilseed crops and is cultivated in the tropical and subtropical regions of Asia, Africa and South America (Zhang et al. 2013). Decorticated sesame seeds contain 45–63% oil, along with protein, vitamins, minerals and lignans. Sesame is also used in the bakery industry and in herbal medicine (Anilakumar et al. 2010). Global sesame consumption is rising as consumers change their food habits for health-related benefits. Like any other crop, sesame is prone to various diseases. Cercospora leaf spot, caused by Cercospora sesami, is the major fungal disease limiting sesame cultivation and results in significant yield losses. C. sesami belongs to Cercospora, one of the largest genera of hyphomycetes, with the teleomorph Mycosphaerella (Capnodiales, Mycosphaerellaceae) (Groenewald et al. 2013); the genus comprises about 3000 described species (Wijayawardene et al. 2017). C. sesami was first reported by Zimmermann in 1904 on Sesamum indicum, Ceratotheca triloba and Pretrea zanguebarica. The pathogen has been reported from Asia, Africa, the West Indies, and South and Central America (https://www.mycobank.org/). In India, it is widespread wherever sesame is cultivated (Kamal 2010). The pathogen affects the crop at all stages of development; infection causes defoliation and damage to the sesame capsules, leading to 20–55% yield loss (Enikuomehin et al. 2010).

The genome organization of C. sesami is still unknown, barcode sequence data for the species are scarce, and the genes responsible for its pathogenicity have not been identified. Because knowledge of genome structure is vital for understanding pathosystems, this study reports a detailed analysis of the de novo whole-genome assembly and gene annotation of C. sesami isolate Cers 52–10. We also identified and functionally annotated the effector repertoire of C. sesami; the predicted effectors can be used in future functional genomics studies.

Materials and methods

Collection, identification and monoconidial isolation

Cercospora sesami-infected leaf samples were collected during a survey from Birsa Agricultural University, Ranchi, Jharkhand, India. The leaves were photographed with a Nikon D5200 digital camera (Nikon, Japan) and examined for characteristic symptoms under a Nikon Eclipse E200 microscope. Micrographs of conidia and conidiophores were captured at 20× and 40× magnification using NIS-Elements v 4.0 imaging software. The characteristic leaf spot symptoms and the morphology of conidia and conidiophores were confirmed against the MycoBank database. Leaves for isolation were selected based on typical Cercospora leaf spot symptoms and disease severity, which was recorded on a 1–9 rating scale (Alice and Nadarajan 2007). A leaf with more than 75% infection (rating 9) was chosen for isolation.

The monoconidial isolation methods of Chupp (1953) and Choi et al. (1999) were used with modifications. Leaf spots were examined directly under the microscope to locate areas of maximum sporulation. Spore deposits on the ash-gray center of a leaf spot were lifted with a sterilized needle, spread over 2% water agar plates (Goode and Brown 1970) and incubated at 26 °C for 24 h. Germinated spores were located under the microscope at 100× magnification and marked. Each germinated spore was excised together with a block of water agar using a sterilized 5 mm cork borer, transferred to Potato Dextrose Agar (PDA) plates and incubated at 26 °C.

Pathogenicity test and culture deposition

The pathogenicity of C. sesami isolate Cers 52–10 was tested on healthy sesame plants in a greenhouse. Inoculum was prepared by multiplying the isolate on boiled sorghum grains following the protocol of Chand et al. (2013). Five sesame plants were inoculated at the 50% flowering stage with a spore suspension of 10⁴ spores ml⁻¹, and control plants were sprayed with sterile distilled water. The plants were kept in the greenhouse at about 25–28 °C, and high relative humidity (> 80%) was maintained by spraying water every two hours for two days. Leaf spot symptoms appeared on the inoculated plants 7 days after inoculation, whereas the control plants remained symptomless. C. sesami was re-isolated from inoculated leaves showing typical Cercospora leaf spot symptoms, and its morphology was identical to that of the original isolate. Thus, C. sesami isolate Cers 52–10 fulfilled Koch’s postulates.

The culture of C. sesami was deposited in the National Fungal Culture Collection of India (NFCCI), Agharkar Research Institute, Pune, India, and in the International Depositary Authority (IDA)-recognized repository National Centre for Microbial Resource, National Centre for Cell Science (NCMR-NCCS), Pune, India, under accession numbers NFCCI 3832 and MCC 9069, respectively.

DNA isolation and PCR amplification

The mycelium of C. sesami isolate Cers 52–10 was grown aseptically in Potato Dextrose Broth (PDB) in a 100 ml conical flask at 25 °C for 10 days. The mycelial mat was harvested aseptically and used for DNA extraction. Total genomic DNA was extracted using a modified cetyltrimethylammonium bromide (CTAB) protocol (Murray and Thompson 1980), and its concentration was determined with a Thermo Scientific NanoDrop™ One/OneC UV–Vis spectrophotometer. The extracted DNA was used for molecular confirmation of C. sesami isolate Cers 52–10 and for genome sequencing.

The extracted DNA was diluted to a final concentration of 20 ng/µl for the Polymerase Chain Reaction (PCR). Amplification was performed on a Bio-Rad T100 Thermal Cycler (Bio-Rad, California, USA) using the Qiagen HotStar Taq Master Mix Kit (Qiagen, Germany). Three loci were amplified: the internal transcribed spacer (ITS) region (White et al. 1990), the large subunit of ribosomal RNA (LSU) (Vilgalys and Hester 1990; Rehner and Samuels 1995) and actin (ACT) (Carbone and Kohn 1999). The amplification products were separated by electrophoresis on a 2.0% agarose gel at 70 V for 1 h. The representative sequences (ITS, 545 bp; LSU, 1032 bp; ACT, 166 bp) were deposited in GenBank under accession numbers MK027103, MK029365 and OM365916, respectively.
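The 20 ng/µl working concentration follows from the standard dilution relation C1V1 = C2V2. A minimal sketch, assuming a hypothetical stock concentration (for example, as read from the NanoDrop) and a hypothetical 50 µl final volume; the function name is illustrative only:

```python
def dilution_volumes(stock_ng_per_ul: float,
                     target_ng_per_ul: float = 20.0,
                     final_volume_ul: float = 50.0) -> tuple[float, float]:
    """Return (stock DNA volume, diluent volume) in µl using C1 * V1 = C2 * V2."""
    if stock_ng_per_ul <= 0 or target_ng_per_ul <= 0 or final_volume_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is more dilute than the target concentration")
    # V1 = C2 * V2 / C1
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul
```

For example, a hypothetical 100 ng/µl stock would need 10 µl of DNA made up with 40 µl of water to give 50 µl at 20 ng/µl.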

Phylogenetic analysis

The phylogenetic tree was constructed using the concatenated ITS, LSU and ACT datasets. Reference sequences of other Cercospora species were retrieved from Groenewald et al. (2013), Nguanhom et al. (2015) and GenBank (https://www.ncbi.nlm.nih.gov/) (Table S1). The sequences were concatenated using Geneious Prime v 2022.1.1 (http://www.geneious.com/) and aligned using Clustal Omega (McWilliam et al. 2013). The tree was inferred in MEGA11 (Tamura et al. 2021) using the neighbor-joining method (Saitou and Nei 1987), with branch support estimated from 1000 bootstrap replicates. Evolutionary distances were computed using the Tajima–Nei model (Tajima and Nei 1984) and are expressed as the number of base substitutions per site (scale bar, 0.02 substitutions per site). All positions containing gaps or missing data were eliminated. Zymoseptoria tritici (CBS 100329) was used as the outgroup.
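MEGA11 computes the Tajima–Nei distances internally. To make the model concrete, the sketch below applies complete deletion of gapped or ambiguous columns and then computes the Tajima–Nei (1984) distance for one aligned pair, d = −b ln(1 − p/b), with b = ½[1 − Σ g_i² + p²/h] and h = Σ_{i<j} x_ij² / (2 g_i g_j). Here p is the proportion of differing sites, g_i is the frequency of nucleotide i pooled over both sequences, and x_ij is the relative frequency of sites carrying the nucleotide pair i, j. This is an illustrative sketch, not MEGA's implementation, and it does not build the neighbor-joining tree itself.

```python
import math
from collections import Counter

NUCLEOTIDES = "ACGT"


def complete_deletion(alignment: list[str]) -> list[str]:
    """Drop every column that holds a gap or non-ACGT symbol in any sequence."""
    keep = [i for i in range(len(alignment[0]))
            if all(seq[i].upper() in NUCLEOTIDES for seq in alignment)]
    return ["".join(seq[i].upper() for i in keep) for seq in alignment]


def tajima_nei(s1: str, s2: str) -> float:
    """Tajima-Nei (1984) distance between two aligned, gap-free ACGT sequences."""
    n = len(s1)
    if n == 0 or n != len(s2):
        raise ValueError("sequences must be non-empty and of equal length")
    if (set(s1) | set(s2)) - set(NUCLEOTIDES):
        raise ValueError("apply complete_deletion first; only A, C, G, T allowed")
    # g_i: frequency of nucleotide i pooled over both sequences
    base_counts = Counter(s1) + Counter(s2)
    g = {b: base_counts[b] / (2 * n) for b in NUCLEOTIDES}
    # x_ij: relative frequency of each unordered mismatched pair (i, j)
    pair_counts = Counter(tuple(sorted((a, b))) for a, b in zip(s1, s2) if a != b)
    x = {pair: count / n for pair, count in pair_counts.items()}
    p = sum(x.values())  # proportion of differing sites (p-distance)
    if p == 0:
        return 0.0
    h = sum(xij ** 2 / (2 * g[i] * g[j]) for (i, j), xij in x.items())
    b = 0.5 * (1 - sum(gi ** 2 for gi in g.values()) + p ** 2 / h)
    if p >= b:
        raise ValueError("distance undefined: sequences are too divergent")
    return -b * math.log(1 - p / b)
```

For a concatenated ITS–LSU–ACT alignment, `complete_deletion` would be applied to all sequences jointly before any pairwise distance is computed, which mirrors the removal of all gap-containing positions described above.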

Genome sequencing and assembly

The genome of C. sesami isolate Cers 52–10 was sequenced using paired-end (2 × 100 bp) and mate-pair (2 × 250 bp) libraries on an Illumina HiSeq 2500 platform (Illumina, California, USA). The quality of the raw FASTQ files was checked with FastQC (Andrews 2010) before assembly. Adapters and low-quality reads (average quality score < 30) were removed from the paired-end and mate-pair data using AdapterRemoval v 2.3.1 (Schubert et al. 2016). Duplicate reads were removed, and unique reads retained, using FastUniq v 1.1 (Xu et al. 2012). De novo assembly was performed with MaSuRCA v 3.2.3 (Zimin et al. 2013) using k-mer sizes of 31–95. Assembly quality was assessed with QUAST v 4.6 (Gurevich et al. 2013), and completeness was assessed with BUSCO v 2.0 using the Ascomycete odb_9 dataset (Simão et al. 2015). Repeat regions were identified with RepeatMasker Open-4.0 using default parameters (Smit et al. 2015).
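AdapterRemoval and FastUniq implement their own trimming and duplicate-detection logic. The sketch below only illustrates the two criteria stated above, assuming Phred+33 quality strings and read pairs already held in memory: a pair is discarded if either mate has a mean quality below 30, and exact duplicate pairs (identical sequences in both mates) are collapsed to one copy. The function and type names are hypothetical.

```python
Read = tuple[str, str]  # (sequence, quality string)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    if not quality:
        raise ValueError("empty quality string")
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_read_pairs(pairs: list[tuple[Read, Read]],
                      min_mean_q: float = 30.0) -> list[tuple[Read, Read]]:
    """Drop pairs in which either mate has mean quality below min_mean_q,
    then drop exact duplicate pairs, keeping the first copy."""
    seen = set()
    kept = []
    for mate1, mate2 in pairs:
        if mean_phred(mate1[1]) < min_mean_q or mean_phred(mate2[1]) < min_mean_q:
            continue  # one low-quality mate discards the whole pair
        key = (mate1[0], mate2[0])
        if key in seen:
            continue  # exact duplicate of a pair already kept
        seen.add(key)
        kept.append((mate1, mate2))
    return kept
```

Removing the whole pair when only one mate fails keeps the paired-end and mate-pair files synchronized, which the assembler expects.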

Gene prediction and annotation

Protein-coding genes were predicted from the MaSuRCA-assembled contigs with the ab initio gene predictor AUGUSTUS v 3.3.2 (Stanke and Morgenstern 2005) using default parameters and Cercospora beticola as the reference model. The predicted genes were annotated with the following in-house pipeline. First, they were compared with the UniProt database (UniProt Consortium 2021) using BLASTx v 2.6.0 (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/) with an E-value cutoff of 10⁻³, and the best BLASTx hit for each gene was chosen based on query coverage, identity, similarity score and gene description. Gene ontology (GO) terms (Molecular Function (MF), Cellular Component (CC) and Biological Process (BP)) were then mapped using InterProScan v 5 (Jones et al. 2014), KEGG (Kanehisa and Goto 2000), eggNOG-mapper (Huerta-Cepas et al. 2017) and the Pfam database (El-Gebali et al. 2019). Finally, the predicted genes were submitted to the dbCAN2 meta-server with default parameters (Zhang et al. 2018) to predict and classify carbohydrate-active enzymes (CAZymes).
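The best-hit step can be sketched as a filter over BLAST+ tabular output. This is a simplification under stated assumptions: BLASTx is assumed to have been run with the default 12-column tabular format (`-outfmt 6`), and the best hit is taken as the highest bit score among hits passing the 10⁻³ E-value cutoff. The pipeline described above additionally weighed query coverage, identity and gene description; query coverage would need the `qlen` column, which the default format does not include. The function name is illustrative.

```python
import csv

# Default column order of BLAST+ tabular output (-outfmt 6)
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines, max_evalue: float = 1e-3) -> dict[str, tuple[str, float, float]]:
    """Return {query: (subject, bitscore, evalue)} keeping the top-scoring hit
    per query among hits with E-value <= max_evalue."""
    best: dict[str, tuple[str, float, float]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(OUTFMT6_FIELDS, row))
        evalue, bits = float(rec["evalue"]), float(rec["bitscore"])
        if evalue > max_evalue:
            continue  # fails the E-value cutoff
        query = rec["qseqid"]
        if query not in best or bits > best[query][1]:
            best[query] = (rec["sseqid"], bits, evalue)
    return best
```

Genes with no hit passing the cutoff simply do not appear in the result, which is how a count such as "genes with a significant BLASTx match" would be obtained.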

Genome comparison using dashing

Genome assemblies of all 21 available Cercospora genomes were downloaded from the NCBI WGS database (https://www.ncbi.nlm.nih.gov/genbank/wgs/). Pairwise genomic distances and similarities among the 21 genomes were calculated using Dashing v 1.0 (Baker and Langmead 2019) with a k-mer length of 16, using the Dashing subcommands “dist”, “sketch”, “hll”, “union” and “printmat”. The “dist” subcommand generated an upper-triangular distance matrix, which was then used to generate a Pearson correlation plot with the “cor” function and the “ggplot2” package in R (R Core Team 2021; https://www.r-project.org/).
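Dashing estimates the Jaccard index J between the k-mer sets of two genomes from HyperLogLog sketches rather than from exact sets. The sketch below computes J exactly on canonical 16-mers, which is only practical for small inputs, and converts it to the Mash distance D = −(1/k) ln[2J/(1 + J)] of Ondov et al. (2016). The use of canonical k-mers and of this particular J-to-distance conversion are assumptions for illustration; the text does not state which Dashing output the published matrix used.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int = 16) -> set[str]:
    """Canonical k-mers: lexicographic minimum of each k-mer and its reverse complement."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
            rc = kmer.translate(COMPLEMENT)[::-1]
            kmers.add(min(kmer, rc))
    return kmers


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard index |A ∩ B| / |A ∪ B| (1.0 for two empty sets)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def mash_distance(j: float, k: int = 16) -> float:
    """Mash distance -(1/k) ln(2J / (1 + J)), capped at 1 when no k-mers are shared."""
    if j <= 0:
        return 1.0
    return min(1.0, -math.log(2 * j / (1 + j)) / k)
```

Counting canonical k-mers makes the comparison independent of which strand each contig happens to be reported on, which matters for fragmented draft assemblies such as the ones compared here.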

Effector prediction and annotation

Effectors in the C. sesami genome were predicted using a computational pipeline. The predicted protein dataset was first analyzed with SignalP v 5.0 (Armenteros et al. 2019b) to detect secretory proteins. Non-secretory proteins were discarded, and the secretory proteins were screened with PredGPI (Pierleoni et al. 2008) to remove glycosylphosphatidylinositol (GPI)-anchored proteins and then with TMHMM v 2.0 (Krogh et al. 2001) to remove proteins with one or more transmembrane domains. TargetP v 2.0 (Armenteros et al. 2019a) was used to remove proteins targeted to mitochondria and chloroplasts. WoLF PSORT (Horton et al. 2007) and DeepLoc v 1.0 (Armenteros et al. 2017) were used to identify apoplastic and cytoplasmic proteins. Cysteine content (≥ 4 cysteine residues) and protein size (≤ 300 amino acids) were determined using Geneious Prime v 2022.1.1. EffectorP v 2.0 (Sperschneider et al. 2018) was used to identify putative effector candidates, which were functionally annotated by assigning GO terms in OmicsBox (Götz et al. 2008).
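The filtering logic of this pipeline can be sketched once each tool's call has been collected per protein. The field names below are hypothetical stand-ins for parsed SignalP, PredGPI, TMHMM and TargetP results; the localization step (WoLF PSORT, DeepLoc) and EffectorP itself are not reimplemented. Because the Results report that 36 of the 80 EffectorP candidates met the size and cysteine criteria, the sketch flags those criteria rather than using them as a hard filter.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    """Per-protein calls gathered from upstream tools (field names are hypothetical)."""
    name: str
    sequence: str
    signal_peptide: bool      # e.g. SignalP predicts a signal peptide
    gpi_anchor: bool          # e.g. PredGPI predicts a GPI anchor
    tm_domains: int           # e.g. number of TMHMM-predicted transmembrane helices
    organelle_targeted: bool  # e.g. TargetP predicts mitochondrion or chloroplast


def passes_secretome_filters(p: Protein) -> bool:
    """Secreted, not GPI-anchored, no transmembrane domain, not organelle-targeted."""
    return (p.signal_peptide and not p.gpi_anchor
            and p.tm_domains == 0 and not p.organelle_targeted)


def is_small_cysteine_rich(p: Protein, max_len: int = 300, min_cys: int = 4) -> bool:
    """Size (<= 300 aa) and cysteine (>= 4 residues) criteria described in the text."""
    seq = p.sequence.upper().rstrip("*")  # ignore a trailing stop symbol
    return len(seq) <= max_len and seq.count("C") >= min_cys


def screen(proteins: list[Protein]) -> list[tuple[str, bool]]:
    """(name, small-cysteine-rich flag) for proteins passing the secretome filters;
    these survivors would then be submitted to EffectorP."""
    return [(p.name, is_small_cysteine_rich(p))
            for p in proteins if passes_secretome_filters(p)]
```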

Results

Morphological and molecular identification of C. sesami

Leaf spots caused by C. sesami were 0.5–3 mm in diameter and circular to sub-circular, with a gray center and a brown to black margin (Fig. 1a, b). Conidiophores were in fascicles of 2–9, olivaceous brown, multiseptate and geniculate, with a scar at the sub-truncate tip (Fig. 1c). Conidia were acicular, hyaline and multiseptate, with an acute tip (Fig. 1d). Colonies of C. sesami were white on the obverse and reddish-pink on the reverse side of the culture plate (Fig. 1e, f).

Fig. 1.

Cercospora sesami (isolate Cers 52–10; MCC 9069) on Sesamum indicum. a Leaf spots on the upper leaf surface, b symptoms on a leaf 12 days after inoculation, c conidiophore fascicles on the leaf surface, d conidia, e–f obverse and reverse sides of a colony grown on PDA for 12 days. Scale bars: c = 100 μm, d = 50 μm

Molecular identification was carried out by constructing a phylogenetic tree based on the combined ITS, LSU and ACT datasets (Fig. 2). A BLAST search in GenBank showed that the ITS sequences shared 99–100% identity with many Cercospora species, including C. canescens, C. physalidis, C. beticola, C. capsici and C. apii, among others. In contrast, the LSU sequences showed 99–100% identity only with C. beticola, C. citrullina and C. malayensis, and the ACT sequences showed 99–100% identity only with C. kikuchii, Cercospora sp. Q JZG-2013 (CPC 10551) and Cercospora sp. 2 LO-2017. In the phylogenetic tree constructed from the three combined datasets, C. sesami from S. indicum formed a well-supported clade sister to a clade of other Cercospora species and was clearly separated from other genera such as Zymoseptoria. Furthermore, the C. sesami isolate from S. indicum was closely related to C. chrysanthemoides from Chrysanthemoides monilifera.

Fig. 2.

A neighbor-joining tree based on the concatenated alignment of sequence data of three loci, ITS (545 bp), LSU (1032 bp) and ACT (166 bp) showing phylogenetic affinities of sequenced Cercospora sesami (shown in bold) with other Cercospora species. Multiple sequence alignments were produced using Clustal Omega, and the phylogenetic tree was constructed using MEGA11. Zymoseptoria tritici (CBS 100329) was designated as outgroup. Bootstrap values obtained from 1000 replicates are shown at the nodes. The scale bar represents 0.02 nucleotide substitutions per site. (CBS Centraalbureau Voor Schimmelcultures, Utrecht, Netherlands; CPC Culture Collection of Pedro Crous; MCC Microbial Culture Collection, National Centre for Microbial Resource, National Centre for Cell Science, Pune, India)

Pairwise distances and similarities among the 21 Cercospora genome datasets were estimated with Dashing (Tables S2–S3). Distance values range from 0 (most similar) to 1 (most divergent). The distance matrix was then analyzed with the Pearson correlation coefficient, which ranges from −1 to 1; values close to 1 indicate genomes with highly similar distance profiles. The correlation profiles revealed the degree of similarity among the 21 genomes (Fig. 3, S1).
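The correlation step performed in R can be sketched in a few lines. The sketch assumes the Dashing distances have already been parsed into upper-triangle rows (the layout of Dashing's output file is not reproduced here); it mirrors the matrix into a full symmetric matrix with a zero diagonal and then computes Pearson correlations between columns, which is what R's `cor()` does when given a matrix. Function names are hypothetical.

```python
import math


def symmetric_from_upper(upper: list[list[float]]) -> list[list[float]]:
    """Full symmetric matrix from upper-triangle rows:
    upper[i] holds d(i, i+1), ..., d(i, n-1); the diagonal is 0."""
    n = len(upper)
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(upper):
        for offset, d in enumerate(row):
            j = i + 1 + offset
            full[i][j] = full[j][i] = d
    return full


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(m: list[list[float]]) -> list[list[float]]:
    """Pairwise Pearson correlations between the columns of m."""
    cols = [list(c) for c in zip(*m)]
    return [[pearson(a, b) for b in cols] for a in cols]
```

Each entry of the result compares two genomes by how similarly they relate to all 21 genomes, so two genomes can correlate strongly even if their direct distance is not the smallest in the matrix.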

Fig. 3.

Heat map of the pairwise distance matrix showing correlations among 21 Cercospora genomes. Red indicates the most closely related genomes and blue the most divergent

Genome assembly and annotation

The assembled draft genome of C. sesami was 34.3 Mb in size, with an N50 of 26,222 bp and an average GC content of 53.02% (Table 1). Currently, 21 Cercospora genome assemblies are available in the NCBI WGS database; their assembly statistics are presented in Table 2. BUSCO analysis identified 96.55% (280/290) complete BUSCOs in the C. sesami assembly, indicating a largely complete assembly. A comparative BUSCO analysis of genome assemblies of Cercospora and related genera is presented in Table 3. The assembly was estimated to contain 138,031 bp (0.40%) of simple repeats and 15,278 bp (0.04%) of low-complexity repeats.

Table 1.

Summary of genome sequencing, assembly, and annotation statistics of Cercospora sesami isolate Cers 52–10

Attribute | Value
Estimated genome size (Mb) | 34.33
Coverage | 90.0×
Number of scaffolds | 1867
Scaffold N50 (bp) | 26,222
Scaffold L50 | 405
Number of contigs | 2086
Contig N50 (bp) | 25,693
Contig L50 | 415
Largest contig (kb) | 145.85
GC (%) | 53.02
Predicted protein-coding genes | 10,872
Predicted genes with a significant BLASTx match to UniProt | 9,712
Total BUSCO groups searched | 290
Complete BUSCOs (C) | 280
Complete and single-copy BUSCOs (S) | 272
Complete and duplicated BUSCOs (D) | 8
Fragmented BUSCOs (F) | 5
Missing BUSCOs (M) | 5
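The scaffold and contig N50/L50 values in Table 1 were reported by QUAST. As a reminder of what these metrics mean, a minimal sketch of the standard definitions follows; it is not QUAST's code, which also applies its own options (for example, a minimum contig length). The function name is illustrative.

```python
def n50_l50(lengths: list[int]) -> tuple[int, int]:
    """N50: length of the shortest sequence in the smallest set of longest
    sequences that together cover at least half of the assembly.
    L50: number of sequences in that set."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:  # reached half of the total assembly length
            return length, count
    raise AssertionError("unreachable: the loop always reaches the total")
```

For Table 1, this means that 405 scaffolds, each at least 26,222 bp long, together cover half of the 34.33 Mb assembly.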

Table 2.

Assembly summary statistics of C. sesami compared to other Cercospora genomes

(Source – GenBank)

Species | Assembly accession number(s) | Size (Mb) | GC (%)
Cercospora berteroae | GCA_002933655.1 | 33.89 | 52.20
Cercospora beticola | GCF_002742065.1 | 37.06 | 51.99
Cercospora beticola | GCA_003370525.1 | 36.55 | 51.20
Cercospora beticola | GCA_003370505.1 | 35.26 | 51.30
Cercospora brassicicola* | GCA_013365245.1 | 38.32 | 51.90
Cercospora canescens* | GCA_000347735.1 | 33.97 | 52.00
Cercospora cf. flagellaris | GCA_005356885.1 | 33.24 | 52.70
Cercospora cf. sigesbeckiae | GCA_002217505.1 | 34.94 | 52.70
Cercospora cf. sigesbeckiae | GCA_005356805.1 | 33.72 | 53.00
Cercospora citrullina* | GCA_013365195.1 | 32.81 | 52.50
Cercospora kikuchii | GCF_019650295.1 | 34.44 | 53.00
Cercospora kikuchii | GCA_005356855.1 | 33.22 | 53.00
Cercospora kikuchii | GCA_009193115.1 | 33.20 | 53.00
Cercospora nicotianae | GCA_002994015.1 | 33.37 | 52.80
Cercospora sesami* | GCA_013365235.1 | 34.33 | 53.02
Cercospora sojina | GCA_002534735.1 | 40.84 | 53.10
Cercospora sojina | GCA_004299825.1 | 40.12 | 53.40
Cercospora sojina | GCA_003435105.1 | 31.11 | 53.40
Cercospora sojina | GCA_002084285.1 | 29.95 | 53.60
Cercospora zeae-maydis | GCA_010093985.1 | 46.61 | 48.60
Cercospora zeina | GCA_002844615.2 | 41.82 | 47.90

*Work from our lab

Table 3.

Comparative BUSCO analysis of assemblies and annotations of Cercospora and related genera

(Source - Wingfield et al. 2017; Akinsanmi and Carvalhais 2020; Amarillas et al. 2020; Wilken et al. 2020; Wingfield et al. 2022)

Taxa | Genome accession number(s) | Complete BUSCOs | Fragmented BUSCOs | Missing BUSCOs
Cercospora brassicicola | JAASLH000000000 | 281 (96.89%) | 6 (2.06%) | 3 (1.03%)
Cercospora citrullina | JAASFF000000000 | 284 (97.93%) | 3 (1.03%) | 3 (1.03%)
Cercospora sesami | JAASLG000000000 | 280 (96.55%) | 5 (1.72%) | 5 (1.72%)
Cercospora zeina | MVDW00000000 | 95.40% | 2.10% | 2.50%
Pseudocercospora cruenta | JAASFE000000000 | 284 (97.93%) | 1 (0.34%) | 5 (1.72%)
Pseudocercospora fijiensis | JAAEBF000000000 | 96.90% | 2.10% | 1.00%
Pseudocercospora macadamiae | WRNY00000000 | 1287 (97.87%) | 5 (0.38%) | 23 (1.74%)

A total of 10,872 genes were predicted by AUGUSTUS using Cercospora beticola as the reference model. The predicted genes ranged from 25 bp to 6325 bp in size. In total, 9712 genes (89.33% of 10,872) had significant BLASTx matches in the UniProt database, and these annotations were supported by searches against other publicly available databases. Gene ontology analysis mapped 1038 genes to molecular functions, most commonly metal ion binding, ATP binding, oxidoreductase activity, catalytic activity and RNA polymerase II transcription factor activity. A total of 350 genes were mapped to cellular components, mainly integral component of membrane, nucleus, cytoplasm, mitochondrion, ribosome and extracellular region. A total of 977 genes were mapped to biological processes such as metabolic processes, transmembrane transport, intracellular protein transport, DNA repair, transcription and translation (Fig. 4).

Fig. 4.

Gene ontology (GO) distribution of Cercospora sesami MCC 9069 genes across the three main GO categories: Biological Process, Molecular Function and Cellular Component

The carbohydrate-active enzyme family

CAZymes play an essential role in the colonization of host tissues by phytopathogenic fungi. Their primary functions include the breakdown, biosynthesis and modification of glycoconjugates, oligosaccharides and polysaccharides (Lyu et al. 2015). CAZymes are classified into four main classes: glycosyl transferases (GT), glycoside hydrolases (GH), polysaccharide lyases (PL) and carbohydrate esterases (CE) (Cantarel et al. 2009). The predicted genes of C. sesami were screened for CAZymes, and a total of 291 genes were assigned to carbohydrate-active enzyme classes as defined in the dbCAN2 database. The most abundant class was GH, with 159 genes (54.64%), followed by GT, with 52 genes (17.87%). A comparative overview of CAZymes in C. sesami and other plant-pathogenic fungi of the Dothideomycetes is presented in Table 4.

Table 4.

Number of genes encoding for different CAZyme classes in Cercospora sesami and selected dothideomycetes

(Source - Goodwin et al. 2011; Chang et al. 2016)

Taxa | GH | GT | PL | CE | CBMs
Cercospora sesami | 159 | 52 | 3 | 14 | 8
Pseudocercospora fijiensis | 244 | 109 | 6 | 64 | 30
Mycosphaerella graminicola | 184 | 97 | 3 | 20 | 20
Stagonospora nodorum | 284 | 92 | 10 | 57 | 74
Dothistroma septosporum | 202 | 114 | 6 | 57 | 28
Zymoseptoria tritici | 198 | 106 | 3 | 47 | 24
Sphaerulina musiva | 180 | 107 | 6 | 41 | 26
Bipolaris maydis | 268 | 102 | 15 | 83 | 81

GH glycoside hydrolase; GT glycosyl transferase; PL polysaccharide lyase; CE carbohydrate esterase; CBMs carbohydrate-binding modules
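The class counts and percentages above amount to grouping family labels by their class prefix. A minimal sketch, assuming one family label per gene in the style of CAZy/dbCAN family names (for example "GH28" or "CBM1"); how genes with several assignments (such as a GH domain plus a CBM) were counted is not stated in the text, so that rule is left open. The reported figures (159 and 52 of 291 genes) serve as a check on the percentage arithmetic.

```python
import re
from collections import Counter

# Class prefixes of CAZy/dbCAN family labels such as "GH28", "GT2" or "CBM1"
CAZY_FAMILY = re.compile(r"(GH|GT|PL|CE|AA|CBM)\d+")


def cazyme_class_counts(family_labels: list[str]) -> Counter:
    """Count family labels by CAZyme class, ignoring unrecognized labels."""
    counts: Counter = Counter()
    for label in family_labels:
        match = CAZY_FAMILY.match(label)
        if match:
            counts[match.group(1)] += 1
    return counts


def class_percentages(counts: dict[str, int], total_genes: int) -> dict[str, float]:
    """Percentage of CAZyme-assigned genes falling in each class, to two decimals."""
    return {cls: round(100 * n / total_genes, 2) for cls, n in counts.items()}
```

With the reported counts, `class_percentages({"GH": 159, "GT": 52}, 291)` gives 54.64% for GH and 17.87% for GT.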

Effector prediction and annotation

The predicted proteome of C. sesami contained 714 (6.6%) secretory proteins. Of these 714, 52 proteins with a GPI anchor were eliminated; of the remaining 662 proteins, 60 with one or more transmembrane domains were eliminated. EffectorP analysis of the resulting protein dataset predicted 80 putative effectors. Of these 80, 36 (45%) were small secreted proteins with ≤ 300 amino acids and ≥ 4 cysteine residues. Functional annotation identified 27 (38%) of the predicted effectors as hypothetical proteins. The other effectors included proteins from enzyme classes such as glycosyl hydrolases, carbohydrate esterases, proteases, endopeptidases, oxidoreductases, catalases and peroxidases, and many contained domains that may play a crucial role in pathogenesis.

Discussion

Cercospora leaf spot is prevalent in sesame-growing regions worldwide. Owing to a lack of resistance sources, released sesame varieties are highly susceptible to the disease, resulting in low yields. This study generated a 34.3 Mb genome assembly of C. sesami with a GC content of 53.02%; to our knowledge, this is the first reported genome assembly of C. sesami. The assembly size was comparable to those of other Cercospora species (Table 2). The comparative genome analysis included gene annotation, estimation of genomic distances, identification of repeat elements, characterization of carbohydrate-active enzymes, and identification and functional characterization of putative effector proteins.

Calculating genomic distances is important for delineating fungal species. Gostinčar (2020) reported that genome pairs within and between species of the same genus could be correctly distinguished in almost 90% of cases using an uncurated dataset of fungal genomes from GenBank, and in more than 90% of cases using a minimally curated dataset. Mash (Ondov et al. 2016) and Dashing (Baker and Langmead 2019) are user-friendly tools for calculating genomic distances. Mash, which uses MinHash sketches, is considered a standard tool in comparative genomics, whereas Dashing uses HyperLogLog (HLL) sketches and cardinality estimation methods that provide improved speed and accuracy across a wide range of genomic datasets (Baker and Langmead 2019). In this study, Dashing was used with default settings and a k-mer size of 16 to generate pairwise distance and similarity matrices for the 21 currently available Cercospora genome datasets.

Eighty effectors were predicted in this study. Candidate effector proteins are defined by various criteria, which often rely on different in silico tools. These criteria include the presence of a signal peptide, the absence of transmembrane domains, small size (≤ 300 amino acids) and a high cysteine content (Stergiopoulos and de Wit 2009; Lo Presti et al. 2015; Sperschneider et al. 2015a; Sonah et al. 2016). These features have been widely used to screen large numbers of candidates for functional analysis of potential effectors. Although these criteria reduce the number of candidates, the problem remains unsolved: not all small, secreted, cysteine-rich proteins function as effectors, and conversely, not all fungal effectors are small and cysteine-rich (Sperschneider et al. 2015b). Web-based tools have been reported to predict effectors from the predicted secretome with 89% accuracy (Kanja and Hammond-Kosack 2020). Therefore, additional studies of in planta expression, diversifying selection and genomic features are needed to ensure that potential effector proteins are not missed (Sperschneider et al. 2015a; Selin et al. 2016).

Before this study, minimal information was available on the genome structure of C. sesami. This research was therefore initiated as a first step toward understanding the genome organization of C. sesami and narrowing down the number of effector candidates, opening a window for future functional genomics studies of effectors. The predicted effector candidates can be confirmed through additional studies such as in planta expression analysis; together with an understanding of genome structure, they can play an essential role in developing disease management measures for the effective control of Cercospora leaf spot of sesame worldwide.

Acknowledgements

Shagun Sinha is thankful to the Council of Scientific and Industrial Research (CSIR), New Delhi, India, for the Senior Research Fellowship (SRF) and the Department of Biotechnology (DBT) and the British Council for Newton-Bhabha PhD placement fellowship 2019-2020. All authors are thankful to the Head and Coordinator, Center of Advanced Studies (CAS) in Botany, Banaras Hindu University (BHU), Varanasi, India and the Head, Department of Mycology and Plant Pathology, Institute of Agricultural Sciences, BHU for financial and infrastructural support.

Author contributions

RC, RNK and SN planned the whole-genome sequencing study. RC and SS contributed to the identification, isolation and characterization of the sequenced pathogen. SS and SN performed DNA extraction, gene annotation and phylogenetic analysis. SS and DKG helped in bioinformatics analysis. SS and SN drafted the manuscript. RNK was involved in adding critical inputs to the manuscript. All authors have read and contributed to the final manuscript.

Funding

Council of Scientific and Industrial Research, India, 09/013(0824)/2018-EMR-I, Shagun Sinha.

Availability of data and materials

The authors state that all data necessary to confirm the conclusions presented in the article are included within the article, and the supplementary data have been deposited in Mendeley Data (https://doi.org/10.17632/26m5grnb88.1).

Declarations

Conflict of interest

The authors declare that they have no conflict of interest in the publication.

Research involving human participants and/or animals

The authors declare that the research did not involve human participants and/or animals.

Informed consent

All authors have given consent to the publication of this manuscript in 3 Biotech.

Footnotes

Accession Numbers: The Cercospora sesami genomic data have been deposited as accession JAASLG000000000 in DDBJ/ENA/GenBank. The version described in this paper is version JAASLG010000000. The genome raw sequencing data and the reported assembly are associated with NCBI BioProject: PRJNA613165 and BioSample: SAMN14395398 within GenBank. The SRA accession of Cercospora sesami is SRX7967432.

Contributor Information

Ravindra Nath Kharwar, Email: rnkharwarbot@bhu.ac.in.

Ramesh Chand, Email: rc_vns@yahoo.co.in.

References

  1. Akinsanmi OA, Carvalhais LC. Draft genome of the macadamia husk spot pathogen, Pseudocercospora macadamiae. Phytopathology. 2020;110:1503–1506. doi: 10.1094/PHYTO-12-19-0460-A. [DOI] [PubMed] [Google Scholar]
  2. Alice D, Nadarajan N (2007) Pulses: screening techniques and assessment for disease resistance. All India coordinated research project on MULLaRP.
  3. Amarillas L, Estrada-Acosta M, León-Chan RG, López-Orona C, Lightbourn L. First draft genome sequence resource of a strain of Pseudocercospora fijiensis isolated in North America. Phytopathology. 2020;110:1620–1622. doi: 10.1094/PHYTO-04-20-0121-A.
  4. Andrews S. FastQC: a quality control tool for high throughput sequence data. 2010.
  5. Anilakumar KR, Pal A, Khanum F, Bawa AS. Nutritional, medicinal and industrial uses of sesame (Sesamum indicum L.) seeds - an overview. Agric Conspec Sci. 2010;75:159–168.
  6. Armenteros JJA, Sønderby CK, Sønderby SK, Nielsen H, Winther O. DeepLoc: prediction of protein subcellular localization using deep learning. Bioinformatics. 2017;33:3387–3395. doi: 10.1093/bioinformatics/btx431.
  7. Armenteros JJA, Salvatore M, Emanuelsson O, Winther O, Von Heijne G, Elofsson A, Nielsen H. Detecting sequence signals in targeting peptides using deep learning. Life Sci Alliance. 2019;2:e201900429. doi: 10.26508/lsa.201900429.
  8. Armenteros JJA, Tsirigos KD, Sønderby CK, Petersen TN, Winther O, Brunak S, von Heijne G, Nielsen H. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat Biotechnol. 2019;37:420–423. doi: 10.1038/s41587-019-0036-z.
  9. Baker DN, Langmead B. Dashing: fast and accurate genomic distances with HyperLogLog. Genome Biol. 2019;20:1–12. doi: 10.1186/s13059-019-1875-0.
  10. Cantarel BL, Coutinho PM, Rancurel C, Bernard T, Lombard V, Henrissat B. The carbohydrate-active enzymes database (CAZy): an expert resource for glycogenomics. Nucleic Acids Res. 2009;37(suppl_1):D233–D238. doi: 10.1093/nar/gkn663.
  11. Chand R, Kumar P, Singh V, Pal C. Technique for spore production in Cercospora canescens. Indian Phytopathol. 2013;66:159–163.
  12. Chang TC, Salvucci A, Crous PW, Stergiopoulos I. Comparative genomics of the sigatoka disease complex on banana suggests a link between parallel evolutionary changes in Pseudocercospora fijiensis and Pseudocercospora eumusae and increased virulence on the banana host. PLoS Genetics. 2016;12:e1005904. doi: 10.1371/journal.pgen.1005904.
  13. Choi YW, Hyde KD, Ho WH. Single spore isolation of fungi. Fungal Divers. 1999;3:29–38.
  14. Chupp C. A monograph of the fungus genus Cercospora. Ithaca, New York: Cornell Univ; 1953. p. 667.
  15. El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, et al. The Pfam protein families database in 2019. Nucleic Acids Res. 2019;47:D427–D432. doi: 10.1093/nar/gky995.
  16. Enikuomehin OA, Aduwo AM, Olowe VIO, Popoola AR, Oduwaye A. Incidence and severity of foliar diseases of sesame (Sesamum indicum L.) intercropped with maize (Zea mays L.). Arch Phytopathol Plant Prot. 2010;43:972–986. doi: 10.1080/03235400802214810.
  17. Goode MJ, Brown GR. Detection and characterization of Cercospora citrullina isolates that sporulate readily in culture. Phytopathology. 1970;60:1502–1503. doi: 10.1094/Phyto-60-1502.
  18. Goodwin SB, Ben M'Barek S, Dhillon B, Wittenberg AH, Crane CF, Hane JK, et al. Finished genome of the fungal wheat pathogen Mycosphaerella graminicola reveals dispensome structure, chromosome plasticity, and stealth pathogenesis. PLoS Genetics. 2011;7:e1002070. doi: 10.1371/journal.pgen.1002070.
  19. Gostinčar C. Towards genomic criteria for delineating fungal species. J Fungi. 2020;6:246. doi: 10.3390/jof6040246.
  20. Götz S, Garcia-Gomez JM, Terol J, Williams TD, Nagaraj SH, Nueda MJ, Robles M, Talon M, Dopazo J, Conesa A. High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res. 2008;36:3420–3435. doi: 10.1093/nar/gkn176.
  21. Groenewald JZ, Nakashima C, Nishikawa J, Shin HD, Park JH, Jama AN, Groenewald M, Braun U, Crous PW. Species concepts in Cercospora: spotting the weeds among the roses. Stud Mycol. 2013;75:115–170. doi: 10.3114/sim0012.
  22. Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: a quality assessment tool for genome assemblies. Bioinformatics. 2013;29:1072–1075. doi: 10.1093/bioinformatics/btt086.
  23. Horton P, Park KJ, Obayashi T, Fujita N, Harada H, Adams-Collier CJ, Nakai K. WoLF PSORT: protein localization predictor. Nucleic Acids Res. 2007;35(suppl_2):W585–W587. doi: 10.1093/nar/gkm259.
  24. Huerta-Cepas J, Forslund K, Coelho LP, Szklarczyk D, Jensen LJ, Von Mering C, et al. Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Mol Biol Evol. 2017;34:2115–2122. doi: 10.1093/molbev/msx148.
  25. Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, et al. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014;30:1236–1240. doi: 10.1093/bioinformatics/btu031.
  26. Kamal. Cercosporoid fungi of India. Dehradun (Uttarakhand), India: Bishen Singh Mahendra Pal Singh; 2010.
  27. Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000;28:27–30. doi: 10.1093/nar/28.1.27.
  28. Kanja C, Hammond-Kosack KE. Proteinaceous effector discovery and characterization in filamentous plant pathogens. Mol Plant Pathol. 2020;21:1353–1376. doi: 10.1111/mpp.12980.
  29. Krogh A, Larsson B, von Heijne G, Sonnhammer EL. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 2001;305:567–580. doi: 10.1006/jmbi.2000.4315.
  30. Lo Presti L, Lanver D, Schweizer G, Tanaka S, Liang L, Tollot M, et al. Fungal effectors and plant susceptibility. Annu Rev Plant Biol. 2015;66:513–545. doi: 10.1146/annurev-arplant-043014-114623.
  31. Lyu X, Shen C, Fu Y, Xie J, Jiang D, Li G, Cheng J. Comparative genomic and transcriptional analyses of the carbohydrate-active enzymes and secretomes of phytopathogenic fungi reveal their significant roles during infection and development. Sci Rep. 2015;5:1–16. doi: 10.1038/srep15565.
  32. McWilliam H, Li W, Uludag M, Squizzato S, Park YM, Buso N, et al. Analysis tool web services from the EMBL-EBI. Nucleic Acids Res. 2013;41:W597–W600. doi: 10.1093/nar/gkt376.
  33. Murray MG, Thompson WF. Rapid isolation of high molecular weight plant DNA. Nucleic Acids Res. 1980;8:4321–4326. doi: 10.1093/nar/8.19.4321.
  34. Nguanhom J, Cheewangkoon R, Groenewald JZ, Braun U, To-Anun C, Crous PW. Taxonomy and phylogeny of Cercospora spp. from Northern Thailand. Phytotaxa. 2015;233:27–48. doi: 10.11646/phytotaxa.233.1.2.
  35. Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, Phillippy AM. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 2016;17:1–14. doi: 10.1186/s13059-016-0997-x.
  36. Pierleoni A, Martelli PL, Casadio R. PredGPI: a GPI-anchor predictor. BMC Bioinform. 2008;9:1–11. doi: 10.1186/1471-2105-9-392.
  37. R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2021. https://www.r-project.org/
  38. Rehner SA, Samuels GJ. Molecular systematics of the Hypocreales: a teleomorph gene phylogeny and the status of their anamorphs. Can J Bot. 1995;73:816–823. doi: 10.1139/b95-327.
  39. Saitou N, Nei M. The neighbour-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987;4:406–425. doi: 10.1093/oxfordjournals.molbev.a040454.
  40. Schubert M, Lindgreen S, Orlando L. AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Res Notes. 2016;9:1–7. doi: 10.1186/s13104-016-1900-2.
  41. Selin C, De Kievit TR, Belmonte MF, Fernando W. Elucidating the role of effectors in plant-fungal interactions: progress and challenges. Front Microbiol. 2016;7:600. doi: 10.3389/fmicb.2016.00600.
  42. Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. doi: 10.1093/bioinformatics/btv351.
  43. Smit AFA, Hubley R, Green P. RepeatMasker Open-4.0. 2013–2015.
  44. Sonah H, Deshmukh RK, Bélanger RR. Computational prediction of effector proteins in fungi: opportunities and challenges. Front Plant Sci. 2016;7:126. doi: 10.3389/fpls.2016.00126.
  45. Sperschneider J, Dodds PN, Gardiner DM, Manners JM, Singh KB, Taylor JM. Advances and challenges in computational prediction of effectors from plant pathogenic fungi. PLoS Pathog. 2015;11:e1004806. doi: 10.1371/journal.ppat.1004806.
  46. Sperschneider J, Gardiner DM, Dodds PN, Tini F, Covarelli L, Singh KB, et al. EffectorP: predicting fungal effector proteins from secretomes using machine learning. New Phytol. 2015;210:743–761. doi: 10.1111/nph.13794.
  47. Sperschneider J, Dodds PN, Gardiner DM, Singh KB, Taylor JM. Improved prediction of fungal effector proteins from secretomes with EffectorP 2.0. Mol Plant Pathol. 2018;19:2094–2110. doi: 10.1111/mpp.12682.
  48. Stanke M, Morgenstern B. AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res. 2005;33(suppl_2):W465–W467. doi: 10.1093/nar/gki458.
  49. Stergiopoulos I, de Wit PJ. Fungal effector proteins. Annu Rev Phytopathol. 2009;47:233–263. doi: 10.1146/annurev.phyto.112408.132637.
  50. Tajima F, Nei M. Estimation of evolutionary distance between nucleotide sequences. Mol Biol Evol. 1984;1:269–285. doi: 10.1093/oxfordjournals.molbev.a040317.
  51. Tamura K, Stecher G, Kumar S. MEGA11: molecular evolutionary genetics analysis version 11. Mol Biol Evol. 2021;38:3022–3027. doi: 10.1093/molbev/msab120.
  52. UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021;49(D1):D480–D489. doi: 10.1093/nar/gkaa1100.
  53. Vilgalys R, Hester M. Rapid genetic identification and mapping of enzymatically amplified ribosomal DNA from several Cryptococcus species. J Bacteriol. 1990;172:4238–4246. doi: 10.1128/jb.172.8.4238-4246.1990.
  54. White TJ, Bruns T, Lee S, Taylor J. PCR protocols: a guide to methods and applications. San Diego: Academic Press; 1990. Amplification and direct sequencing of fungal ribosomal RNA genes for phylogenetics; pp. 315–322.
  55. Wijayawardene NN, Hyde KD, Rajeshkumar KC, et al. Notes for genera: Ascomycota. Fungal Divers. 2017;86:1–594. doi: 10.1007/s13225-017-0386-0.
  56. Wilken PM, Aylward J, Chand R, et al. IMA Genome - F13. IMA Fungus. 2020;11:1–17. doi: 10.1186/s43008-020-00039-7.
  57. Wingfield BD, Berger DK, Steenkamp ET, et al. Draft genome of Cercospora zeina, Fusarium pininemorale, Hawksworthiomyces lignivorus, Huntiella decipiens and Ophiostoma ips. IMA Fungus. 2017;8:385–396. doi: 10.5598/imafungus.2017.08.02.10.
  58. Wingfield BD, De Vos L, Wilson AM, et al. IMA Genome - F16. IMA Fungus. 2022;13:1–22. doi: 10.1186/s43008-022-00089-z.
  59. Xu H, Luo X, Qian J, Pang X, Song J, Qian G, Chen J, Chen S. FastUniq: a fast de novo duplicates removal tool for paired short reads. PLoS One. 2012;7:e52249. doi: 10.1371/journal.pone.0052249.
  60. Zhang H, Miao H, Wang L, Qu L, Liu H, Wang Q, Yue M. Genome sequencing of the important oilseed crop Sesamum indicum L. Genome Biol. 2013;14:1–9. doi: 10.1186/gb-2013-14-1-401.
  61. Zhang H, Yohe T, Huang L, Entwistle S, Wu P, Yang Z, Busk PK, Xu Y, Yin Y. dbCAN2: a meta server for automated carbohydrate-active enzyme annotation. Nucleic Acids Res. 2018;46:W95–W101. doi: 10.1093/nar/gky418.
  62. Zimin AV, Marçais G, Puiu D, Roberts M, Salzberg SL, Yorke JA. The MaSuRCA genome assembler. Bioinformatics. 2013;29:2669–2677. doi: 10.1093/bioinformatics/btt476.

Data Availability Statement

The authors state that all data necessary to confirm the conclusions presented in the article are included within the article, and the supplementary data have been deposited in Mendeley Data (DOI: https://doi.org/10.17632/26m5grnb88.1).


Articles from 3 Biotech are provided here courtesy of Springer
