A large accessory genome and high recombination rates may influence global distribution and broad host range of the fungal plant pathogen Claviceps purpurea

Stephen Wyka; Stephen Mondo; Miao Liu; Vamsi Nalam; Kirk Broders

doi:10.1371/journal.pone.0263496

PLoS One. 2022 Feb 10;17(2):e0263496.

Stephen Wyka¹, Stephen Mondo¹,², Miao Liu³, Vamsi Nalam¹, Kirk Broders⁴,⁵,*

Editor: Christopher Toomajian⁶

¹Department of Agricultural Biology, Colorado State University, Fort Collins, Colorado, United States of America

²United States Department of Energy Joint Genome Institute, Berkeley, California, United States of America

³Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, Canada

⁴USDA, Agricultural Research Service, National Center for Agricultural Utilization Research, Mycotoxin Prevention and Applied Microbiology Research Unit, Peoria, IL, United States of America

⁵Smithsonian Tropical Research Institute, Apartado Panamá, República de Panamá

⁶Kansas State University, UNITED STATES

Competing Interests: The authors have declared that no competing interests exist. Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture. The U.S. Department of Agriculture prohibits discrimination in all its programs and activities on the basis of race, color, national origin, age, disability, and where applicable, sex, marital status, familial status, parental status, religion, sexual orientation, genetic information, political beliefs, reprisal, or because all or part of an individual’s income is derived from any public assistance program. (Not all prohibited bases apply to all programs.) Persons with disabilities who require alternative means for communication of program information (Braille, large print, audiotape, etc.) should contact USDA’s TARGET Center at (202) 720-2600 (voice and TDD). To file a complaint of discrimination, write to USDA, Director, Office of Civil Rights, 1400 Independence Avenue, S.W., Washington, D.C. 20250-9410, or call (800) 795-3272 (voice) or (202) 720-6382 (TDD). USDA is an equal opportunity provider and employer.

* E-mail: kirk.broders@usda.gov

Roles

Stephen Wyka: Conceptualization, Data curation, Formal analysis, Methodology, Writing – original draft, Writing – review & editing

Stephen Mondo: Formal analysis, Methodology, Writing – review & editing

Miao Liu: Conceptualization, Resources, Writing – review & editing

Vamsi Nalam: Project administration, Supervision, Writing – review & editing

Kirk Broders: Conceptualization, Funding acquisition, Project administration, Supervision, Writing – review & editing

Christopher Toomajian: Editor

PMCID: PMC8830672 PMID: 35143550

Abstract

Pangenome analyses are increasingly being utilized to study the evolution of eukaryotic organisms. While pangenomes can provide insight into polymorphic gene content, inferences about the ecological and adaptive potential of such organisms also need to be accompanied by additional supportive genomic analyses. In this study we constructed a pangenome of Claviceps purpurea from 24 genomes and examined the positive selection and recombination landscape of an economically important fungal organism for pharmacology and agricultural research. Together, these analyses revealed that C. purpurea has a relatively large accessory genome (~ 38%), high recombination rates (ρ = 0.044), and transposon-mediated gene duplication. However, due to observations of relatively low transposable element (TE) content (8.8%) and a lack of variability in genome sizes, prolific TE expansion may be controlled by frequent recombination. We additionally found that, within the ergoline biosynthetic cluster, the lpsA1 and lpsA2 genes were the result of a recombination event. However, the high recombination rates observed in C. purpurea may be influencing an overall trend of purifying selection across the genome. These results showcase the use of selection and recombination landscapes to identify mechanisms contributing to pangenome structure and primary factors influencing the evolution of an organism.

Introduction

Pangenomes can provide useful insight into a species' distribution and lifestyle through examination of gene functional diversity, abundance, and distribution into core and accessory genomes. These variations often provide fitness advantages and promote adaptive evolution of the organism [1–3]. In prokaryotes the existence of more open pangenomes (large accessory) has been suggested to be the result of adaptive evolution that allows organisms with large effective population sizes to migrate into new ecological niches [4]. In contrast, closed pangenomes (larger core) are associated with more obligate and specialized organisms [4]. Similar results have been identified in fungal species, where a range of saprotrophic to opportunistic yeasts were found to have accessory genomes representing ~ 9–19% of the genes [5], while Zymoseptoria tritici, a global wheat pathogen, has 40% of genes in the accessory genome [6]. This increase in the Z. tritici accessory genome reflects the global distribution of this pathogen, which must continuously adapt to overcome new host resistances and multiple cycles of annual fungicide applications [6, 7]. While the identification of pangenome sizes provides valuable knowledge of polymorphic gene content, which can be used to infer the lifestyle of the species [4], a combination of pangenomic and alternative genomic analyses provides a deeper understanding of the primary factors that are contributing to pangenome structure and the adaptive trajectory of the organism.

Claviceps purpurea is a biotrophic ascomycete plant pathogen that has a specialized ovarian-specific non-systemic lifestyle with its grass hosts [8]. Despite the specialized infection pattern, C. purpurea has a global distribution and a broad host range of ~ 400 grass species across 8 grass tribes, including economically important cereal crops such as wheat, barley, and rye [8]. However, the mechanisms that underlie the evolutionary success of this species are still understudied. Unlike other pathogens of cereal crops, researchers have been unsuccessful in identifying qualitative resistance genes in crop or wild grass varieties [9–11]. Menzies et al. [9] noted the potential for a complex virulence and host susceptibility relationship of C. purpurea on durum and hexaploid wheat varieties. However, virulence was scored on the basis of sclerotia weighing > 81 mg, indicating that C. purpurea is able to initiate its biotrophic interaction but might be arrested during the final stages of sclerotia development. During infection the fungus does not induce necrosis or a hypersensitive response (host-mediated cell death) in its host; instead, it actively maintains host cell viability to obtain nutrients from living tissue through a complex cross-talk of fungal cytokinin production [12–16]. Furthermore, Wyka et al. [17] revealed evidence of tandem gene duplication occurring in genes often associated with pathogenicity or evasion of host defenses (effectors), which may provide insight into the success of the species. However, the factors influencing these duplication events remain unclear.

Claviceps purpurea is also known for its diverse secondary metabolite profile of ergot alkaloids and pigments [18–21]. Fungal secondary metabolites can play important roles in plant-host interactions as virulence factors but can also increase the fitness of the fungus through stress tolerance [8, 22, 23]. It was recently postulated that the evolution of C. purpurea was associated with a host jump and subsequent adaptation and diversification to cooler, more open habitats [8, 17]. In addition, likely due to the toxicity of ergot alkaloids, grazing mammals avoided grass infected with C. purpurea, suggesting a potential for beneficial effects for the host plant [24]. This, along with other evidence of neutral to positive effects of infection on host plants [25, 26], suggests that C. purpurea is a conditional defensive mutualist [24].

In this study, we implement a comprehensive population genomic analysis to gain a deeper understanding of factors governing the evolution and adaptive potential of C. purpurea. Using 24 isolates, from six countries and three continents, we constructed the pangenome and subsequently identified genes with signatures of positive selection. Full genome alignments were further utilized to estimate population recombination rates and predict recombination hotspots. We observed a large accessory genome which may be influenced by a large effective population size and high recombination rates, which subsequently influence an overall trend of purifying selection and likely help defend against TE expansion. In addition, we observed that the lpsA1 and lpsA2 genes of the well-known ergoline biosynthetic cluster were likely the result of a recombination event.

Materials and methods

Genome data

Haploid genome data from a collection of 24 isolates were used in this study to provide a comprehensive analysis of Claviceps purpurea. The 32.1 Mb reference genome of C. purpurea strain 20.1 was sequenced in 2013 using a combination of single and paired-end pyrosequencing (3 kb fragments), resulting in a final assembly of 191 scaffolds [18; NCBI: SAMEA2272775]. The remaining 23 isolates were recently Illumina sequenced, assembled, and annotated in [17, 27; NCBI BioProject: PRJNA528707], representing a collection of isolates from USA, Canada, Europe, and New Zealand (Table 1). To define gene models for our subsequent analyses, the reference genome was subjected to a minimum length cutoff of 50 amino acids to match the other 23 isolates. In this study, we report the pangenome of C. purpurea, analysis of the population genomic recombination, and the landscape of genes with signatures of positive selection.

Table 1. Collection and annotation statistics for the 24 Claviceps purpurea genomes used in this study.

Strain ID^†	Origin	Host	Genome size (Mb)	Genomic GC (%)	TE^‡ content (%)	Gene count	BUSCO^§ score (%)
LM46	Canada: Alberta	T. turgidum subsp. durum	30.6	51.80%	9.64%	8,455	97.00%
LM60	Canada: Manitoba	Avena sativa	30.6	51.70%	9.29%	8,498	97.10%
LM223	Canada: Manitoba	Bromus riparius	30.8	51.70%	10.53%	8,438	96.60%
LM207	Canada: Manitoba	Elymus repens	30.5	51.80%	9.18%	8,475	97.00%
LM5	Canada: Manitoba	Hordeum vulgare	30.5	51.80%	8.95%	8,508	97.40%
LM33	Canada: Manitoba	Hordeum vulgare	30.5	51.80%	9.20%	8,557	97.10%
LM232	Canada: Manitoba	Phalaris canariensis	30.7	51.70%	9.36%	8,512	96.70%
LM233	Canada: Manitoba	Phalaris canariensis	30.6	51.80%	9.89%	8,717	96.60%
LM4	Canada: Manitoba	Triticosecale	30.6	51.80%	10.04%	8,470	96.90%
LM470	Canada: Ontario	Elymus repens	30.5	51.80%	8.95%	8,591	96.80%
LM474	Canada: Ontario	Hordeum vulgare	30.6	51.80%	9.38%	8,500	97.20%
LM469	Canada: Ontario	Triticum aestivum	30.5	51.80%	10.01%	8,394	96.50%
LM461	Canada: Quebec	Elymus repens	30.5	51.80%	8.42%	8,656	97.30%
LM14	Canada: Saskatchewan	Hordeum vulgare	30.6	51.80%	9.96%	8,422	97.30%
LM30	Canada: Saskatchewan	Hordeum vulgare	30.6	51.80%	9.35%	8,526	96.30%
LM39	Canada: Saskatchewan	T. turgidum subsp. durum	30.5	51.80%	10.11%	8,591	97.00%
LM28	Canada: Saskatchewan	Triticum aestivum	30.6	51.70%	9.58%	8,713	97.00%
LM582	Europe: Czech Republic	Secale cereale	30.7	51.80%	9.55%	8,518	95.50%
20.1	Europe: Germany	Secale cereale	32.1	51.60%	10.87%	8,703	95.50%
LM71	Europe: United Kingdom	Alopecurus myosuroides	30.5	51.80%	9.59%	8,472	97.00%
Clav55	Oceania: New Zealand	Lolium perenne	30.7	51.80%	9.80%	8,480	97.00%
Clav04	USA: Colorado	Bromus inermis	31.8	51.70%	10.05%	8,824	97.70%
Clav26	USA: Colorado	Hordeum vulgare	30.8	51.70%	9.07%	8,737	98.00%
Clav46	USA: Wyoming	Secale cereale	30.8	51.70%	9.68%	8,597	97.10%

† NCBI BioProject: PRJNA528707 (except 20.1, NCBI Accession = SAMEA2272775).

‡ Transposable element content presented in [17], as a proportion of genomic sequences.

§ Benchmarking Universal Single-Copy Orthologs Dikarya database (odb9).

Gene functional and transposable element (TE) annotations were those reported in Wyka et al. [17] and the datasets of Wyka et al. [27]. In brief, secondary metabolite clusters were predicted using antiSMASH v5 [28], with all genes belonging to identified clusters classified as secondary (2°) metabolites. Functional domain annotations were conducted using InterProScan v5 [29], HMMer v3.2.1 [30] searches against the Pfam-A v32.0 and dbCAN v8.0 CAZymes databases, and a BLASTp 2.9.0+ search against the MEROPS protease database v12.0 [31]. Proteins were classified as secreted proteins if they had signal peptides detected by both Phobius v1.01 [32] and SignalP v4.1 [33] and did not possess a transmembrane domain as predicted by Phobius and TMHMM v2.0 [34]. Effector proteins were identified by using EffectorP v2.0 [35] on the set of secreted proteins for each genome. Transmembrane proteins were identified if both Phobius and TMHMM detected transmembrane domains. Transposable element fragments were identified following procedures for establishment of de novo comprehensive repeat libraries set forth in Berriman et al. [36] through a combined use of RepeatModeler v1.0.8 [37], TransposonPSI [38], LTR_finder v1.07 [39], LTR_harvest v1.5.10 [40], LTR_digest v1.5.10 [41], Usearch v11.0.667 [42], and RepeatClassifier v1.0.8 [37] with the addition of all curated fungal TEs from RepBase v24.03 [43]. RepeatMasker v4.0.7 [37] was then used to identify TE regions and soft mask the genomes. These steps were automated through construction of a custom script, TransposableELMT (https://github.com/PlantDr430/TransposableELMT) [17, 20].

Pangenome analysis

The pangenome was constructed using OrthoFinder v2.3.3 [44], on all genes identified from the 24 genomes, to infer groups of orthologous gene clusters (orthogroups). OrthoFinder was run using BLASTp on default settings. For downstream analysis, gene clusters were classified as secreted, predicted effectors, transmembrane, secondary (2°) metabolites, carbohydrate-degrading enzymes (CAZys), proteases (MEROPs), and conserved domain (conserved) clusters if ≥ 50% of the strains present in a gene cluster had at least one protein classified as such. Gene clusters not grouped into any of the above categories were categorized as unclassified.

Core and pangenome size curves were extrapolated from resampling of 24 random possible combinations for each pangenome size of 1–24 genomes and modelled by fitting the power law regression formula: y = Ax^B + C using the curve_fit function in the Python module Scipy v1.4.1. These processes were automated through the creation of a custom python script (https://github.com/PlantDr430/FunFinder_Pangenome).
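As an illustration of the curve-fitting step, the following minimal sketch fits y = Ax^B + C with SciPy's curve_fit. The input values are synthetic (generated from arbitrary parameters, not taken from this study), and the starting guess p0 is an assumption chosen for the toy data; the authors' own script may differ in both respects.

```python
import numpy as np
from scipy.optimize import curve_fit


def power_law(x, a, b, c):
    """Power law with offset: y = A * x**B + C."""
    return a * np.power(x, b) + c


# Hypothetical mean pangenome sizes for 1..24 genomes (NOT the study's data),
# generated from known parameters so that the fit can be checked.
n_genomes = np.arange(1, 25, dtype=float)
mean_pan_size = power_law(n_genomes, 2500.0, 0.3, 6000.0)

# p0 is a rough starting guess (an assumption for this toy data set).
# curve_fit returns the fitted parameters and their covariance matrix.
params, covariance = curve_fit(
    power_law, n_genomes, mean_pan_size, p0=(2000.0, 0.4, 5000.0), maxfev=10000
)
fitted = power_law(n_genomes, *params)
```

With real data, each point would be the mean (or the individual values) of the resampled genome combinations at that pangenome size, and the core genome curve would be fitted in the same way.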

Positive selection

To investigate the positive selection landscape of genes we utilized the 53 isolates (22 species) of the Claviceps genus [17, 20; NCBI BioProject: PRJNA528707] and found single-copy orthologs using OrthoFinder v2.3.3 with BLASTp on default settings. A total of 3,628 single-copy orthologs were identified (See Table 2 for detailed report). For each ortholog cluster sequences were aligned using MUSCLE v3.8.1551 [45] on default settings and values of dN, dS, and dN/dS (omega, ω) were estimated using the YN00 [46] method in PAML v4.8 using default parameters.

Table 2. PAML processing information and filtering of core orthogroups for calculation of dN/dS (ω) ratios.

Total gene clusters (Pangenome)	10,540
Single-copy gene clusters (Pangenome)	6,244
Single-copy gene clusters (Claviceps genus)	3,628
Number of clusters with N/A PAML results	33
Cluster Classification (non-redundant) ^†:	Total Pangenome	Total Core^‡	Single-copy genes (Pangenome)^‡	Single-copy genes (Claviceps genus)^‡
Effectors	257	100 (38.9%)	84 (32.7%)	13 (5.1%)
Secreted	366	278 (75.9%)	253 (69.1%)	109 (29.8%)
2° Metabolites	313	202 (64.5%)	181 (57.8%)	78 (24.9%)
Transmembrane	1,210	998 (82.5%)	949 (78.4%)	567 (46.9%)
MEROPs	167	149 (89.2%)	143 (85.6%)	89 (53.3%)
CAZys	75	68 (90.7%)	66 (88.0%)	36 (48.0%)
Conserved	4,754	3,985 (83.8%)	3,808 (80.1%)	2,390 (50.3%)
Unclassified	3,398	778 (22.9%)	717 (21.1%)	320 (9.4%)

† For statistical purposes classification is structured such that each cluster is only represented once (in the order provided), i.e. secreted clusters are those not already classified as effectors, etc.

‡ Percentage out of total pangenome.

For statistical purposes, each gene cluster was only characterized by one functional category in the order displayed in Table 2 (i.e. secreted genes are those not already classified as effectors, etc) (See Methods section Statistical analyses and plotting).
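The ≥ 50% rule and the ordered, non-redundant classification can be sketched as follows. This is an illustrative reading of the description, not the authors' code: the category labels, the input layout (one set of labels per strain present in the cluster), and the function name are all hypothetical.

```python
# Priority order follows Table 2; the label strings are illustrative.
PRIORITY = [
    "effector",
    "secreted",
    "secondary_metabolite",
    "transmembrane",
    "MEROPs",
    "CAZy",
    "conserved",
]


def classify_cluster(strain_annotations, threshold=0.5):
    """Assign one category to a gene cluster.

    strain_annotations maps each strain present in the cluster to the set of
    categories carried by at least one of its proteins in that cluster.
    The first category (in PRIORITY order) supported by >= threshold of the
    strains wins; otherwise the cluster is 'unclassified'.
    """
    n_strains = len(strain_annotations)
    for category in PRIORITY:
        support = sum(1 for labels in strain_annotations.values() if category in labels)
        if n_strains and support / n_strains >= threshold:
            return category
    return "unclassified"


example = {
    "LM46": {"secreted", "effector"},
    "LM60": {"secreted"},
    "LM223": {"secreted", "conserved"},
    "LM207": {"conserved"},
}
label = classify_cluster(example)
```

In this toy cluster only one of four strains has an effector annotation, so the cluster falls through to the next category in the order and is labelled as secreted.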

Genome alignment, SNP calling, and recombination

Procedures followed those of [47] for creating a fine-scale recombination map of a fungal organism and identifying recombination hotspots. A brief description is provided below; for a more detailed methodology and explanation of the algorithms, refer to [47–49].

LastZ and MultiZ from the TBA package [50] were used to create the population genome alignment projected against the reference genome, C. purpurea strain 20.1 [18]. Alignments in MAF format were filtered using MafFilter v.1.3.1 [51] following [47]. Final alignments were merged according to the reference genome and subsequently divided into nonoverlapping windows of 100 kb. MafFilter was additionally used to compute genome-wide estimates of nucleotide diversity (Watterson’s θ) and Tajima’s D in 10 kb windows. Single nucleotide polymorphisms (SNPs) were called by MafFilter from the final alignment. Principal Component Analysis (PCA) and a Maximum-Likelihood phylogeny were conducted with fully resolved biallelic SNPs (Table 3) using the R package SNPRelate v1.18.1 [52] and RAxML v8.2.12 [53] using GTRGAMMA and 1000 bootstrap replicates, respectively.
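For readers unfamiliar with the two diversity statistics, the sketch below computes Watterson’s θ (per site) and Tajima’s D for a single alignment window using the standard textbook estimators. It is not MafFilter’s implementation; the function name and the choice to skip any column containing a character other than A, C, G, or T are assumptions made for illustration.

```python
import math
from collections import Counter


def diversity_stats(seqs):
    """Return (segregating sites, Watterson's theta per site, Tajima's D)
    for a list of equal-length aligned sequences (one haplotype each)."""
    n = len(seqs)
    n_pairs = n * (n - 1) / 2
    seg_sites = 0
    valid_sites = 0
    pi = 0.0  # mean number of pairwise differences
    for column in zip(*seqs):
        if any(base not in "ACGT" for base in column):
            continue  # skip gaps and unresolved states
        valid_sites += 1
        counts = Counter(column)
        if len(counts) > 1:
            seg_sites += 1
            same_pairs = sum(c * (c - 1) / 2 for c in counts.values())
            pi += (n_pairs - same_pairs) / n_pairs
    if valid_sites == 0:
        return 0, float("nan"), float("nan")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    theta_w = seg_sites / (a1 * valid_sites)
    if seg_sites == 0:
        return 0, theta_w, float("nan")
    # Variance terms of Tajima's D
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    tajima_d = (pi - seg_sites / a1) / math.sqrt(variance)
    return seg_sites, theta_w, tajima_d


window = ["AAAAAAAAAA", "AAAAATAAAA", "AAAAATAAGA", "ACAAAAAAAA"]
s, theta_w, tajima_d = diversity_stats(window)
```

A negative Tajima’s D, as reported genome-wide in Table 3, indicates an excess of low-frequency variants relative to the neutral expectation; the toy window above is only meant to show the arithmetic.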

Table 3. Summary statistics of whole-genome alignment filtering and SNP calls for Claviceps purpurea.

C. purpurea strain 20.1
Number of scaffolds	191
Size of reference genome (bp)	32,091,443
Number of exonic sites in reference genome (bp)	12,774,951 (39.8%)
Number of haplotypes	24
Summary Genome alignment:	Total Alignment Length (bp)	Number of alignment blocks
MultiZ alignment	27,523,755	16,330
Keep blocks with all strains	27,517,978	15,861
MAFFT in 10kb windows	27,378,024	15,870
Filter 1	26,198,304	57,891
Filter 2	24,959,120	97,532
Merged per contigs (N’s filled in)	31,389,412	154
Total number of SNPs	1,152,999
Total number of analyzed SNPs (biallelic, no unresolved state) and percent of total SNPs	1,076,901 (93.4%)
Total number of SNPs in exons and percent of total	370,045 (32.1%)
Total number of analyzed SNPs in exons (biallelic, no unresolved state) and percent of total analyzed SNPs in exons	358,258 (96.8%)
Diversity in 10kb windows:	Median
Watterson’s Θ	0.01196
Tajima’s D	-0.82522

The following process was automated through the creation of a custom python script (https://github.com/PlantDr430/CSU_scripts/blob/master/Fungal_recombination.py). LDhat [54] was used to estimate population recombination rates (ρ) from the filtered alignment using only fully resolved biallelic positions. A likelihood table was created for the θ value 0.01, corresponding to the genome-wide Watterson’s θ of C. purpurea (Table 3; Julien Dutheil, pers. comm.), and LDhat was run with 10,000,000 iterations, sampled every 5000 iterations, with a burn-in of 100,000. The parameter ρ relates to the actual recombination rate in haploid organisms through the equation ρ = 2N_e × r, where N_e is the effective population size and r is the per site rate of recombination. However, without knowledge of N_e we cannot confidently infer r and thus sought to avoid the bias of incorrect assumptions. Therefore, we reported the population recombination rate (ρ).
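To make the dependence on N_e concrete, the short sketch below rearranges ρ = 2N_e × r to r = ρ / (2N_e) and evaluates it for the genome-wide mean ρ = 0.044 under three purely hypothetical effective population sizes. None of these N_e values is an estimate for C. purpurea; they only show how strongly the inferred r depends on the assumed N_e.

```python
def per_site_recombination_rate(rho, effective_population_size):
    """Rearranged haploid relation rho = 2 * N_e * r, i.e. r = rho / (2 * N_e)."""
    return rho / (2.0 * effective_population_size)


RHO = 0.044  # genome-wide mean population recombination rate reported in this study

# Hypothetical effective population sizes, for illustration only.
for n_e in (1e4, 1e5, 1e6):
    r = per_site_recombination_rate(RHO, n_e)
    print(f"N_e = {n_e:.0e} -> r = {r:.2e}")
```

A hundredfold change in the assumed N_e changes r by the same factor, which is why ρ itself is the quantity reported.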

Resulting recombination maps were filtered to remove pairs of SNPs for which the confidence interval of the recombination estimate was higher than two times the mean [47]. Average recombination rates were calculated in regions by weighting the average recombination estimate between every pair of SNPs by the physical distance between the SNPs. Using the reference annotation file [18], we calculated the average recombination rates for features in each gene: 1) exons, 2) introns, 3) 500 bp upstream, and 4) 500 bp downstream with a minimum of three filtered SNPs. Flanking upstream and downstream regions correspond to the 5´ and 3´ regions for forward stranded genes and the 3´ and 5´ regions for reverse stranded genes. We also calculated the average recombination rate for each intergenic region between the upstream and downstream regions of each gene. Introns were added to the GFF3 file using the GenomeTools package [55]. The original recombination maps produced from LDhat (Julien Dutheil, pers. comm.) were converted from bp to kb format for use in LDhot [48] to detect recombination hotspots with 1000 simulations, and the parameter --windlist 10 was used to create 20 kb background windows [49]. Only hotspots with a value of ρ between 5 and 100 and width < 20 kb were selected for further analysis [47–49].
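The filtering and distance-weighted averaging can be sketched as below. Several details are assumptions: the record layout (SnpPair) is hypothetical, the filter is read as "confidence interval width greater than twice the mean estimate for that SNP pair", and the minimum-SNP check counts distinct SNP positions inside the region. The authors' script may implement these steps differently.

```python
from typing import List, NamedTuple, Optional


class SnpPair(NamedTuple):
    start: int      # position of the left SNP (bp)
    end: int        # position of the right SNP (bp)
    rho: float      # mean recombination estimate between the two SNPs
    ci_low: float   # lower bound of the confidence interval
    ci_high: float  # upper bound of the confidence interval


def filter_by_confidence(pairs: List[SnpPair], factor: float = 2.0) -> List[SnpPair]:
    """Keep SNP pairs whose confidence interval width is at most factor * mean."""
    return [p for p in pairs if (p.ci_high - p.ci_low) <= factor * p.rho]


def weighted_mean_rho(pairs: List[SnpPair], region_start: int, region_end: int,
                      min_snps: int = 3) -> Optional[float]:
    """Average rho over a region, weighting each SNP pair by the physical
    distance it covers inside the region. Returns None if too few SNPs."""
    snps = {pos for p in pairs for pos in (p.start, p.end)
            if region_start <= pos <= region_end}
    if len(snps) < min_snps:
        return None
    weighted_sum = 0.0
    covered = 0
    for p in pairs:
        overlap = min(p.end, region_end) - max(p.start, region_start)
        if overlap > 0:
            weighted_sum += p.rho * overlap
            covered += overlap
    return weighted_sum / covered if covered else None


pairs = [
    SnpPair(0, 100, 0.01, 0.005, 0.02),
    SnpPair(100, 400, 0.05, 0.03, 0.08),
    SnpPair(400, 500, 0.20, 0.0, 1.0),  # wide interval, removed by the filter
]
kept = filter_by_confidence(pairs)
region_mean = weighted_mean_rho(kept, 0, 400)
```

In the toy data the third pair is discarded, and the region average is (0.01 × 100 + 0.05 × 300) / 400, so the longer interval dominates the estimate.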

Statistical and enrichment analyses

Statistics and figures were generated using Python3 modules SciPy v1.3.1, statsmodels v0.11.0, Matplotlib v3.1.1, and seaborn v0.10.0. All multi-test corrections were performed with the Benjamini-Hochberg false discovery rate procedure. Enrichment analyses were tested using Fisher’s exact test with a cutoff α = 0.05. Uncorrected p-values were corrected using Benjamini-Hochberg and Bonferroni multi-test correction with a false discovery rate (FDR) cutoff of α = 0.05. Corresponding p-values from correction tests were averaged together to get a final p-value. Enrichment was performed on protein domain names and GO terms. Orthogroups were only associated with a domain or GO term if ≥ 50% of the strains present in the gene cluster had one gene with the term. This process was automated through creation of a custom python script (https://github.com/PlantDr430/CSU_scripts/blob/master/Domain_enrichment.py).
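A minimal sketch of one enrichment test followed by a Benjamini-Hochberg adjustment is given below. The one-sided alternative, the helper names, and the hand-written adjustment are assumptions for illustration (statsmodels provides an equivalent adjustment via multipletests with method="fdr_bh"). The effector counts are those reported in the Results (143 of 2,851 accessory versus 100 of 6,558 core orthogroups); the list of raw p-values is hypothetical. The averaging of Benjamini-Hochberg and Bonferroni p-values described above is not reproduced here.

```python
from scipy.stats import fisher_exact


def enrichment_pvalue(hits_a, total_a, hits_b, total_b):
    """One-sided Fisher's exact test: is the category over-represented
    in group A relative to group B?"""
    table = [[hits_a, total_a - hits_a],
             [hits_b, total_b - hits_b]]
    _, p_value = fisher_exact(table, alternative="greater")
    return p_value


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Predicted effector orthogroups: accessory (143 of 2,851) vs core (100 of 6,558).
p_effectors = enrichment_pvalue(143, 2851, 100, 6558)

# Hypothetical raw p-values from several tests, adjusted together.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```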

Results

Pangenome analysis

We constructed a pangenome of Claviceps purpurea from 24 isolates representing a collection from three continents and six countries (Table 1). Taking advantage of plentiful isolates available from Canada, we sampled more heavily from different provinces and on different host plants. The principal component and phylogenetic analysis revealed substantial genetic variation among the samples. However, the genetic distances were not correlated with geographic distances, such as LM470 (Canada) and Clav04 (USA) grouping closer to isolates from Europe and the isolate from New Zealand (S1 Fig). In addition, across Canada and USA, isolates from similar regions rarely clustered together and were often intermixed (S1B Fig). These results agree with the results from a multi-locus genotyping of a larger set of samples from Canada and USA [56]. Previous reports [17] showed that C. purpurea isolates had similar genome size (30.5 Mb– 32.1 Mb), genomic GC content (51.6% - 51.8%), TE content (8.42% - 10.87%), gene content (8,394–8,824), and BUSCO completeness score (95.5% - 98.0%) (Table 1). The pangenome consisted of 205,354 genes which were assigned to 10,540 orthogroups. We observed 6,558 (62.22%) orthogroups shared between all 24 isolates (core genome), of which 6,244 (59.2%) were single-copy gene clusters, while the remaining core orthogroups, 314 (3%), contained paralogs (2–8 paralogs per cluster). The accessory genome consisted of 3,982 (37.78%) orthogroups with 2,851 (27.05%) shared by at least two isolates (but not all) and 1,131 (10.73%) were lineage-specific (singletons) found in only one isolate (Fig 1 and S1 Table). Within the accessory genome (including lineage-specific orthogroups) we observed 592 (5.6%) orthogroups containing paralogs, with some isolates containing > 20 genes per cluster (Fig 1C and S1 Table).
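The partition into core, accessory, and singleton orthogroups, together with the paralog flag, can be expressed compactly. The sketch below assumes a hypothetical input mapping each orthogroup to per-isolate gene counts; the orthogroup identifiers and counts are invented for illustration.

```python
def classify_orthogroups(gene_counts, n_genomes):
    """Label each orthogroup as core, accessory, or singleton.

    gene_counts maps orthogroup ID -> {isolate: number of genes}.
    Returns orthogroup ID -> (category, has_paralogs), where has_paralogs
    is True if any isolate carries more than one gene in the orthogroup.
    """
    labels = {}
    for orthogroup, counts in gene_counts.items():
        present = sum(1 for c in counts.values() if c > 0)
        if present == 0:
            continue  # empty orthogroup, nothing to classify
        if present == n_genomes:
            category = "core"
        elif present == 1:
            category = "singleton"
        else:
            category = "accessory"
        has_paralogs = any(c > 1 for c in counts.values())
        labels[orthogroup] = (category, has_paralogs)
    return labels


# Toy example with three isolates (hypothetical counts).
toy_counts = {
    "OG_A": {"iso1": 1, "iso2": 1, "iso3": 1},  # single-copy core
    "OG_B": {"iso1": 2, "iso2": 1, "iso3": 1},  # core with paralogs
    "OG_C": {"iso1": 1, "iso2": 0, "iso3": 3},  # accessory with paralogs
    "OG_D": {"iso1": 0, "iso2": 1, "iso3": 0},  # singleton
}
toy_labels = classify_orthogroups(toy_counts, n_genomes=3)
```

Applied to the full data set, the proportions quoted above follow directly from the category counts, for example 6,558 / 10,540 orthogroups for the core genome.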

Fig 1 — **(A)** Categorization of orthogroups (gene clusters) into core (shared between all isolates), accessory (shared between ≥ 2 isolates, but not all), and singletons (found in only one isolate) according to the number of orthogroups shared between genomes. **(B)** Copy number variation in core orthogroups containing paralogs. **(C)** Presence/absence variation and copy number variation of accessory orthogroups, not including singletons. **(D)** Estimation of core and pangenome (core + accessory + singleton) sizes by random resampling of possible combinations of 1–24 genomes (dots). Curves were modelled by fitting the power law regression formula: y = Ax^B + C.

We utilized multiple gene functional categories to get a deeper understanding of how genes of different function were structured within the pangenome. As a proportion of orthogroups within each pangenome category (core, accessory, and singleton) we found that the core genome was significantly enriched in orthogroups that contained genes with conserved protein domains (conserved) (5,471; 84%), transmembrane domains (transmembrane) (1,038; 16%), peptidase and protease domains (MEROPs) (211; 3.2%), and orthogroups of carbohydrate-active enzymes (CAZys) (212; 3.2%) (P < 0.01, Fisher’s exact test, Fig 2A and 2E–2G). Effector proteins play major roles in plant-microbe interactions, often conveying infection potential of the pathogen. A total of 257 predicted effector orthogroups were identified; 100 (38.9%) were core, 143 (55.6%) were accessory, and 14 (5.4%) were singletons. Predicted effectors and orthogroups coding for secreted proteins, which also contribute to host-pathogen interactions, were significantly enriched in the accessory genome (143, 5%; 218, 7.6%; respectively) (P < 0.01, Fisher’s exact test, Fig 2C and 2D). However, the accessory and singleton genomes were largely composed of unclassified orthogroups (1,791, 62.8%; 830, 73.4%; respectively) (P < 0.01, Fisher’s exact test, Fig 2H). Lastly, we observed that orthogroups which contained secondary (2°) metabolite genes were similarly represented across all pangenome categories (P > 0.05, Fisher’s exact test, Fig 2B).

Fig 2 — Graphs indicate the proportion of orthogroups within each pangenome category of classified protein function. **(A)** Containing conserved protein domains, **(B)** genes found in secondary (2°) metabolite clusters, **(C)** possessing predicted secreted signals, **(D)** predicted to be effectors, **(E)** containing transmembrane domains, **(F)** containing MEROPs domains for proteases and peptidases, **(G)** contain CAZY enzymes, **(H)** all unclassified orthogroups not falling into a previous category. Different letters (within each classification) represent significant differences determined by multi-test corrected Fisher exact test (P < 0.01).

As expected, core orthogroups were found to be significantly enriched in general housekeeping and basic cellular functions and development such as protein and ATP binding, nucleus and membrane cellular components, and transmembrane transport, metabolic, and oxidation-reduction processes (S2 Table). Protein domains in core orthogroups were significantly enriched for several WD40-repeat domains, P-loop nucleoside triphosphate hydrolase (IPR027417), armadillo-type fold (IPR016024), and a major facilitator (PF07690) (S2 Table). When narrowing the focus to orthogroups with paralogs, core paralogous orthogroups were enriched in cytochrome P450 domains and domains associated with trehalose activity (S3 Table). In contrast, the accessory genome was only found to be enriched in a fungal acid metalloendopeptidase domain (MER0001399), and the singleton genome had enrichment for a Tc5 transposase DNA-binding domain (PF03221) (S2 Table). Accessory paralogs were found to be enriched in several protein kinases, Myb-like domains, phosphotransferases, as well as DNA integration and a MULE transposase domain (S3 Table). It should be noted that the high abundance of unclassified genes in the accessory genome may have increased the type II error rate for GO and domain enrichment analyses. Overall, our results revealed a large accessory pangenome enriched with genes associated with host-pathogen interactions (predicted effectors) and an abundance of orthogroups containing paralogs (8.6%), indicating that prolific gene duplication is occurring within the species.

Selection landscape

To further understand the evolution of genes within the pangenome we investigated the positive selection landscape on protein coding genes using 3,628 single-copy core orthologs to compute the ratio of non-synonymous substitutions to synonymous substitutions (dN/dS) (Table 2). Ratios of dN/dS (omega, ω) can provide information of evolutionary forces shaping an organism as genes with ω > 1 may indicate positive or diversifying selection, ω = 1 may indicate neutral evolution, and ω < 1 may indicate negative or purifying selection [57].

Overall, we saw low dN (0.047 ± 0.046) and high dS (0.37 ± 0.16) values across all functional categories (S3 Fig), corresponding to low ω ratios (Fig 3). This suggests a general trend of purifying selection within C. purpurea, with only 8 (0.2%) orthogroups with ω values > 1, of which 6 (0.16%) were functionally unclassified (Fig 3 and S4 Table). Of the two genes with ω > 1 and functional annotations, one was a high mobility group (HMG) box domain containing protein (OG0003348, ω = 1.17) and the other contained a Type IIB DNA topoisomerase domain and was related to meiotic recombination protein rec12 (OG0003965, ω = 1.52) (S5 Table). Overall, core unclassified genes showed significantly higher ω values than all other functional categories (P < 0.05, multi-test corrected Mann-Whitney U Test, Fig 3). In contrast, transmembrane, MEROPs, CAZys, and secondary (2°) metabolites showed significantly lower ω values (P < 0.05, multi-test corrected Mann-Whitney U Test, Fig 3) compared to the other functional categories (except 2° metabolites, which were not significantly different from conserved genes), indicating that these genes are frequently experiencing purifying selection.

Recombination landscape

Recombination is also an important potential driver of genome evolution and plays a central role in the adaptability of parasitic organisms to overcome host defenses [58]. Our genome alignments contained 154 of the original 191 scaffolds (Table 3). The 37 missing scaffolds totaled 222,918 bp (average lengths = 6,192 ± 5,676 bp) and corresponded to 59 genes. Thirty-one of the missing scaffolds contain genes that were only part of the accessory genome, of which six scaffolds contained two or more genes (S6 Table), suggesting that these scaffolds represent blocks of genetic material that could be lost or gained from isolate to isolate. Only one of the missing scaffolds did not contain any genes. Most of the genes found on these scaffolds encoded conserved domains associated with either reverse transcriptase, integrases, or helicases (S6 Table), which suggests unplaced repetitive content. However, one scaffold (scaffold 185) did possess a gene encoding a conserved domain for a centromere binding protein (S6 Table). Together these observations may indicate the potential for dispensable chromosomes, as dispensable and mini-chromosomes often contain higher repetitive content [59]; however, long-read sequencing is necessary for confirmation.

From our shared alignments of all 24 genomes, we recovered 1,076,901 biallelic SNPs corresponding to a median nucleotide diversity (Watterson’s θ) of 0.01196 and a Tajima’s D of -0.82522 calculated from 10 kb non-overlapping windows (Table 3). The resulting SNPs were used to infer the population recombination rate (ρ) from the linkage disequilibrium between SNPs based on a priori specified population mutation rate θ, which was set to 0.01 based on our nucleotide diversity (Watterson’s θ) (Table 3) [47]. The C. purpurea genome recombination landscape was highly variable as some scaffolds showed highly heterogenous landscapes, other scaffolds showed intermixed large peaks of recombination, while others still had more constantly sized peaks across the regions (Fig 4 and S4 Fig). Overall, the mean genomic population recombination rate in C. purpurea was ρ = 0.044. Recombination in specific sequence features and gene type were examined through comparison of mean population recombination rates in exons, introns, 500-bp upstream and downstream of the coding DNA sequence, and intergenic regions based on the annotation of the reference genome (strain 20.1). The distribution of population recombination rates was comparable across different gene features and gene functional categories, although, some significant differences were observed (Fig 5). In general, we found upstream regions to have the lowest recombination rates, while downstream regions have the highest recombination rates (Fig 5). The decreased recombination in upstream regions might be the result of mechanisms trying to conserve promotor regions. This trend was observed across different functional gene categories, except in predicted effector genes where exons showed the highest recombination rates and downstream regions with the lowest, although these were not significantly different (Fig 5B). 
Across functional categories, secreted genes and transmembrane genes showed the highest recombination rates within each gene feature but were not always significantly different (Fig 5C).
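As an illustration of the summary statistics reported above, the following minimal Python sketch computes Watterson’s θ (per site) and Tajima’s D for a single alignment window of haploid sequences. It is not the pipeline used in this study: the function name `window_stats` and the toy alignment are hypothetical, and skipping any column that contains gaps or ambiguous bases is an assumption.

```python
from itertools import combinations
from math import sqrt

def window_stats(seqs):
    """Return (S, Watterson's theta per site, Tajima's D) for one window.

    seqs: equal-length aligned haploid sequences (hypothetical input format).
    Columns containing anything other than A/C/G/T are ignored (assumption).
    Tajima's D is returned as None when it is undefined (S == 0 or n < 4).
    """
    n = len(seqs)
    cols = [c for c in zip(*seqs) if set(c) <= set("ACGT")]
    seg = [c for c in cols if len(set(c)) > 1]      # segregating columns
    S = len(seg)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = S / a1 / len(cols) if cols else 0.0
    if S == 0 or n < 4:
        return S, theta_w, None
    # Mean number of pairwise differences across all sequence pairs.
    n_pairs = n * (n - 1) / 2
    pi = sum(sum(x != y for x, y in combinations(c, 2)) for c in seg) / n_pairs
    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))
    return S, theta_w, d

# Toy window: 4 sequences, 10 sites, 3 segregating sites (made-up data).
toy = ["AAAAAAAAAA", "AAAAATAAAA", "AACAATAAAA", "AACAAAAAAG"]
S, theta_w, tajima_d = window_stats(toy)
print(S, round(theta_w, 4), tajima_d)
```

In a genome-wide analysis such a function would be applied to each non-overlapping window and the per-window values summarized, for example by their median.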

Fig 4 — Estimates of population recombination rates (ρ), in non-overlapping 1 kb windows, across four representative scaffolds displaying the different variation observed across the *Claviceps purpurea* genome. Smoothing curves were calculated from population recombination rates in 10 kb windows. See S4 Fig for remaining scaffolds.

Fig 5 — Plots indicate the distribution of estimated population recombination rates (ρ) between **(A)** different gene features (exons, introns, 500bp upstream and downstream), and **(B-D)** genes of different functional categories and classification. Different letters represent significant differences determined by Kruskal-Wallis with *post hoc* multi-test corrected Mann-Whitney U Test (α ≤ 0.01) between data within each plotting window, *** P < 0.0001. Sample sizes are embedded below each plot.

Due to the observation of paralogs (Fig 1) and evidence of tandem gene duplication in C. purpurea [17], the extent to which recombination might have influenced these events was investigated. Duplicated genes were found to have lower population recombination rates than all other genes within the genome (Fig 5D), suggesting that other factors are influencing gene duplication. Due to the absence of repeat-induced point (RIP) mutation [17], transposable elements (TEs) are likely a contributing factor. To investigate the association of duplicated genes with TEs, we calculated the average distance of genes to transposons (DNA transposons and long terminal repeat (LTR) retrotransposons) and the average number of flanking transposons. Duplicated genes were significantly closer to LTRs and had significantly more flanking LTRs than predicted effectors and other genes (P < 0.0001, multi-test corrected Mann-Whitney U Test, S5 Fig). In addition, for all genes examined, LTRs were significantly closer than DNA transposons (P < 0.0001, multi-test corrected Mann-Whitney U Test, S5 Fig).
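The gene-to-transposon measurements described above can be sketched as follows. This is a minimal illustration rather than the script used in the study: the function names, the coordinate tuples, and the 5 kb flanking window are all hypothetical, since the window used to count flanking elements is not stated here.

```python
def distance_to_nearest(gene, elements):
    """Distance (bp) from a gene to the closest element on the same scaffold.

    gene and each element are (start, end) tuples; overlap counts as 0.
    Returns None if the scaffold carries no elements.
    """
    g_start, g_end = gene
    best = None
    for e_start, e_end in elements:
        if e_end < g_start:          # element lies upstream of the gene
            d = g_start - e_end
        elif e_start > g_end:        # element lies downstream of the gene
            d = e_start - g_end
        else:                        # element overlaps the gene
            d = 0
        best = d if best is None else min(best, d)
    return best

def flanking_count(gene, elements, window=5000):
    """Number of elements overlapping the gene +/- window bp (assumed window)."""
    g_start, g_end = gene
    lo, hi = g_start - window, g_end + window
    return sum(1 for e_start, e_end in elements if e_end >= lo and e_start <= hi)

# Made-up coordinates for one gene and three LTR elements on one scaffold.
gene = (1000, 2000)
ltrs = [(100, 400), (2500, 3000), (9000, 9500)]
nearest = distance_to_nearest(gene, ltrs)
flanking = flanking_count(gene, ltrs)
print(nearest, flanking)
```

Per-gene values computed this way for each gene category (duplicated, predicted effector, other) could then be compared with a rank-based test such as the Mann-Whitney U test named in the text.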

As distinct peaks of recombination were observed (Fig 4 and S4 Fig), LDhot was used to call statistically significant recombination hotspots by comparing the intensity of recombination rates in 3 kb windows (1 kb increments) to background recombination rates in 20 kb windows [47–49]. After implementing cutoffs of ρ ≥ 5 and a length of 20 kb [48], only five recombination hotspots were retained, ranging from 11 kb to 18.5 kb in length (Fig 6). A recombination hotspot was identified between the lpsA1 and lpsA2 genes of the ergoline biosynthetic cluster, suggesting that this gene duplication event was likely the result of recombination (Fig 6D). The association of gene functional categories and TEs with hotspots varied between regions. Some hotspots showed a greater association with duplicated genes and TEs (Fig 6B–6D), while others showed no association with duplicated genes (Fig 6E) or no association with TEs (Fig 6A). In general, genes with conserved protein domains showed the highest presence within hotspots (S6 Fig). It should be noted that some unclassified genes and genes with conserved protein domains associated with hotspots also overlapped regions identified as repeats (Fig 6A–6C and 6E). Protein domains found within these genes were associated with ankyrin (IPR002110) and tetratricopeptide (IPR013026) repeats. Only five of the 846 duplicated genes found throughout the reference genome (reported in [17]) were located within predicted recombination hotspots (Fig 6 and S6 Fig). While Wyka et al. [17] showed that gene cluster expansion was prevalent among predicted effectors, only one non-duplicated predicted effector (CCE30212.1) was located within a recombination hotspot (Fig 6C). Together, these results suggest that while recombination may result in important gene duplication, it is not the primary driver of gene duplication within C. purpurea.

Fig 6 — Panels indicate scaffolds: **(A)** scaffold 14; **(B)** scaffold 15; **(C, D)** scaffold 20; **(E)** scaffold 23. Lines indicate background population recombination rates (ρ) estimated in non-overlapping 1 kb windows. Blue bars represent the position, intensity, and width of the predicted hotspots. Genes within the hotspot window and surrounding (± 20 kb) region are depicted by arrows with protein ID’s of the reference (strain 20.1) from NCBI. Genes identified as duplicated (≥ 80% identity) from Wyka *et al*. 2021 are outlined in red. TEs are depicted by lines between genes and the corresponding hotspot graph. Colors of arrows and lines correspond to the legend on the right.

Discussion

The establishment of a Claviceps purpurea pangenome from 24 isolates, together with the detection of core genes with signatures of positive selection and analysis of the recombination landscape, has provided insight into how high recombination rates and gene duplication are driving the genomic evolution and adaptation of the species.

The pangenome of C. purpurea reveals a large accessory genome, with 37.78% accessory orthogroups (27.05% accessory + 10.73% singleton), in comparison to four model fungal pangenomes (Saccharomyces cerevisiae, Candida albicans, Cryptococcus neoformans, and Aspergillus fumigatus), in which around 9–19% of genes were found in the accessory genome [5]. Our results are more comparable to the pangenome of the fungal pathogen Zymoseptoria tritici, in which the accessory genome comprised 40% (30% accessory + 10% singleton) of genes [6]. Similar to C. purpurea, Z. tritici is a globally distributed fungal pathogen of wheat, suggesting that fungal species with similar geographical distributions could possess comparable pangenome structures as they are under similar evolutionary pressures. Shared ecological habitats and lifestyles have been reported to influence pangenome sizes in bacteria [4]. In fact, C. purpurea and Z. tritici both showed enrichment of predicted effector orthogroups in the accessory genome and enrichment of carbohydrate-active enzyme (CAZyme) orthogroups in the core genome (Fig 2) [6], indicating a similar partitioning of gene functions, consistent with both organisms being pathogens of wheat. In addition, Badet et al. [6] suggested that the large accessory genome of Z. tritici is likely maintained by TE activity and a large effective population size, as indicated by observations of high SNP density, rapid decay in linkage disequilibrium, and high recombination rates [47, 60, 61]. The same mechanisms could also explain the large accessory genome observed in C. purpurea.

We identified 37 missing scaffolds in our population genome alignment, 31 of which contained genes only present in the accessory genome, suggesting the potential for blocks of DNA that could be lost or gained between isolates. Of these accessory scaffolds, 15 contained genes encoding conserved domains associated with reverse transcriptases, integrases, or helicases, and one scaffold possessed a gene encoding a conserved domain for a centromere binding protein (S6 Table). Together, these could indicate the potential for dispensable mini-chromosomes, as dispensable and mini-chromosomes often contain higher repetitive content [59]. However, these unplaced contigs may be assembly artifacts. Because our assemblies are Illumina-based, we did not process these elements further, but we believe that they are important aspects of C. purpurea evolution and should be a focal point of future research using long-read sequencing to understand their function more confidently. Given these transcriptase-rich unplaced scaffolds, the lack of RIP, and the observation of TEs with 0% divergence [17], we believe transposons and/or transcriptases are influencing the evolution of the accessory genome in C. purpurea.

We observed an abundance of orthogroups containing paralogs (8.6%). This presence of gene duplication and its association with LTR retrotransposons (S5 Fig) could be contributing to the large size of the accessory genome, potentially through pseudogenization and/or neofunctionalization. In fact, unclassified genes had the highest ω (dN/dS) ratios (Fig 3). In addition, the abundance of duplication in accessory unclassified genes [17] and their small sizes (S2 Fig) further suggest the presence of pseudogenization and/or neofunctionalization. Badet et al. [6] suggested that TEs were likely contributing to the Z. tritici accessory genome due to the correlation of TE content with genome size and observations of transcribed TEs. We observed a similar correlation of TE content with genome size (P = 0.004, Adj. R² = 0.28); however, our genome sizes and TE content (30.5–32.1 Mb and 8.42–10.87%, respectively) were not as variable as in Z. tritici, which also had a twofold higher TE content [6]. This suggests that TEs play a more important role in Z. tritici genome expansion; however, only 0.2% of the orthogroups in Z. tritici contained paralogs, suggesting that gene duplication is not as common in Z. tritici as it is in C. purpurea (8.6% paralogs). The lack of gene duplication in Z. tritici is likely due to the presence of RIP [62], which should also reduce TE expansion through silencing [63–65]. While we lack RNAseq data to observe TE transcription within C. purpurea, observations of TEs with 0% divergence in C. purpurea [17] suggest recent TE activity. The observed reduced association of recombination with duplicated genes (Fig 5D) and the association of duplicated genes with LTR retrotransposons (S5 Fig) suggest that gene duplication in C. purpurea is mediated in part by transposon activity.

Due to the potential for transposon mediated gene duplication, it was remarkable to find relatively low TE content (~8–10%) within C. purpurea, especially in the absence of RIP. Other genomic mechanisms, such as recombination, may limit TE expansion and increases in genome size. Tiley and Burleigh [66] found a strong negative correlation between global recombination rate, genome size and LTR retrotransposon proportion across 29 plant species, indicating that higher recombination rates actively reduce genome size likely through the removal of LTR elements. A similar function may be affecting LTR content in C. purpurea, which would explain the observed differences in LTR content between Claviceps section Claviceps (low LTR content, RIP absent) and Claviceps sections Pusillae, Paspalorum, and Citrinae (high LTR content, RIP present) [17].

On average we observed a twofold higher mean population recombination rate (ρ = 0.044) in C. purpurea than in Z. tritici (ρ = 0.0217) and a tenfold higher rate than in Z. ardabiliae (ρ = 0.0045) [47]. As ρ is a function of effective population size and recombination rate per site (ρ = 2N_e × r), these increases could be the result of an increase in recombination rate per site (r) and/or effective population size (N_e). Differences in ρ between the two Zymoseptoria species were postulated to be due to increased recombination rates per site, as the nucleotide diversity (Watterson’s θ = 2N_e × μ, where μ is the mutation rate) was 1.6 times higher in Z. tritici (0.0139) than in Z. ardabiliae (0.00866). Under the assumption that Z. tritici and Z. ardabiliae have comparable mutation rates, N_e of Z. tritici would only be 1.6 times higher than that of Z. ardabiliae; therefore, the 5-fold higher ρ would likely be caused by higher recombination rates per site [47]. Our observed Watterson’s θ of 0.012 in C. purpurea (Table 3) is comparable to that of Z. tritici, suggesting that if mutation rates and effective population sizes are comparable, then the twofold increase in ρ is likely influenced by higher recombination rates per site in C. purpurea. However, Z. tritici is a heterothallic organism while C. purpurea is homothallic [67], although C. purpurea does frequently out-cross in nature [19, 68]; these factors may produce a difference in effective population sizes between these organisms. In addition, mutation rates might be higher in Z. tritici than in C. purpurea due to the presence of RIP, which identifies repeated/duplicated sequences within a genome and introduces C:G to T:A mutations to effectively silence these regions [63–65]. It has also been reported that RIP can “leak” into neighboring non-repetitive regions and introduce mutations, thus accelerating the rate of mutations, particularly those in closer proximity to repeat regions [69–71].
If the mutation rate is increased in Z. tritici due to RIP “leakage” the nucleotide diversity in Z. tritici could be the result of high mutation rates, whereas the nucleotide diversity in C. purpurea could be influenced by higher effective population size and/or recombination rates per site. Our positive selection analysis did reveal a gene with evidence of positive selection (ω = 1.52) that is related to meiotic recombination protein rec12, which is known to catalyze the formation of dsDNA breaks that initiate homologous recombination in meiosis in yeast [72]. Kan et al. [72] further determined that the frequency of dsDNA breaks catalyzed by rec12 significantly increased the frequency of intergenic recombination. In both plants [66] and Z. tritici [73], higher recombination rates were found to increase the efficacy of purifying selection. Similarly, C. purpurea had an overall trend of purifying selection with skewness towards lower ω values (Fig 3) and an observed correlation of higher population recombination rates around genes with lower ω ratios (S7 Fig), further suggesting the potential for higher recombination rates in C. purpurea.
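The fold-change reasoning in this paragraph can be made explicit with a short calculation. Because ρ = 2N_e × r and θ = 2N_e × μ, the ratio ρ/θ equals r/μ, so if two species are assumed to share the same mutation rate, the fold difference in ρ divided by the fold difference in θ approximates the fold difference in per-site recombination rate. The sketch below uses only the values quoted in the text; the helper name `fold_changes` is hypothetical, and the equal-mutation-rate assumption is exactly the one the text cautions may not hold if RIP elevates mutation rates in Z. tritici.

```python
# Population recombination rates (rho) and Watterson's theta quoted in the text.
rho = {"C. purpurea": 0.044, "Z. tritici": 0.0217, "Z. ardabiliae": 0.0045}
theta = {"C. purpurea": 0.01196, "Z. tritici": 0.0139, "Z. ardabiliae": 0.00866}

def fold_changes(a, b):
    """Return (rho fold, theta fold, implied r fold) for species a versus b.

    The implied fold change in per-site recombination rate r assumes both
    species have the same mutation rate (an assumption, not a measurement).
    """
    rho_fold = rho[a] / rho[b]
    theta_fold = theta[a] / theta[b]
    return rho_fold, theta_fold, rho_fold / theta_fold

zt_vs_za = fold_changes("Z. tritici", "Z. ardabiliae")
cp_vs_zt = fold_changes("C. purpurea", "Z. tritici")
print([round(x, 2) for x in zt_vs_za])
print([round(x, 2) for x in cp_vs_zt])
```

Under that assumption, the Z. tritici versus Z. ardabiliae comparison implies roughly a threefold higher per-site recombination rate, and the C. purpurea versus Z. tritici comparison implies a little over a twofold difference.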

Additional support for higher recombination rates per site in C. purpurea could be extrapolated from recombination hotspots, or the lack thereof. While we observed evidence of a heterogeneous recombination landscape, with several scaffolds showing large peaks in population recombination rates (Fig 4 and S4 Fig), we only predicted five recombination hotspots (Fig 6), which is in stark contrast to the ~1,200 hotspots identified in Z. tritici [74]. On average, we did observe higher population recombination rates across scaffolds compared to the rates observed across chromosomes of Zymoseptoria [47], suggesting that the background recombination rate in C. purpurea is higher and “flatter”, potentially limiting the detection of hotspots [48]. Overall, this indicates that C. purpurea exhibits high recombination rates per site, which potentially helps defend against TE expansion.

While these higher recombination rates are likely influencing the purifying selection observed in regions of the C. purpurea genome, they may not be the only reason we were unable to detect predicted core effector genes with signatures of positive selection (Fig 3). Specifically, it is possible that positive selection is occurring but the whole-gene dN/dS ratio is < 1. Predicted effectors may have sites under positive selection that are not detected by this whole-gene analysis. Therefore, our results reflect a lack of power to detect positive selection in predicted effector genes, and not actual evidence for a lack of selection. Another potential explanation is that the ancestral state of Claviceps purpurea is plant endophytism [8], and the species is closely related to several mutualistic grass endophytes (i.e. Epichloë, Balansia, Atkinsonella), which are known to benefit their hosts mostly through production of secondary metabolites and plant hormones [75–77]. Furthermore, Wäli et al. [24] classified C. purpurea as a conditional defensive mutualist with its plant host, as they found that sheep avoided grazing infected grasses and observed that infection rates were higher in grazed pastures compared to ungrazed fields. Other researchers have observed neutral to positive effects on seed set, seed weight, and plant growth in infected plants compared to uninfected plants [24–26, 78]. These factors, along with the broad host range of C. purpurea (400+ grass species) and lack of known crop resistance (R) genes, could suggest a lack of strong selection for resistance to C. purpurea in grass species [24]. This could help explain the lack of positive selection observed in predicted core effector genes, implying that effectors are not under strong selection pressure to compete in the evolutionary arms race against host defense. However, it should be noted that positive selection analyses are computed from single-copy core orthologs.
Observations of significant enrichment of predicted effector genes in the accessory genome of C. purpurea and duplication of effector gene cluster [17] could implicate their role in diversity of infection potential [7], however, no host specific races of C. purpurea have been identified.
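A toy calculation can illustrate why a whole-gene dN/dS ratio below 1 does not rule out positively selected sites, as argued above. The site classes and ω values below are invented for illustration, and treating the gene-wide ratio as a codon-weighted mean of site-class ratios is a simplifying assumption; it is not the codon model used for the analyses in this study.

```python
def gene_wide_omega(site_classes):
    """Codon-weighted mean omega over (codon_count, omega) site classes.

    A deliberately simplified stand-in for a whole-gene dN/dS estimate.
    """
    total_codons = sum(count for count, _ in site_classes)
    return sum(count * omega for count, omega in site_classes) / total_codons

# Hypothetical effector: 290 codons under strong purifying selection
# (omega = 0.1) and 10 codons under positive selection (omega = 3.0).
hypothetical_effector = [(290, 0.1), (10, 3.0)]
omega_gene = gene_wide_omega(hypothetical_effector)
print(round(omega_gene, 3))
```

In this made-up case the gene-wide value stays well below 1 even though ten codons have ω = 3, which is the situation that site-level tests are designed to detect.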

While further research is needed to better characterize the accessory genome of C. purpurea, it appears that TE-mediated gene duplication and frequent recombination are likely playing a role in the expansion of the C. purpurea accessory genome and may be influencing the success of the species. In addition, all members of Claviceps section Claviceps, which contains grass pathogens that have extended geographical distributions and host ranges, have genomes that lack RIP, exhibit gene duplication, and have comparable TE content [17], suggesting that the genomic mechanisms identified in this study might be characteristic of section Claviceps as a whole.

Conclusion

Overall, we observed that ~38% of the Claviceps purpurea pangenome is accessory, which is likely influenced by a large effective population size, frequent recombination, and TE-mediated gene duplication. Pseudogenization and neofunctionalization might also be contributing, given the observed TE activity, higher ω ratios, signatures of positive selection in core single-copy unclassified genes, and the small size of many accessory unclassified genes. In the absence of RIP, prolific TE expansion is likely kept under control by high recombination rates, which subsequently may be influencing the overall trend of purifying selection.

Supporting information

S1 Fig. Genetic diversity of 24 Claviceps purpurea isolates.

(TIF)

S2 Fig. Average protein lengths (aa) of all orthogroups in Claviceps purpurea pangenome.

(TIF)

S3 Fig. Distributions of mean non-synonymous (dN) and synonymous (dS) substitution rates of core single-copy orthogroups in Claviceps purpurea.

(TIF)

S4 Fig. Estimated population recombination rates of Claviceps purpurea scaffolds.

(TIF)

S5 Fig. Distributions of genes and their association (distance and flanking counts) to LTR transposable elements.

(TIF)

S6 Fig. Association of genes within recombination hotspots.

(TIF)

S7 Fig. Correlation of recombination rates and omega ratios.

(TIF)

S1 Table. Claviceps purpurea pangenome spreadsheet.

(XLSX)

S2 Table. Enrichment of protein domains within pangenome.

(XLSX)

S3 Table. Enrichment of protein domains within paralogous orthogroups.

(XLSX)

S4 Table. PAML summarized results.

(XLSX)

S5 Table. BLAST results of single-copy core orthologs with an ω (dN/dS) ≥ 1.

(XLSX)

S6 Table. Annotation information of missing reference scaffolds from 24 isolate whole-genome alignment.

(XLSX)

Acknowledgments

We would like to thank Julien Dutheil for his assistance in understanding the procedures for estimating fungal recombination rates using LDhat and LDhot.

Data Availability

Most of the relevant data are within the paper and supporting files. Additional raw datasets and scripts are available on Dryad: Wyka, Stephen et al. (2020), A large accessory genome, high recombination rates, and selection of secondary metabolite genes help maintain global distribution and broad host range of the fungal plant pathogen Claviceps purpurea, v1, Dryad, Dataset, doi: https://doi.org/10.5061/dryad.6hdr7sqxp. Whole genome sequences generated for this project were deposited in NCBI BioProject PRJNA528707.

Funding Statement

This work was supported by the Agriculture and Food Research Initiative (AFRI) National Institute of Food and Agriculture (NIFA) Fellowships Grant Program: Predoctoral Fellowships grant no. 2019-67011-29502/project accession no. 1019134 from the United States Department of Agriculture (USDA), the American Malting Barley Association grant no. 17037621, and the U.S. Department of Agriculture, Agricultural Research Service. Dr. Broders was supported in part by Simons Foundation grant number 429440 to the Smithsonian Tropical Research Institute. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1.Araki H, Tian D, Goss EM, Jakob K, Halldorsdottir SS, Kreitman M, et al.Presence/absence polymorphism for alternative pathogenicity islands in Pseudomonas viridiflava, a pathogen of Arabidopsis. Pnas. 2006;103:5887–92. doi: 10.1073/pnas.0601431103 [DOI] [PMC free article] [PubMed] [Google Scholar]
2.Hartmann FE, Rodríguez de la Vega RC, Brandenburg J-T, Carpentier F, Giraud T. Gene presence–absence polymorphism in castrating Anther-Smut fungi: recent gene gains and phylogeographic structure. Genome Biology and Evolution. 2018;10:1298–314. doi: 10.1093/gbe/evy089 [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Brynildsrud O, Gulla S, Feil EJ, Nørstebø SF, Rhodes LD. Identifying copy number variation of the dominant virulence factors msa and p22 within genomes of the fish pathogen Renibacterium salmoninarum. Microbial Genomics 2016;2:e000055 doi: 10.1099/mgen.0.000055 [DOI] [PMC free article] [PubMed] [Google Scholar]
4.McInerney JO, McNally A, O’Connell MJ. Why prokaryotes have pangenomes. Nature Microbiology. 2017;2:17040 doi: 10.1038/nmicrobiol.2017.40 [DOI] [PubMed] [Google Scholar]
5.McCarthy CGP, Fitzpatrick DA. Pan-genome analysis of model fungal species. Microbial Genomics. 2019;5:e000243 doi: 10.1099/mgen.0.000243 [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Badet T, Oggenfuss U, Abraham L, McDonald BA, Croll D. A 19-isolate reference-quality global pangenome for the fungal wheat pathogen Zymoseptoria tritici. BMC Biology. 2020;18:12. doi: 10.1186/s12915-020-0744-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Sánchez-Vallet A, Fouché S, Fudal I, Hartmann F, Soyer JL, Tellier A, et al. The genome biology of effector gene evolution in filamentous plant pathogens. Annual Review of Phytopathology. 2018;56:21–40. doi: 10.1146/annurev-phyto-080516-035303 [DOI] [PubMed] [Google Scholar]
8.Píchová K, Pažoutová S, Kostovčík M, Chudíčková M, Stodulůkvá E, Novák P, et al. Evolutionary history of ergot with a new infrageneric classification (Hypocreales: Clavicipitaceae: Claviceps). Molecular Phylogenetics and Evolution. 2018;123:73–87. doi: 10.1016/j.ympev.2018.02.013 [DOI] [PubMed] [Google Scholar]
9.Menzies JG, Turkington TK. An overview of the ergot (Claviceps purpurea) issue in western Canada: challenges and solutions. Canadian Journal of Plant Pathology. 2015;37:40–51. [Google Scholar]
10.Menzies JG, Klein-Gebbinck HW, Gordon A, O’Sullivan DM. Evaluation of Claviceps purpurea isolates on wheat reveals complex virulence and host susceptibility relationships. Canadian Journal of Plant Pathology. 2017;39:307–317. [Google Scholar]
11.Gordon A, McCartney C, Knox RE, et al. Genetic and transcriptional dissection of resistance to Claviceps purpurea in the durum wheat cultivar Greenshank. Theoretical and Applied Genetics. 2020. doi: 10.1007/s00122-020-03561-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
12.Hinsch J, Vrabka J, Oeser B, Novák O, Galuszka P, Tudzynski P. De novo biosynthesis of cytokinins in the biotrophic fungus Claviceps purpurea. Environmental Microbiology. 2015;17:2935–2951. doi: 10.1111/1462-2920.12838 [DOI] [PubMed] [Google Scholar]
13.Hinsch J, Galuszka P, Tudzynski P. Functional characterization of the first filamentous fungal tRNA-isopentenyltransferase and its role in the virulence of Claviceps purpurea. New Phytologist. 2016;211:980–992. doi: 10.1111/nph.13960 [DOI] [PubMed] [Google Scholar]
14.Oeser B, Kind S, Schurack S, Schmutzer T, Tudzynski P, Hinsch J. Cross-talk of the biotrophic pathogen Claviceps purpurea and its host Secale cereale. BMC Genomics. 2017;18:273. doi: 10.1186/s12864-017-3619-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
15.Kind S, Hinsch J, Vrabka J, Hradilová M, Majeská-Čudejková M, Tudzynski P, et al. Manipulation of cytokinin level in the ergot fungus Claviceps purpurea emphasizes its contribution to virulence. Current Genetics. 2018;64:1303–1319. doi: 10.1007/s00294-018-0847-3 [DOI] [PubMed] [Google Scholar]
16.Kind S, Schurack S, Hinsch J, Tudzynski P. Brachypodium distachyon as alternative model host system for the ergot fungus Claviceps purpurea. Molecular Plant Pathology. 2018;19:1005–1011. doi: 10.1111/mpp.12563 [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Wyka SA, Mondo SJ, Liu M, Dettman J, Nalam V, Broders KD. Whole genome comparisons of ergot fungi reveals the divergence and evolution of species within the genus Claviceps are the result of varying mechanisms driving genome evolution and host range expansion. Genome Biology and Evolution. 2021;13:evaa267. doi: 10.1093/gbe/evaa267 [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Schardl CL, Young CA, Hesse U, et al. Plant-symbiotic fungi as chemical engineers: multi-genome analysis of the Clavicipitaceae reveals dynamics of alkaloid loci. PLoS Genetics, 2013;9:e1003323. doi: 10.1371/journal.pgen.1003323 [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Tudzynski P, Neubauer L. Ergot Alkaloids. In: Martín JF., García-Estrada C., Zeilinger S. (eds) Biosynthesis and Molecular Genetics of Fungal Secondary Metabolites. Fungal Biology. Springer, New York, NY; 2014. doi: 10.1016/j.funbio.2014.06.001 [DOI] [Google Scholar]
20.Neubauer L, Dopstadt J, Humpf H-U, Tudzynski P. Identification and characterization of the ergochrome gene cluster in the plant pathogenic fungus Claviceps purpurea. Fungal Biology and Biotechnology. 2016;3:2. doi: 10.1186/s40694-016-0020-z [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Flieger M, Stodůlková E, Wyka SA, et al. Ergochromes: heretofore neglected side of ergot toxicity. Toxins. 2019;11:439. doi: 10.3390/toxins11080439 [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Avalos J, Carmen Limon M. Biological roles of fungal carotenoids. Current Genetics. 2015;61:309–324. doi: 10.1007/s00294-014-0454-x [DOI] [PubMed] [Google Scholar]
23.Pusztahelyi T, Holb IJ, Pócsi I. Secondary metabolites in fungus-plant interactions. Frontiers in Plant Science. 2015;6. doi: 10.3389/fpls.2015.00573 [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Wäli PP, Wäli PR, Saikkonen K, Tuomi J. Is the pathogenic ergot fungus a conditional defensive mutualist for its host grass? PLoS ONE. 2013;8:e69249. doi: 10.1371/journal.pone.0069249 [DOI] [PMC free article] [PubMed] [Google Scholar]
25.Raybould AF, Gray AJ, Clarke RT. The long-term epidemic of Claviceps purpurea on Spartina anglica in Poole Harbour: pattern of infection, effects on seed production and the role of Fusarium heterosporum. New Phytologist. 1998;138:497–505. [Google Scholar]
26.Fisher AJ, DiTomaso JM, Gordon TR, Aegerter BJ, Ayres DR. Salt marsh Claviceps purpurea in native and invaded Spartina marshes in Northern California. Plant Disease. 2007;91:380–386. doi: 10.1094/PDIS-91-4-0380 [DOI] [PubMed] [Google Scholar]
27.Wyka SA, Mondo SJ, Liu M, Dettman J, Nalam V, Broders KD. Whole genome comparisons of ergot fungi reveals the divergence and evolution of species within the genus Claviceps are the result of varying mechanisms driving genome evolution and host range expansion. V4, Dryad, Dataset; 2020. Available from: 10.5061/dryad.6hdr7sqxp [DOI] [PMC free article] [PubMed] [Google Scholar]
28.Blin K, Shaw S, Steinke K, Villebro R, Ziemert N, Lee SY, et al. antiSMASH 5.0: updates to the secondary metabolite genome mining pipeline. Nucleic Acids Research. 2019;47:W81–87. doi: 10.1093/nar/gkz310 [DOI] [PMC free article] [PubMed] [Google Scholar]
29.Jones P, Binns D, Chang H-Y, et al. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014;30:1236–1240. doi: 10.1093/bioinformatics/btu031 [DOI] [PMC free article] [PubMed] [Google Scholar]
30.Wheeler TJ, Eddy SR. nhmmer: DNA homology search with profile HMMs. Bioinformatics. 2013;29:2487–2489. doi: 10.1093/bioinformatics/btt403 [DOI] [PMC free article] [PubMed] [Google Scholar]
31.Rawlings ND, Barrett AJ, Thomas PD, Huang X, Bateman A, Finn R. The MEROPS database of proteolytic enzymes, their substrates and inhibitors in 2017 and a comparison with peptidases in the PANTHER database. Nucleic Acids Research. 2018;46:624–632. doi: 10.1093/nar/gkx1134 [DOI] [PMC free article] [PubMed] [Google Scholar]
32.Käll L, Krogh A, Sonnhammer EL. Advantages of combined transmembrane topology and signal peptide prediction—the Phobius web server. Nucleic Acids Research. 2007;35:W429–32. doi: 10.1093/nar/gkm256 [DOI] [PMC free article] [PubMed] [Google Scholar]
33.Nielsen H. Predicting secretory proteins with SignalP In: Protein function prediction (eds. Kihara D). Methods in Molecular Biology 1611 Humana Press, New York, NY; 2017. [Google Scholar]
34.Krogh A, Larsson B, von Heijne G, Sonnhammer ELL. Predicting transmembrane protein topology with a hidden markov model: application to complete genomes. Journal of Molecular Biology. 2001;305:567–580. doi: 10.1006/jmbi.2000.4315 [DOI] [PubMed] [Google Scholar]
35.Sperschneider J, Dobbs PN, Gardiner DM, Singh KB, Taylor JM. Improved prediction of fungal effector proteins from secretomes with EffectorP 2.0. Molecular Plant Pathology. 2018;19:2094–2110. doi: 10.1111/mpp.12682 [DOI] [PMC free article] [PubMed] [Google Scholar]
36.Berriman M, Coghlan A, Tsai IJ. Creation of a comprehensive repeat library for newly sequenced parasitic worm genome. Protocol Exchange. 2018. doi: 101038/protex2018054 [Google Scholar]
37.Smit AFA, Hubley R, Green P. RepeatMasker Open-40; 2015. Available from: http://wwwrepeatmaskerorg. [Google Scholar]
38.Hass B. TransposonPSI; 2010. Available from: http://transposonpsisourceforgenet. [Google Scholar]
39.Xu Z, Wang H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Research. 2007;35:W265–268. doi: 10.1093/nar/gkm286 [DOI] [PMC free article] [PubMed] [Google Scholar]
40.Ellinghaus D, Kurtz S, Willhoeft U. LTRharvest, a efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics. 2008;9:18. doi: 10.1186/1471-2105-9-18 [DOI] [PMC free article] [PubMed] [Google Scholar]
41.Steinbiss S, Willhoeft U, Gremme G, Kurtz S. Fine-grained annotation and classification of de novo predicted LTR retrotransposons. Nucleic Acids Research. 2009;37:7002–7013. doi: 10.1093/nar/gkp759 [DOI] [PMC free article] [PubMed] [Google Scholar]
42. Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010;26:2460–2461. doi: 10.1093/bioinformatics/btq461
43. Bao W, Kojima KK, Kohany O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mobile DNA. 2015;6:11. doi: 10.1186/s13100-015-0041-9
44. Emms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biology. 2019;20:238. doi: 10.1186/s13059-019-1832-y
45. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research. 2005;32:1792–1797.
46. Yang Z, Nielsen R. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Molecular Biology and Evolution. 2000;17:32–34. doi: 10.1093/oxfordjournals.molbev.a026236
47. Stukenbrock EH, Dutheil JY. Fine-scale recombination maps of fungal plant pathogens reveal dynamic recombination landscape and intragenic hotspots. Genetics. 2018;208:1209–1229. doi: 10.1534/genetics.117.300502
48. Auton A, Myers S, McVean G. Identifying recombination hotspots using population genetic data. 2014. arXiv:1403.4264.
49. Wall JD, Stevison LS. Detecting recombination hotspots from patterns of linkage disequilibrium. G3. 2016;6:2265–2271. doi: 10.1534/g3.116.029587
50. Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AFA, et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Research. 2004;14:708–715. doi: 10.1101/gr.1933104
51. Dutheil JY, Gaillard S, Stukenbrock EH. MafFilter: a highly flexible and extensible multiple genome alignment files processor. BMC Genomics. 2014;15. doi: 10.1186/1471-2164-15-53
52. Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics. 2012;28:3326–3328. doi: 10.1093/bioinformatics/bts606
53. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30:1312–1313. doi: 10.1093/bioinformatics/btu033
54. Auton A, McVean G. Recombination rate estimation in the presence of hotspots. Genome Research. 2007;17:1219–1227. doi: 10.1101/gr.6386707
55. Gremme G, Steinbiss S, Kurtz S. GenomeTools: a comprehensive software library for efficient processing of structured genome annotations. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2013;10:645–656. doi: 10.1109/TCBB.2013.68
56. Liu M, Shoukouhi P, Bisson KR, Wyka SA, Broders KD, Menzies JG. Sympatric divergence of the ergot fungus, Claviceps purpurea, populations infecting agricultural and nonagricultural grasses in North America. Ecology and Evolution. 2020;11:273–293. doi: 10.1002/ece3.7028
57. Jeffares DC, Tomiczek B, Sojo V, dos Reis M. A beginners guide to estimating the non-synonymous to synonymous rate ratio of all protein-coding genes in a genome. In: Peacock C. (eds) Parasite Genomics Protocols. Methods in Molecular Biology, vol 1201. Humana Press, New York, NY; 2015.
58. Morran LT, Schmidt OG, Gelarden IA, Parrish RC 2nd, Lively CM. Running with the Red Queen: host-parasite coevolution selects for biparental sex. Science. 2011;333:216–8. doi: 10.1126/science.1206360
59. Peng Z, Oliveira-Garcia E, Lin G, et al. Effector gene reshuffling involves dispensable mini-chromosomes in the wheat blast fungus. PLoS Genetics. 2019;15:e1008272. doi: 10.1371/journal.pgen.1008272
60. Croll D, Lendenmann MH, Stewart E, McDonald BA. The impact of recombination hotspots on genome evolution of a fungal plant pathogen. Genetics. 2015;201:1213–28. doi: 10.1534/genetics.115.180968
61. Hartmann FE, Sánchez-Vallet A, McDonald BA, Croll D. A fungal wheat pathogen evolved host specialization by extensive chromosomal rearrangements. ISME Journal. 2017;11:1189–204. doi: 10.1038/ismej.2016.196
62. Testa A, Oliver R, Hane J. Overview of genomic and bioinformatic resources for Zymoseptoria tritici. Fungal Genetics and Biology. 2015;79:13–16. doi: 10.1016/j.fgb.2015.04.011
63. Galagan JE, Calvo SE, Borkovich KA, et al. The genome sequence of the filamentous fungus Neurospora crassa. Nature. 2003;422:859–868. doi: 10.1038/nature01554
64. Galagan JE, Selker EU. RIP: the evolutionary cost of genome defense. TRENDS in Genetics. 2004;20:417–423. doi: 10.1016/j.tig.2004.07.007
65. Urquhart AS, Mondo SJ, Makela MR, et al. Genomic and genetic insights into a cosmopolitan fungus, Paecilomyces variotii (Eurotiales). Frontiers in Microbiology. 2018;9:3058. doi: 10.3389/fmicb.2018.03058
66. Tiley GP, Burleigh JG. The relationship of recombination rate, genome structure, and patterns of molecular evolution across angiosperms. BMC Evolutionary Biology. 2015;15:194. doi: 10.1186/s12862-015-0473-3
67. Esser K, Tudzynski P. Genetics of the ergot fungus Claviceps purpurea. I. Proof of a monoecious life-cycle and segregation patterns for mycelial morphology and alkaloid production. Theoretical and Applied Genetics. 1978;53:145–149. doi: 10.1007/BF00273574
68. Amici AM, Minghetti A, Scotti T, Spalla C, Tognoli L. Ergotamine production in submerged culture and physiology of Claviceps purpurea. Applied Microbiology. 1967;15:597–602. doi: 10.1128/am.15.3.597-602.1967
69. Fudal I, Ross S, Brun H, Besnard A-L, Ermel M, Kuhn M-L, et al. Repeat-induced point mutation (RIP) as an alternative mechanism of evolution towards virulence in Leptosphaeria maculans. Molecular Plant-Microbe Interactions. 2009;22:932–941. doi: 10.1094/MPMI-22-8-0932
70. Hane JK, Williams AH, Taranto AP, Solomon PS, Oliver RP. Repeat-induced point mutation: a fungal-specific, endogenous mutagenesis process. In: van den Berg MA, Maruthachalam K, editors. Genetic transformation systems in fungi. Vol. 2. Springer International Publishing, 2015. p. 55–68.
71. Van de Wouw A, Cozijnsen AJ, Hane JK, Brunner PC, McDonald BA, Oliver RP, et al. Evolution of linked avirulence effectors in Leptosphaeria maculans is affected by genomic environment and exposure to resistance genes in host plants. PLoS Pathogens. 2010;6:e1001180. doi: 10.1371/journal.ppat.1001180
72. Kan F, Davidson MK, Wahls WP. Meiotic recombination protein Rec12: functional conservations, crossover homeostasis and early crossover/noncrossover decision. Nucleic Acids Research. 2011;39:1460–1472. doi: 10.1093/nar/gkq993
73. Grandaubert J, Dutheil JY, Stukenbrock EH. The genomic determinants of adaptive evolution in a fungal pathogen. Evolution Letters. 2019;3:299–312. doi: 10.1002/evl3.117
74. Stukenbrock EH, Dutheil JY. Data and scripts for: Fine-Scale Recombination Maps of Fungal Plant Pathogens Reveal Dynamic Recombination Landscapes and Intragenic Hotspots; 2018. Database: GitLab [Internet]. Available from: https://gitlab.gwdg.de/molsysevol/ZtPopRec
75. Clay K. Fungal endophytes of grasses: a defensive mutualism between plants and fungi. Ecology. 1988;69:10–16.
76. Song H, Nan Z, Song Q, Xia C, Li X, Yao X, et al. Advances in research on Epichloë endophytes in Chinese native grasses. Frontiers in Microbiology. 2016;7:1399. doi: 10.3389/fmicb.2016.01399
77. Xia C, Christensen MJ, Zhang X, Nan Z. Effect of Epichloë gansuensis endophyte transgenerational effects on the water use efficiency, nutrient and biomass accumulation of Achnatherum inebrians under soil water deficit. Plant Soil. 2018;424:555–571.
78. Wyka SA. From fields to genomes: A comprehensive understanding of the lifestyle and evolution of Claviceps purpurea the ergot fungus, PhD Dissertation, Department of Agricultural Biology, Colorado State University. 2020. Available from: https://mountainscholar.org/handle/10217/211800

PLoS One. doi: 10.1371/journal.pone.0263496.r001

Decision Letter 0

Christopher Toomajian

Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

19 Mar 2021

PONE-D-21-02186

A large accessory genome, high recombination rates, and selection of secondary metabolite genes help maintain global distribution and broad host range of the fungal plant pathogen Claviceps purpurea

PLOS ONE

Dear Dr. Broders,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

You will see I took the unusual step of assigning myself as a reviewer of this manuscript. I am fully disclosing this, so as not to violate any PLOS ONE policies. I had read through your manuscript completely before inviting reviewers, but it was not until receiving 2 independent reviews and going over the manuscript again that I realized I had missed something very important in your positive selection analysis methods that the 2 reviewers apparently missed as well. Unless the methods have not been explained fully, the use of the dN/dS analyses on the intra-species sample will not give valid estimates for selection pressures, affecting a large portion of the manuscript. Because the manuscript would require a major rewrite and potentially lots of reanalysis, I'm not sure whether Major Revisions or Open Reject is a better decision for the manuscript, though I have selected Major Revision. Suffice it to say, there are aspects of the manuscript and data that you present that will be useful to other researchers, and I encourage the manuscript to be revised/resubmitted. If the timeline for resubmission due to the Major Revisions decision is not possible, the rewritten manuscript can be submitted as a new manuscript, and I would be happy to again serve as academic editor.

Besides this major problem with the manuscript, the other reviewers agreed that the data presented in the paper at most partly supported your conclusions. I agree with another reviewer in that your discussion includes many speculations without robust results supporting the claims, so care should be taken in the wording of such passages to reflect what is possible rather than what you believe your data clearly show [required]. Another specific point to address [required] is Reviewer #1's concern about your accessory chromosome evidence (Reviewer #2 also had similar concerns regarding resolving indels within chromosomes from accessory mini-chromosomes without longer sequencing reads). One aspect of this is the question of assembly quality, and thus what evidence there is that scaffolds absent in some isolates actually represent chromosomes. The other is related to the global nature of the sample, such that some presence/absence differences might reflect divergence between separate populations - though it is not clear such cases of presence/absence between populations would not be classified as accessory genome sequences. You should also provide some general clarification on the statistical testing and power of the study, though I don't necessarily see a major problem here. I don't see any need to remove citations to tables and figures from the Discussion section - this stylistic choice is up to you, even if it is not all that common.

Please submit your revised manuscript by Apr 29 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Christopher Toomajian

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

3. Please include your tables as part of your main manuscript and remove the individual files. Please note that supplementary tables should remain as separate "supporting information" files.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

Reviewer #3: No

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Major concerns:

252-254: I would be more supportive about the evidence for accessory chromosome(s) if these data were derived from the population-level accessory pangenome. In this study the accessory pangenome references all isolates gathered across the globe – global populations may not exchange genetic material, so 1) How can you distinguish between accessory chromosome scaffolds and genomic regions that evolved from isolation? 2) How can you distinguish them from poorly assembled scaffolds? Short reads are especially prone to not assembling well, potentially leaving streaks of N’s in scaffolds derived from small contigs... particularly from regions with low coverage. At this point, the study points to accessory chromosomes - my assessment is that accessory chromosomes are not substantiated by these data and the other 2 options should be presented as the assumed nulls.

Other concerns:

639-646: I’m concerned about the type 1 error thresholds – considering the individual type 1 error thresholds, I think the cumulative type 1 error for the study could be large. The data seems justified visually and corrections are in place in explicit multiple comparisons analyses, but the statistical power of the overall study concerns me. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4840791/
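
The cumulative type 1 error concern can be made concrete with a small calculation. If a study runs m independent tests, each at threshold α, the probability of at least one false positive is 1 − (1 − α)^m. The minimal Python sketch below assumes independent tests; the function names, α = 0.05, and the test counts are illustrative assumptions, not values taken from the manuscript.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n_tests
    independent tests, each run at significance threshold alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate
    at or below alpha (Bonferroni correction)."""
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical numbers of separate analyses in one study.
    for m in (1, 5, 10, 20):
        fwer = familywise_error_rate(0.05, m)
        print(f"{m:2d} tests at alpha=0.05 -> family-wise error {fwer:.3f}")
```

Under these assumptions, ten uncorrected tests at α = 0.05 already carry roughly a 40% chance of at least one false positive, which is why corrections applied only within individual analyses may not control the error rate of the study as a whole.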

Use of normative language without reference points, e.g. vague adjectives as in line 45 “broad genomic analyses”; general grammatic errors/tense switching/active-passive voice switching, e.g. 89 “the factors that were influencing”

Abstract:

“which is often governed by variable gene content.” - what is governed by variable gene content, what does this mean, what information does this add?

Data locations:

“Dataset, doi: XXX” – remove or add doi

Author Summary:

43-44: “ambiguity surrounding the true nature” is vague, please be specific

Introduction:

*88-89: how does tandem gene duplication implicate their positive contribution to Claviceps – how is this definitively not just a random duplication? Looking for a citation here or explanation of how

106: “use SCOs to identify genes under positive selection” – the SCOs don’t show which genes are under positive selection, what was the actual method that references the SCOs?

Results:

183-184: here, the percentage in ‘()’ is relative to the whole then the next is relative to the previous percentage; previous orthogroup percentages in the text had a common reference point. I think readers could infer, so I can see arguments either way, but I would argue either maintain common reference points or explicitly state them so the reader doesn’t have to infer.

185: “two conserved genes” – do you mean orthogroups?

224: again, I’m not sure I’d say “genes” when you are referencing what appear to be orthogroups… If there are paralogs within these orthogroups then I don’t think it is appropriate to universally assign a function to the OG because paralogs can take on different functions. At present, it might be appropriate if they are single copy orthologs, but I don’t think that is made clear if that’s the case.

252-254: see major concerns

*255-270: are these obtained from individual populations or the entire sample set?

282: RIP is introduced without expanding the acronym.

Discussion:

*338-339: how is it supported that this is potentially due to a lack of RIP?

417-420: this is not substantiated; perhaps instead of saying “it is plausible to believe”, state the two hypotheses you have mentioned. At present, this is added without any support.

Conclusions:

503-504: This section would benefit from reducing normative language, i.e. “large” is used twice, “high” is used - there’s no reference to base these off. Use actual values, and if you want to compare with other fungi then actually provide some sort of reference.

Methods:

639-646: see general concerns

Reviewer #2: PONE-D-21-02186

A large accessory genome, high recombination rates, and selection of secondary metabolite genes help maintain global distribution and broad host range of the fungal plant pathogen Claviceps purpurea

The authors conducted a comprehensive study on the extended gene repertoire (pangenome) of Claviceps purpurea sampled from different hosts in six countries. The inclusion of samples from different geographical locations and the total number of 24 genomes would be adequate for exploring for the first time the pangenome of this pathogen. Overall, the results presented bring novelty and would be fit for publication in PLOS One after major revisions. Major criticisms are regarding long parts of the discussion being mostly speculations without robust results for supporting their claims. Therefore, the discussion needs to focus on the major novelties presented (i.e., positive selection in secondary metabolites) and shortened in other sections (i.e., mini-chromosome speculation, several comparisons with Zymoseptoria tritici). In my detailed comments below, I request some clarifications and suggest modifications in the manuscript.

The manuscript figures are of high quality, and the authors provide open access to scripts and genomic data, which should be praised.

Specific comments organized according to line numbers in the manuscript:

Results

FigS1: What is the impact of the hosts of origin on the population structure?

FigS1: Mention the enlarged view within the PCA.

Line 126: Liu et al not referenced. Would it be possible to provide preprint?

Line 131: Is 59.2% correct?

Line 151: Please clarify what would be the consequences of unclassified orthologs in the accessory genome for the GO enrichment.

Line 200: I understand the details for the “stringent filtering” are in the methods, but briefly mention here.

Line 220: Given that effectors in the core genome are not under selection in the present study, how authors see the role of effectors in the accessory genome?

Line 252: Since authors used Illumina reads mapped on a not complete (chromosome level) reference genome, how to distinguish whether different scaffolds would ultimately compose a single chromosome and represent indels instead of mini-chromosomes? Also, what would be the impact of structural variations in the pangenome (core + accessory + indels)?

Line 283: Why have authors not considered DNA transposons?

Line 298: “Some hotspots showed a greater association with duplicated genes and TEs (Fig. 7 B-D), while others showed a lower association (Fig. 7 A, E).” ….. If I’m interpreting the figure correctly, I’d suggest being more explicit for Figure 7A = no TE’s within the recombination hotspot.

Discussion

As a general comment, please remove references to tables and figures.

Line 363 to 382 is mainly speculative as long reads would be necessary to tackle the mini-chromosome hypothesis. Would it be possible to remove/shorten this section?

Line 324: Zymoseptoria tritici is not a biotrophic fungus (see references: doi.org/10.1016/j.fgb.2015.04.001 and doi.org/10.1094/phi-i-2011-0407-01), and the only grass it infects is wheat (Please see http://www.genome.org/cgi/doi/10.1101/gr.118851.110 and also Seifbarghi et al 2009). Please, authors should correct this statement and discussion based on this wrong definition. Please clarify that the global distribution of Zymoseptoria tritici is due to wheat infection and that grasses (other than wheat) are infected by different Zymoseptoria.

Line 325: Please tone down comparisons with Zymoseptoria tritici. Z. tritici is heterothallic, has a very limited host range, and develops across a crop season mostly via asexual spores (and hyphae).

Line 343: Is there RNAseq data available for improving gene annotation? Considering the signatures of positive selection on unclassified genes, how would positive selection act on pseudogenes in C. purpurea?

Line 428: Populations in areas without fungicide pressure or deployment of resistant cultivars (Israel, Oregon) have similarly high levels of genetic diversity when compared to regions with intensive usage and deployment (Switzerland, Oregon R) as for example in the study of Hartmann et al. 2017. I suggest authors to shorten this discussion section to avoid speculations.

Line 435: Are C. purpurea of different hosts able to cross sexually? How would it influence population recombination rates and hotspots analyses?

Line 446: Are high levels of recombination expected in a homothallic fungus?

Line 478: I see this section on the evolution of secondary metabolites in C. purpurea as a novel and relevant discussion that should receive focus by shortening previous sections.

L 488: Reference about Puccinia missing.

Materials and Methods

Table 1: I could not find LM28 and 582 in the PRJNA528707.

Line 522: Mention the sequencing technology for the 23 isolates.

Line 524: “cutoff of 50 aa” What does it mean?

Line 546: What RepBase version?

Line 567: Describe how gene sequences of individual genomes were extracted for selection analyses.

Reviewer #3: The authors use resequenced, assembled, and annotated genomes from 24 C. purpurea isolates to perform a pangenome analysis, and combine this work with two additional analyses, estimating recombination rates (and detecting hotspots), and performing an analysis of selection on coding sequences (single copy orthologous genes). There are many different approaches to infer the action of natural selection in population genetics datasets, and the authors here estimate selection (purifying and positive selection) through estimating dN/dS (omega) for their population set.

The work presented here involves a fairly large dataset, as well as lots of bioinformatics analysis, and so those contributions alone can serve as important resources for understudied species. That said, these large-scale genomic analyses are not that well integrated in this manuscript (at least they are integrated to some extent, especially through various enrichment analyses), and other population genetic analyses could have been included that might help to integrate the different pieces in other ways. The interpretation of population genomic studies is not straightforward, and results tend to be consistent with multiple explanations until more specific analyses are carried out. In this work, the results tend to be over-interpreted, and in many cases the evidence for the authors' conclusions is rather weak. However, unless some important methods descriptions have been left out of this manuscript, the biggest problem in the manuscript is the analysis of natural selection. It appears the authors have carried out the dN/dS analyses with PAML and the CodeML algorithm using only the 24 C. purpurea isolates and without inclusion of data from additional species. The Jeffares et al. paper they cite indicates sequences from multiple species are needed for these analyses [states that PAML FAQ recommends a minimum 4-5 species, refers to multiple alignments of protein-coding gene sequences from several species in a phylogeny, in describing required input files indicates an annotated genome for at least 1 related species is needed besides the target species of interest]. dN/dS analyses are meant to compare nonsynonymous versus synonymous substitutions (alleles gone to fixation in some species or independent lineages), yet what we have in a population capable of sexual reproduction are segregating polymorphisms (since the isolates are not independent lineages), and the distinction is important.
Kryazhimskiy and Plotkin (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2596312/) have shown that inferring selection pressures from dN/dS calculated from population samples does not work in the same way that dN/dS analyses work when multiple species are compared (the behavior of dN/dS is expected to be different). In particular, they find that even for genes experiencing negative selection, the observation of an elevated dN/dS (values closer to 1) is expected with intra-specific samples. And the observation of dN/dS<1 is also consistent with strong positive selection. Given this problem, the major report from the manuscript that secondary metabolism genes are more important than effectors based on the dN/dS analyses must be called into question, since it no longer has any reliable support.
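
For readers unfamiliar with the statistic under discussion, dN/dS (omega) is the ratio of nonsynonymous differences per nonsynonymous site to synonymous differences per synonymous site. The minimal Python sketch below uses the simple counting approach of Nei and Gojobori with a Jukes-Cantor correction; it is not the maximum-likelihood CodeML method named in this review, and the function names and all counts are hypothetical, chosen only to show how the ratio is formed.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differing sites for
    multiple hits using the Jukes-Cantor formula."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Return omega = dN/dS from counts of differences and sites
    (counting method; all inputs here are illustrative)."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0.0:
        raise ValueError("dS is zero; omega is undefined")
    return dn / ds


if __name__ == "__main__":
    # Hypothetical pairwise comparison of one gene.
    omega = dn_ds(nonsyn_diffs=10, nonsyn_sites=700,
                  syn_diffs=20, syn_sites=300)
    print(f"omega = {omega:.3f}")
```

Between divergent species, omega below 1 is conventionally read as purifying selection and omega above 1 as positive selection. The point made above is that this reading does not carry over directly to samples drawn from a single population, where the differences counted are segregating polymorphisms rather than fixed substitutions.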

The selection analyses represent a major section of the manuscript - lines 174-238 in the Results, 566-588 in the Methods, Figures 3 and 4, Table 2, Supplemental figures S3, S5, S9, and supplemental tables S4-S7 all are based on the highly questionable dN/dS results. To become publishable, the manuscript should either remove the dN/dS analyses (diminishing its findings), or go back and perform the dN/dS analysis using the genome data from multiple species presented in the 2021 GBE Wyka et al paper.

Specific comments:

line 86 - Wyka et al. (2020a) citation - besides updating this reference, make sure it also is listed as 2020a in the Reference list (not just 2020).

line 126 - Liu et al. Accepted - This did not appear in the list of references cited.

line 248 - conserved

lines 280-1 - is it possible that there is either bias or some difficulty in estimating the inferred recombination rate associated with duplicated genes (such as potential problems with read mapping and potential genotype errors for these genes)?

Line 299 - you state genes with conserved domains are most frequent within hotspots - but you are not indicating any overrepresentation, are you? Don't we expect these genes to be most frequent in hotspots because they are most frequent across the whole genome?

Line 359-60 - Reword, there is not literally a lack of recombination here.

Lines 447-449 - Your wording seems to indicate that recombination rate is determining whether purifying selection or positive selection is acting, but that is not true. Purifying and positive selection are not mutually exclusive across the genome (one can be happening at some loci, the other at other loci). And high recombination is expected to increase the efficiency of both types of selection. I don't see how high recombination could ever explain few signatures of positive selection.

Lines 508-9 - rephrase "likely controlled by" -> likely kept under control, or similar.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Zachary Konkel

Reviewer #2: No

Reviewer #3: Yes: Christopher Toomajian

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Feb 10;17(2):e0263496. doi: 10.1371/journal.pone.0263496.r002

Author response to Decision Letter 0

29 Jun 2021

We thank you for your feedback on our manuscript. We have addressed your comments and have altered the text in accordance with your advice.

Reviewer #1:

Major concerns:

252-254: I would be more supportive about the evidence for accessory chromosome(s) if these data were derived from the population-level accessory pangenome. In this study the accessory pangenome references all isolates gathered across the globe – global populations may not exchange genetic material, so 1) How can you distinguish between accessory chromosome scaffolds and genomic regions that evolved from isolation? 2) How can you distinguish them from poorly assembled scaffolds? Short reads are especially prone to not assembling well, potentially leaving streaks of N’s in scaffolds derived from small contigs... particularly from regions with low coverage. At this point, the study points to accessory chromosomes - my assessment is that accessory chromosomes are not substantiated by these data and the other 2 options should be presented as the assumed nulls. – Claviceps purpurea is still spread globally through import and export of contaminated grain. In fact, one of the isolates that we have sequenced for this study that was from New Zealand was sent to us by an APHIS agent at a checkpoint in Oregon from an import of contaminated seed. It is very plausible to suggest that not all contaminated seed is identified and that some foreign samples of C. purpurea do cross country borders. In addition, Liu et al. 2020 (https://doi.org/10.1002/ece3.7028) identified that geographical distribution did not correlate with genetically sub-divided populations of C. purpurea. While this section represented a very small portion of our paper, we have further reduced the discussion and presentation of these results and are simply stating a hypothesis that will need to be studied with the use of long-read sequencing and a larger population of samples.

Other concerns:

639-646: I’m concerned about the type 1 error thresholds – considering the individual type 1 error thresholds, I think the cumulative type 1 error for the study could be large. The data seems justified visually and corrections are in place in explicit multiple comparisons analyses, but the statistical power of the overall study concerns me. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4840791/ - We understand reviewer 1’s concern regarding type 1 error. As noted by the reviewer, we have made our best attempt to limit the potential for type 1 error, and the statistical analyses demonstrate this. Given there are multiple analyses in the manuscript, it is not possible to measure the statistical power of the entire study. Therefore, we believe we have provided an accurate interpretation of the data, do not try to over-extrapolate these results in the discussion, and know there is the potential for type 1 error. Only future experimental data will be able to determine the validity of some of our analyses.
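The reviewer's concern about cumulative type 1 error can be illustrated with a minimal sketch. It assumes the individual tests are independent and share one threshold, which may not hold for the analyses in the manuscript. The function names, the 0.05 threshold, and the test counts are illustrative assumptions, not values taken from the study.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n independent tests."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical numbers of separate analyses, each tested at alpha = 0.05.
    for m in (1, 5, 10, 20):
        print(m, round(familywise_error_rate(0.05, m), 3))
```

Under these assumptions the chance of at least one false positive grows quickly with the number of analyses, which is why corrections applied within each analysis do not bound the error rate of the study as a whole.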

Use of normative language without reference points, e.g. vague adjectives as in line 45 “broad genomic analyses”; general grammatic errors/tense switching/active-passive voice switching, e.g. 89 “the factors that were influencing” – We have addressed this throughout the manuscript.

Abstract:

“which is often governed by variable gene content.” - what is governed by variable gene content, what does this mean, what information does this add? - Statement has been removed.

Data locations:

“Dataset, doi: XXX” – remove or add doi – doi has been added

Author Summary:

43-44: “ambiguity surrounding the true nature” is vague, please be specific – Statement has been made more succinct.

Introduction:

*88-89: how does tandem gene duplication implicate their positive contribution to Claviceps – how is this definitively not just a random duplication? Looking for a citation here or explanation of how – Statement has been reworded.

106: “use SCOs to identify genes under positive selection” – the SCOs don’t show which genes are under positive selection, what was the actual method that references the SCOs? – Statement has been altered.

Results:

183-184: here, the percentage in ‘()’ is relative to the whole then the next is relative to the previous percentage; previous orthogroup percentages in the text had a common reference point. I think readers could infer, so I can see arguments either way, but I would argue either maintain common reference points or explicitly state them so the reader doesn’t have to infer. – Changed to be consistently relative to the whole.

185: “two conserved genes” – do you mean orthogroups? – Yes, but these are single-copy orthologs. We believe the designation of gene is appropriate.

224: again, I’m not sure I’d say “genes” when you are referencing what appear to be orthogroups… If there are paralogs within these orthogroups then I don’t think it is appropriate to universally assign a function to the OG because paralogs can take on different functions. At present, it might be appropriate if they are single copy orthologs, but I don’t think that is made clear if that’s the case. – It has been previously stated that the dN/dS analysis and positive selection analysis were only conducted on single-copy orthologs (no paralogs in the orthogroup). Since we are only using single-copy orthologs, we have kept the term “gene”.

252-254: see major concerns – See comments for major concerns.

*255-270: are these obtained from individual populations or the entire sample set? – Entire sample set.

282: RIP is introduced without expanding the acronym. – Acronym has been expanded

Discussion:

*338-339: how is it supported that this is potentially due to a lack of RIP? – Statement has been removed.

417-420: this is not substantiated; perhaps instead of saying “it is plausible to believe”, state the two hypotheses you have mentioned. At present, this is added without any support. - We have altered the text and decided against stating the hypothesis regarding increased mutation due to agricultural practices.

Conclusions:

503-504: This section would benefit from reducing normative language i.e. “large” is used twice, “high” is used - there’s no reference to base these off. Use actual values and if you want to compare with other fungi then actually provide some sort of reference – Normative language has been toned down.

Methods:

639-646: see general concerns – See comments regarding statistics above.

Reviewer #2: PONE-D-21-02186

A large accessory genome, high recombination rates, and selection of secondary metabolite genes help maintain global distribution and broad host range of the fungal plant pathogen Claviceps purpurea

The authors conducted a comprehensive study on the extended gene repertoire (pangenome) of Claviceps purpurea sampled from different hosts in six countries. The inclusion of samples from different geographical locations and the total number of 24 genomes would be adequate for exploring for the first time the pangenome of this pathogen. Overall, the results presented bring novelty and would be fit for publication in PLOS One after major revisions. Major criticisms are regarding long parts of the discussion being mostly speculations without robust results for supporting their claims. Therefore, the discussion needs to focus on the major novelties presented (i.e., positive selection in secondary metabolites) and shortened in other sections (i.e., mini-chromosome speculation, several comparisons with Zymoseptoria tritici). In my detailed comments below, I request some clarifications and suggest modifications in the manuscript.

The manuscript figures are of high quality, and the authors provide open access to scripts and genomic data, which should be praised.

Specific comments organized according to line numbers in the manuscript:

Results

FigS1: What is the impact of the hosts of origin on the population structure? – This analysis is not a focus of this paper but has been discussed in Miao Liu et al. 2020 (https://doi.org/10.1002/ece3.7028). In general, C. purpurea population genetic subdivisions are not correlated with hosts of origin.

FigS1: Mention the enlarged view within the PCA. - Enlarged PCA has been mentioned.

Line 126: Liu et al not referenced. Would it be possible to provide preprint? – It has since been published and is available at https://doi.org/10.1002/ece3.7028. We have updated the reference section.

Line 131: Is 59.2% correct? – Yes, 6,244 / 10,540 = 59.2%, relative to the whole (pangenome).
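The percentage quoted in this response can be checked with a short sketch. The two counts are the ones given in the response; the function and variable names are illustrative assumptions.

```python
def percent_of_whole(part: int, whole: int, digits: int = 1) -> float:
    """Return part as a percentage of whole, rounded to the given digits."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, digits)


if __name__ == "__main__":
    subset_count = 6244      # numerator quoted in the response
    pangenome_count = 10540  # denominator quoted in the response (the whole)
    print(percent_of_whole(subset_count, pangenome_count))
```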

Line 151: Please clarify what would be the consequences of unclassified orthologs in the accessory genome for the GO enrichment. – An increased number of unclassified orthologs in the population of the accessory genome will likely increase the type 2 error rate and may mask some potentially enriched GO terms. This caveat has been stated.

Line 200: I understand the details for the “stringent filtering” are in the methods, but briefly mention here. – We have deleted this section, due to removal of the positive selection analysis.

Line 220: Given that effectors in the core genome are not under selection in the present study, how authors see the role of effectors in the accessory genome? – We have deleted this section, due to removal of the positive selection analysis.

Line 252: Since authors used Illumina reads mapped on a not complete (chromosome level) reference genome, how to distinguish whether different scaffolds would ultimately compose a single chromosome and represent indels instead of mini-chromosomes? Also, what would be the impact of structural variations in the pangenome (core + accessory + indels)? – Illumina reads were not mapped to a reference genome. All genomes were assembled de novo in Wyka et al. 2021, therefore this comment is not pertinent to our study.

Line 283: Why have authors not considered DNA transposons? – We have added DNA transposons to the analysis. The supplemental figure has been updated.

Line 298: “Some hotspots showed a greater association with duplicated genes and TEs (Fig. 7 B-D), while others showed a lower association (Fig. 7 A, E).” ….. If I’m interpreting the figure correctly, I’d suggest being more explicit for Figure 7A = no TE’s within the recombination hotspot. – Text has been made more explicit to follow reviewer’s suggestion.

Discussion

As a general comment, please remove references to tables and figures. – We have received feedback from the editor that it is our choice to remove/keep the references to figures in the discussion. Due to the abundance of figures in the study and supplemental files, we feel that it would improve the ability of the reader to continue to track our discussion with references back to the pertinent figure / data.

Line 363 to 382 is mainly speculative as long reads would be necessary to tackle the mini-chromosome hypothesis. Would it be possible to remove/shorten this section? – This section has been shortened, but we decided to leave it for the purpose of stating the hypothesis, which could be important for C. purpurea evolution if dispensable chromosomes are identified by another researcher through the use of long-read sequencing.

Line 324: Zymoseptoria tritici is not a biotrophic fungus (see references: doi.org/10.1016/j.fgb.2015.04.001 and doi.org/10.1094/phi-i-2011-0407-01), and the only grass it infects is wheat (Please see http://www.genome.org/cgi/doi/10.1101/gr.118851.110 and also Seifbarghi et al 2009). Please, authors should correct this statement and discussion based on this wrong definition. Please clarify that the global distribution of Zymoseptoria tritici is due to wheat infection and that grasses (other than wheat) are infected by different Zymoseptoria.

Line 325: Please tone down comparisons with Zymoseptoria tritici. Z. tritici is heterothallic, has a very limited host range, and develops across a crop season mostly via asexual spores (and hyphae). – (comments for Line 324 and Line 325) We have corrected the statements regarding Z. tritici and have altered the discussion around these statements. However, regardless of the fact that Z. tritici is only globally distributed because of wheat infections, we still feel there is ample reason to compare these two pathogens. Claviceps purpurea is also a pathogen of wheat and is globally distributed (mostly due to the spread of contaminated seeds between farms). While C. purpurea does have a larger host range, both C. purpurea and Z. tritici are under similar ecological and environmental pressures to survive, which have been shown to influence similar pangenome structures in bacteria.

Line 343: Is there RNAseq data available for improving gene annotation? Considering the signatures of positive selection on unclassified genes, how would positive selection act on pseudogenes in C. purpurea? – The RNAseq data that was available at the time of assembly and annotations (Wyka et al. 2021) was already used in the annotation pipeline to help improve gene predictions. We did specifically look at pseudogenes, but the few pseudogenes that have been found in C. purpurea are associated with secondary metabolites, and specifically ergot alkaloids. Gene gains, gene loss and gene sequence changes at the end of chromosomes are believed to be major drivers of ergot alkaloid diversification in the Clavicipitaceae (Young et al. 2015). It is possible that many of the unclassified genes are actually pseudogenes that represent remnants of gene loss that have given rise to diverse chemotypes (Young et al. 2015).

Line 428: Populations in areas without fungicide pressure or deployment of resistant cultivars (Israel, Oregon) have similarly high levels of genetic diversity when compared to regions with intensive usage and deployment (Switzerland, Oregon R) as for example in the study of Hartmann et al. 2017. I suggest authors to shorten this discussion section to avoid speculations. – We have removed this discussion point, and left the hypothesis of RIP “leakage”, which is likely a more plausible hypothesis.

Line 435: Are C. purpurea of different hosts able to cross sexually? How would it influence population recombination rates and hotspots analyses? – C. purpurea of different hosts are able to cross sexually and do so quite often. This likely increases the effective population size of C. purpurea, which may inflate recombination rates. However, a larger-scale recombination analysis comparing multiple fungal organisms with LDhot will likely be needed to determine how different lifestyles can influence hotspot analysis. At the time of writing, I only saw Z. tritici being analyzed with LDhot.

Line 446: Are high levels of recombination expected in a homothallic fungus? – Sexual recombination has been found to frequently occur in homothallic plant pathogens, such as Sclerotinia sclerotiorum (Attanayake et al. 2014, https://www.nature.com/articles/hdy201437) and Fusarium graminearum (Talas & McDonald 2015, https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-015-2166-0), which have large effective population sizes. Certainly, more research needs to be done across additional homothallic fungal species. Recombination among homothallic species does appear to occur, but the rate of recombination may depend on the population size and species.

Line 478: I see this section on the evolution of secondary metabolites in C. purpurea as a novel and relevant discussion that should receive focus by shortening previous sections. – This section has largely been removed, due to the removal of the positive selection analysis.

L 488: Reference about Puccinia missing. - The statement was removed from the text due to removal of the CodeML analysis, so no reference is needed.

Materials and Methods

Table 1: I could not find LM28 and 582 in the PRJNA528707. – These genomes are present in PRJNA528707 and have recently been uploaded by NCBI, please see the screenshot of NCBI below.

Line 522: Mention the sequencing technology for the 23 isolates. - Sequencing technology has been mentioned.

Line 524: “cutoff of 50 aa” What does it mean? - Removed “aa” as it’s redundant.

Line 546: What RepBase version? - v24.03

Line 567: Describe how gene sequences of individual genomes were extracted for selection analyses. – Single-copy orthologs from the pangenome analyses (described in the previous section) were used for dN/dS analysis. This has been made more explicit.

Reviewer #3:

The authors use resequenced, assembled, and annotated genomes from 24 C. purpurea isolates to perform a pangenome analysis, and combine this work with two additional analyses, estimating recombination rates (and detecting hotspots), and performing an analysis of selection on coding sequences (single copy orthologous genes). There are many different approaches to infer the action of natural selection in population genetics datasets, and the authors here estimate selection (purifying and positive selection) through estimating dN/dS (omega) for their population set.

The work presented here involves a fairly large dataset, as well as lots of bioinformatics analysis, and so those contributions alone can serve as important resources for understudied species. That said, these large-scale genomic analyses are not that well integrated in this manuscript (at least they are integrated to some extent, especially through various enrichment analyses), and other population genetic analyses could have been included that might help to integrate the different pieces in other ways. The interpretation of population genomic studies is not straightforward, and results tend to be consistent with multiple explanations until more specific analyses are carried out. In this work, the results tend to be over-interpreted, and in many cases the evidence for the authors' conclusions is rather weak. However, unless some important methods descriptions have been left out of this manuscript, the biggest problem in the manuscript is the analysis of natural selection. It appears the authors have carried out the dN/dS analyses with PAML and the CodeML algorithm using only the 24 C. purpurea isolates and without inclusion of data from additional species. The Jeffares et al. paper they cite indicates sequences from multiple species are needed for these analyses [states that PAML FAQ recommends a minimum 4-5 species, refers to multiple alignments of protein-coding gene sequences from several species in a phylogeny, in describing required input files indicates an annotated genome for at least 1 related species is needed besides the target species of interest]. dN/dS analyses are meant to compare nonsynonymous versus synonymous substitutions (alleles gone to fixation in some species or independent lineages), yet what we have in populations capable of sexual reproduction are segregating polymorphisms (since the isolates are not independent lineages), and the distinction is important. Kryazhimskiy and Plotkin (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2596312/) have shown that inferring selection pressures from dN/dS calculated from population samples does not work in the same way that dN/dS analyses work when multiple species are compared (the behavior of dN/dS is expected to be different). In particular, they find that even for genes experiencing negative selection, the observation of an elevated dN/dS (values closer to 1) is expected with intra-specific samples. And the observation of dN/dS<1 is also consistent with strong positive selection. Given this problem, the major report from the manuscript that secondary metabolism genes are more important than effectors based on the dN/dS analyses must be called into question, since it no longer has any reliable support. – We have incorporated all Claviceps species from the Wyka et al. 2021 paper and attempted to re-run the analysis on 53 genomes. With the resources on hand we were able to conduct a re-analysis of dN/dS (omega) ratios with YN00; however, the use of the site models in CodeML was too computationally intensive (~24+ hours per single-copy ortholog, ~4,000 in the new dataset) for our current computing capacity. This analysis would require a large expense and time that we currently do not have. Both the lead author (Wyka) and corresponding author (Broders) are no longer at Colorado State University (where the server for the original analyses was located), have moved on to different positions, and no longer have access to the CSU server. We do not think the lack of this analysis diminishes the results of this paper or lessens the value of the findings.
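The scale of the computation described in this response can be sketched from the two figures it gives (roughly 24 hours per single-copy ortholog and roughly 4,000 orthologs). This is a rough lower-bound estimate only, since the response says "24+" hours. The function names and the number of parallel jobs are hypothetical and do not describe the authors' actual hardware.

```python
def total_cpu_hours(hours_per_ortholog: float, n_orthologs: int) -> float:
    """Total compute time if each ortholog is analysed as one serial job."""
    return hours_per_ortholog * n_orthologs


def wall_clock_days(cpu_hours: float, n_parallel_jobs: int) -> float:
    """Elapsed days when jobs run perfectly in parallel (idealised)."""
    if n_parallel_jobs < 1:
        raise ValueError("n_parallel_jobs must be at least 1")
    return cpu_hours / n_parallel_jobs / 24.0


if __name__ == "__main__":
    cpu_hours = total_cpu_hours(24, 4000)   # figures quoted in the response
    print(cpu_hours)                        # serial compute hours
    print(wall_clock_days(cpu_hours, 1))    # one job at a time
    print(wall_clock_days(cpu_hours, 64))   # hypothetical 64 parallel jobs
```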

The selection analyses represent a major section of the manuscript - lines 174-238 in the Results, 566-588 in the Methods, Figures 3 and 4, Table 2, Supplemental figures S3, S5, S9, and supplemental tables S4-S7 all are based on the highly questionable dN/dS results. To become publishable, the manuscript should either remove the dN/dS analyses (diminishing its findings), or go back and perform the dN/dS analysis using the genome data from multiple species presented in the 2021 GBE Wyka et al paper. – As mentioned above, we have re-done the dN/dS using the Claviceps genomes, but have removed the CodeML positive selection analysis.

Specific comments:

line 86 - Wyka et al. (2020a) citation - besides updating this reference, make sure it also is listed as 2020a in the Reference list (not just 2000). – Changes have been made.

line 126 - Liu et al. Accepted - This did not appear in the list of references cited. – Correct reference has been added.

line 248 – conserved - Done

lines 280-1 - is it possible that there is either bias or some difficulty in estimating the inferred recombination rate associated with duplicated genes (such as potential problems with read mapping and potential genotype errors for these genes)? – For our recombination analysis we used a method that utilizes haploid whole genome alignments with LastZ and MultiZ (Stukenbrock and Dutheil 2018a), so there should be no concerns of errors with read mapping or genotype errors. Unless I am mistaken.

Line 299 - you state genes with conserved domains are most frequent within hotspots - but you are not indicating any overrepresentation, are you? Don't we expect these genes to be most frequent in hotspots because they are most frequent across the whole genome? – Yes, this may be expected, however, we did not compute any enrichment analysis as we only observed 5 recombination hotspots and felt that the statistical power of this test would be very low.

Line 359-60 - Reword, there is not literally a lack of recombination here. - The statement has been reworded.

Lines 447-449 - Your wording seems to indicate that recombination rate is determining whether purifying selection or positive selection is acting, but that is not true. Purifying and positive selection are not mutually exclusive across the genome (one can be happening at some loci, the other at other loci). And high recombination is expected to increase the efficiency of both types of selection. I don't see how high recombination could ever explain few signatures of positive selection. - We agree with your comment, we have altered the text to provide clarity to the statement.

Lines 508-9 - rephrase "likely controlled by" -> likely kept under control, or similar. – Changed to “kept under controlled”.

Attachment

Submitted filename: Reviewers_comments.docx


PLoS One. doi: 10.1371/journal.pone.0263496.r003

Decision Letter 1

Christopher Toomajian

17 Aug 2021

PONE-D-21-02186R1

A large accessory genome and high recombination rates help maintain global distribution and broad host range of the fungal plant pathogen Claviceps purpurea

PLOS ONE

Dear Dr. Broders,

I appreciate the changes that you have made in the revision, and although I have decided Minor Revisions are required, they may be relatively simple to make. First, in addition to providing my own review on the revision (Reviewer 3) and seeking the comments of one of the original reviewers (Reviewer 2), I sought the opinion of a new reviewer for the resubmission (Reviewer 4) since the revision is substantially different from the original submission with the removal of major sections.

My comments emphasize some needed changes in light of the updates of the selection analyses, but these changes are simply in wording. Following that general theme, I have also suggested other wording changes where statements or conclusions in the manuscript are not strongly supported by analyses (such as, where other alternative factors could explain a finding instead of, or in combination with a factor you highlight). These changes ought to help the reader not as fluent in the analyses you have performed distinguish some opinions or hypotheses you are proposing from results that are very strongly supported by the data and analyses.

I very much appreciate the insights of Reviewer 4. The PLOS ONE policy is to accept publications regardless of perceived impact, and so for a genomics manuscript like this, a reviewer may suggest insightful analyses to include but these additions often cannot be required when their inclusion would only increase the work's impact. Reviewer 4 states as much, indicating the concerns raised shouldn't stop the acceptance of the manuscript as long as sufficient explanations can be provided for why suggested analyses cannot be performed (or are too impractical for the authors). That said, I strongly recommend you try to follow through with at least some of these new suggestions

[e.g., is it feasible to take accessory genes for which orthologous sequences are available in the other species to compute dN/dS? This would allow for some interesting comparisons on the nature of selection in the core and accessory genomes. Would you agree that creating some summary and/or figures relevant to the genomic distribution or density of core, accessory, and singleton genes along the reference genome scaffolds could be informative for the structure of the accessory genome and the potential for accessory chromosomes? The suggestion related to host species is interesting too, though it may be that not enough isolates share host species to make this analysis meaningful.]

Other comments emphasize unclear sections, especially related to reference scaffolds missing from the multiple alignments. I know full details are available in cited references, but providing some quick explanations for how procedures from those cited references can result in 'missing scaffolds' in this work is critical for interpreting and conceptualizing the results. I recommend that when addressing comments, some edits, even if short, be made in the manuscript (rather than simply providing a clarification as a response to reviewers that readers may not be likely to see).

For any clarifications, please do not hesitate to contact me.

Please submit your revised manuscript by Oct 01 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Christopher Toomajian

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

Reviewer #3: (No Response)

Reviewer #4: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #2: Yes

Reviewer #3: Partly

Reviewer #4: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

6. Review Comments to the Author

Reviewer #2: PONE-D-21-02186R1

A large accessory genome and high recombination rates help maintain global distribution and broad host range of the fungal plant pathogen Claviceps purpurea

The authors have addressed all my comments with clear and satisfactory answers. Also, they significantly improved the manuscript by extensively removing and re-writing several parts. After careful reading of this new version, I can recommend it for publication in PLOS ONE.

Reviewer #3: The authors have recomputed the dN/dS ratios for their set of gene orthologs, incorporating sequences from species besides C. purpurea, and have removed the analysis of positive selection based on CodeML site models. Those changes have satisfied the main problem I had identified in the original submission. Those edits have a substantial impact on the overall message of the manuscript, and having gone through the revisions, I have identified some other issues that require changes, detailed below.

Tone of manuscript and amount of evidence for conclusions. The strengths of the manuscript come from its scale and breadth (i.e., pan genome, genome-wide, multiple population genomics analyses). However, that breadth comes at a cost of identifying some interesting correlations between population genome features but not being able to demonstrate any strong evidence for causation in these cases. This explains my choice of Partly to answer the question Do the data support the conclusions? There are several general conclusions or statements in the manuscript that suggest causation but are really not well supported by the data and analyses, and these should be revised to tone them down.

Title - Factors X and Y “help maintain global distribution and broad host range”

Abstract – L26 “likely maintained by high recombination… ”, L28 “likely controlled by frequent recombination”

Intro L98 “likely maintained by”

Certain references to genes under positive selection – see comment below.

Appreciation for low power of whole-gene dN/dS ratio analysis. I have 2 points related to the interpretation of the whole gene dN/dS ratios. 1st) Genes with dN/dS >1 are not necessarily under positive selection, particularly for genes with ratios only slightly greater than 1. Some test would be required to show a particular ratio is significantly greater than 1. It would be better to refer to genes with the ratio >1 as positive selection candidates, or genes with signatures (or evidence) of positive selection (phrases that are used in several places in the manuscript). The following phrases should be revised: L96 “identified genes under positive selection,” L115 “landscape of genes under positive selection”. 2nd) A dN/dS ratio for a whole gene being > 1 is a very stringent criterion for positive selection (that is, at least when it is significantly greater than 1). The literature is full of examples where positive selection is indicated but where the whole gene dN/dS ratio is < 1. That means this analysis has very low power for detecting genes with sites under positive selection, and so the absence of evidence of positive selection for a category of genes should not be confused with evidence for the absence of positive selection. When considering factors that can explain no evidence for positive selection on a subset of genes (such as L489-490), the most important one should be this lack of power to detect it.
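
As an illustration of the kind of test point 1 alludes to, one common approach is a likelihood ratio test that compares a model in which the whole-gene ratio is fixed at 1 against a model in which it is estimated freely. This is a general sketch, not a description of any analysis performed by the authors or the reviewer; the symbols below are introduced here for illustration only.

```latex
\omega = \frac{d_N}{d_S}, \qquad
H_0:\ \omega = 1 \quad \text{vs.} \quad H_1:\ \omega\ \text{estimated freely},
\qquad
2\,\Delta\ell = 2\left(\ell_{H_1} - \ell_{H_0}\right) \;\sim\; \chi^2_{1}
\quad \text{(approximately, under } H_0\text{)}
```

Under this sketch, a gene would be treated as a positive selection candidate only when the estimated ratio exceeds 1 and the test rejects the null model; a ratio only slightly above 1 would typically not reach significance.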

Minor points:

L112 should also reference [17]

L114 Amino acid cutoff requires some context here to be clear (such as the phrase For the purpose of defining gene models for subsequent analysis…). The appropriate context can only be found in the referenced papers currently.

L162 After filtering for positive selection – this analysis has been removed from the manuscript, correct? Then this sentence needs to go as well.

Gap between L203-204. Something is missing between hotspots and 1000 simulations.

L379 selection of secondary metabolite genes – this result is gone from the manuscript, so this needs to be removed here as well.

L400 Citation of Fig S7 should be changed to Fig S5.

Table S6 lists 36 scaffolds, but the text refers to 37 missing scaffolds.

Reviewer #4: This study explores the pangenome of Claviceps purpurea, as well as the recombination and selection landscapes based on 24 isolates. It is overall a nice study with interesting new results. I understand that this is a revised version of a previous manuscript, and, although I was not part of the previous reviewing process, it seems that the major methodology flaws pointed out were corrected by the authors. A sizeable part of the original analyses has been left out of the current manuscript entirely as the authors state that they do not have access to the computational power they had when the study was first written. As such I have been careful to make only suggestions for which the data should already be generated.

The introduction is clear and sets up the following text well. The methods section is detailed and scripts are provided publicly. The results are clearly explained. I have some concerns with the discussion which is not yet at a level of clarity as excellent as the rest of the manuscript. I also have some questions regarding the pangenome-related analyses.

Main comments:

None of these concerns are a cause for stopping the publication of the manuscript even without any new analyses, if appropriate explanations are provided as to why these analyses would be impossible/not relevant to the study.

Minor comments

As previously noted by the reviewers, the discussion contains large sections of comparison to Z. tritici, and these are not always so relevant that they deserve the word count they get.

L. 403. “While this analysis was only conducted on single-copy core genes, it suggests that some of the unclassified accessory genes (Fig 2 H) may be undergoing similar evolutionary trends.” What exactly is suggesting such a thing? See main comment regarding this.

L. 428. “due to our Illumina based we did not process“ Missing word? Assemblies?

L.442. Typo: “Clavicpes “

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Reviewer #3: Yes: Christopher Toomajian

Reviewer #4: No

PLoS One. 2022 Feb 10;17(2):e0263496. doi: 10.1371/journal.pone.0263496.r004

Author response to Decision Letter 1

23 Sep 2021

We thank you for your feedback on our manuscript. We have reviewed your comments and suggested edits and have done our best to address these comments. We have made a number of edits that we believe take into consideration your advice and address the concerns of reviewers.

Reviewer #2: PONE-D-21-02186R1

A large accessory genome and high recombination rates help maintain global distribution and broad host range of the fungal plant pathogen Claviceps purpurea

Thank you for your feedback.

Title - Factors X and Y “help maintain global distribution and broad host range” – The title has been altered

Abstract – L26 “likely maintained by high recombination… ”, L28 “likely controlled by frequent recombination” - These statements have been toned down

Intro L98 “likely maintained by” – This statement has been toned down

Certain references to genes under positive selection – see comment below.

The following phrases should be revised:

L96 “identified genes under positive selection,” – This phrase has been re-worded.

L115 “landscape of genes under positive selection”. – This phrase has been re-worded.

2nd) A dN/dS ratio for a whole gene being > 1 is a very stringent criterion for positive selection (that is, at least when it is significantly greater than 1). The literature is full of examples where positive selection is indicated but where the whole gene dN/dS ratio is < 1. That means this analysis has very low power for detecting genes with sites under positive selection, and so the absence of evidence of positive selection for a category of genes should not be confused with evidence for the absence of positive selection. When considering factors that can explain no evidence for positive selection on a subset of genes (such as L489-490), the most important one should be this lack of power to detect it. – The reviewer makes an excellent point and we have incorporated this into the text (new lines 504-506).

Keep in-text reference citations consistent (all numerical, without publication year). As appropriate, most reference citations have been converted to numerical format, but some hold-overs of the (author, year) style remain. For example, L118, “in Wyka et al. (2021)” should be edited to “in Wyka et al. [17]” or even just “in [17].” – We have altered the references to be more consistent. If a reference was at the start of a sentence we decided to go with Wyka et al. [17], for all other cases we used the numerical format.

Minor points:

L112 should also reference [17] – Reference has been added.

L162 After filtering for positive selection – this analysis has been removed from the manuscript, correct? Then this sentence needs to go as well. – Sentence has been removed

Gap between L203-204. Something is missing between hotspots and 1000 simulations. – The sentence has been corrected.

L379 selection of secondary metabolite genes – this result is gone from the manuscript, so this needs to be removed here as well. – The statement has been removed

L400 Citation of Fig S7 should be changed to Fig S5. – Has been changed

Table S6 lists 36 scaffolds, but the text refers to 37 missing scaffolds. – Has been corrected to 36 in the text.

Main comments:

One concern with the manuscript as it stands now is that the different sections feel very disconnected from each other, both in the results and in the discussion. The study starts with a description of the pangenome. Then the authors go on looking at selection only on core genes and only look at the recombination, once again, in fragments shared by all samples. I did not find an explanation in the manuscript as to why accessory genes found at a sufficiently high frequency could not be analysed for selection for instance and the results between accessory and core genes then compared. – We provided an additional rebuttal to this comment about accessory genes in the following concerns. To answer the question of why we did not include accessory genes in the selection analysis: previously we only included C. purpurea isolates for the dN/dS ratio. This approach was incorrect, so we needed to include other species to provide power to the selection analysis. We did this by including the remaining species in the Claviceps genus for which we had genomes. Choosing to run the selection analysis on only core genes makes sense, as the dataset will be consistent and all genes will have been compared to all other Claviceps species. If we started to look at accessory genes, we would also reduce the number of Claviceps species used in each dN/dS calculation. For example, core genes used alignments from 29 other Claviceps species to calculate dN/dS; for accessory genes the number of other Claviceps species would be reduced and thus would not provide a consistent dataset to compare against the core genes.

Many of the comparisons made to other species are made with Z. tritici which has an accessory genome organized in core chromosomes separated from accessory chromosomes. Some of the comparisons would be more relevant if there were hints of a similar organization in C. purpurea. Here, despite the authors basing their study on de novo assemblies of the genomes, the only attempt to understand the distribution in the genome of core and accessory genes is a comment on fragments that are “missing” from the MGA. It is not clear to me what “missing” implies. Does this mean that the fragments are found in only one isolate and that these are not reported by the alignment tool? Are these the contigs containing the singleton genes? If so, the numbers do not match. Where are the other singleton genes? If they are aligning, would this not point to a possible issue with the way the pangenome was defined in the first place? Please clarify this as this could potentially be a problem. – It is difficult to follow the reviewer's statement here as there are no direct references and they do not indicate which “numbers do not match up”. From the Multiple Genome Alignment (MGA) we only kept scaffolds (based on the reference genome) that had alignments from all of the remaining 23 genomes. The reference genome contained 191 scaffolds and from the final MGA we kept 154 scaffolds. The scaffolds that were not kept contain mostly accessory and singleton genes (as stated in S6_table). However, there are some cases of core genes being present on these scaffolds. The next question will be “Then how are there core genes on scaffolds that are not part of the scaffolds that were kept”. The answer to this question is that the pangenome was conducted with OrthoFinder (as other pangenome papers have also used), which groups the proteins into orthologous groups (OGs). OGs are then classified as “Core” if they contain proteins from all 24 genomes, regardless of whether the OG is paralogous (i.e. some genomes may contain 2 copies of the gene while others contain 1). If the core OG is paralogous for the reference genome or all but 1 genome, then it is entirely possible that the 1+ genomes that contain only 1 copy of the protein will not have a DNA region to align to the scaffold that contains the 2nd copy. This scaffold would then lack all 24 genomes and not be kept from the MGA. Since only a handful of scaffolds were not kept, and contain a large majority of accessory genes and TEs we did not see any fault in the logic of how the pangenome was created.

As for other accessory and singleton genes, the MGA also contain alignments that were either shared by 1-23 of the other genomes and not the reference. Therefore, these alignments were not kept as they did not map to the reference.
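
The classification rule described in this response can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline and not OrthoFinder's API: the function names, the input format (a mapping from genome ID to protein count for one orthogroup), and the definition of "singleton" as an orthogroup present in exactly one genome are all hypothetical simplifications introduced here.

```python
from typing import Dict

N_GENOMES = 24  # number of isolates stated in the response


def classify_orthogroup(members: Dict[str, int], n_genomes: int = N_GENOMES) -> str:
    """Classify one orthogroup from per-genome protein counts.

    `members` maps a genome ID to the number of proteins that genome
    contributes to the orthogroup (hypothetical input format).
    """
    present = [genome for genome, count in members.items() if count > 0]
    if not present:
        raise ValueError("orthogroup has no member proteins")
    if len(present) == n_genomes:
        # Core: every genome contributes at least one protein,
        # regardless of copy number.
        return "core"
    if len(present) == 1:
        # Assumed definition: found in a single genome only.
        return "singleton"
    return "accessory"


def is_paralogous(members: Dict[str, int]) -> bool:
    """True if any genome carries more than one copy."""
    return any(count > 1 for count in members.values())


# A core orthogroup in which one genome carries two copies.
genomes = [f"isolate_{i}" for i in range(N_GENOMES)]
example = {genome: 1 for genome in genomes}
example["isolate_0"] = 2
print(classify_orthogroup(example), is_paralogous(example))
```

The example orthogroup is still "core" even though only one genome holds the second copy, which is the situation the response describes: the scaffold carrying that extra copy may have no aligned region in the other genomes and so would be dropped from the MGA.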

The authors hypothesize that these missing contigs are a hint of possible accessory chromosomes. I (like previous reviewers) am not at all convinced by this argument. Indeed, the abundance of reverse transcriptase and other repeat associated genes on these fragments would point more towards assembly artefacts. – We are stating this more as a potential hypothesis, but have made it more clear in the text that assembly artifacts could also be a likely hypothesis.

However, I do believe that understanding the genomic distribution of the core and accessory regions is extremely relevant to the topic of this study and should be attempted. Where are the accessory genes in the de novo assemblies? Are accessory/singleton genes systematically found together on the same contigs? Are they found more often on the same contigs or do they share contigs with core genes? I believe that the data should already be generated by the authors to answer most of these questions without need for new in-depth analyses. – If we had chromosome level genomes I could understand the reviewer's comment here about understanding where accessory genes are located, in comparison to core genes. However, without such data we do not believe the analysis would strengthen the paper, but in fact would provide a discussion on results that do not draw any strong conclusions. Since many of the de novo assemblies contained > 1,000 contigs we feel that such an analysis may also lead to inaccurate results as a contig that contains many accessory genes may in fact be flanked by two contigs that contain large amounts of core genes. Without long-read sequencing data we would not be able to accurately examine the true distribution of these genes.

Accessory genomes are frequently hypothesized (and indeed in some cases proven) to play a role in host specificity. I’m surprised that no mention of this was made in the results (perhaps I missed it?). Several of the strains come from the same host species (Hordeum vulgare, for example). Were there accessory genes shared by these strains? Surely, the data is already available easily from the orthogroups table. This also seems like a relevant trait to look for when describing the pangenome of a species with a broad host range, especially as the authors report an enrichment of effectors in the accessory genes. – Single isolates of Claviceps purpurea can infect multiple hosts and this has been shown in several different manuscripts over the years. While many isolates may have been isolated from a single host species, we do not believe that shared genes between that snap-shot of isolation will reveal any correlation to an ability to infect H. vulgare (for example). In addition, additional work that was sparked from this paper looked at population structure of C. purpurea and found that host groups did not show any correlation with population structures (DOI: 10.1002/ece3.7028). To really test this proposed hypothesis we would need to conduct a broad grass inoculation panel of all the C. purpurea strains used in this study to truly identify any lack of infection potential and then conduct a correlation analysis of gene content. Currently, we do not have the means and capability to conduct such inoculation trials as I have graduated and do not have funding / resources.

Minor comments:

As previously noted by the reviewers, the discussion contains large sections of comparison to Z. tritici, and these are not always so relevant that they deserve the word count they get.

L.387: The authors argue that the host distribution of these two species are similar. Z.tritici is strictly a wheat pathogen when the authors state that C. purpurea “has a broad host range of ~ 400 grass species across 8 grass tribes”. Arguing that the hosts are similar enough to lead to a convergent evolution of pangenome size seems to be a stretch. – The statement has been altered to only reference the similar geographical distribution as the factor that may help explain a similarly large accessory genome.

L.407. “Badet et al. suggested that TEs were likely contributing to Z. tritici accessory genome“. In this study, the accessory chromosomes are including non-genic content (based on PacBio genome assemblies), making the TEs having a clear role in expansion of the accessory genome. I am not sure how relevant this comparison is to a pangenome in which only the genes were considered. Perhaps focus this paragraph more on the results obtained in the current study instead of describing in details results from another paper. – From the previous revision we have removed some of the discussion around Z. tritici, however, we have decided to leave this paragraph as it does provide a medium and helps convey the results we obtained from C. purpurea, as it provided a reference point for comparison with another organism.

L. 428. “due to our Illumina based we did not process“ Missing word? Assemblies? – Yes, assemblies was missing.

L. 435-444. This paragraph is about TEs, genome size and gene duplication. The previous one is not, but the one even before is. This makes the reading confusing. Perhaps merge the two related paragraphs together. – We have altered the paragraph order in agreement with the reviewer's suggestion.

L.442. Typo: “Clavicpes “ - Fixed

Attachment

Submitted filename: 2021-09-22_Reviewers_comments.docx


PLoS One. doi: 10.1371/journal.pone.0263496.r005

Decision Letter 2

Christopher Toomajian

11 Oct 2021

PONE-D-21-02186R2

A large accessory genome and high recombination rates may influence global distribution and broad host range of the fungal plant pathogen Claviceps purpurea

PLOS ONE

Dear Dr. Broders,

Consider this a conditional acceptance. Reviewer #4 has found the revisions acceptable, and only points out that the authors should deposit their data in Dryad and update the doi by the time of publication. Reviewer #3 (myself) only found one minor issue of substance - making sure the number of "missing scaffolds" reported in different parts of the paper are all consistent. Beyond that, I discovered a handful of grammatical errors that need to be corrected (and regret that these were not caught in an earlier version). Since PLOS ONE does not copyedit accepted manuscripts, I would urge the authors to take this opportunity to check the whole of the manuscript carefully for any other spelling or grammatical errors that have been missed. The manuscript can be accepted without sending it out to reviewers once the authors have fixed the specific problems and errors listed in this round of review.

Please submit your revised manuscript by Nov 25 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Christopher Toomajian

Academic Editor

PLOS ONE

Journal Requirements:


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

Reviewer #3: (No Response)

Reviewer #4: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #3: Yes

Reviewer #4: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #3: Yes

Reviewer #4: N/A

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #3: Yes

Reviewer #4: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #3: No

Reviewer #4: Yes

**********

6. Review Comments to the Author

Reviewer #3: The authors have addressed nearly all of my comments from the previous round of review.

One issue I noted was the reported number of "missing scaffolds." The supplemental table lists 36, and the authors have changed the text in 1 place to agree with this, but in another place in the text (line 396) the manuscript still lists 37 (and oddly, in their response to reviews, they explain that the reference genome had 191 scaffolds and 154 were kept in the final MGA, which would suggest 37 missing, unless this description is in error). This issue still needs to be resolved completely.

Beyond this, the following minor grammatical issues remain and should be corrected (as PLOS ONE does not copyedit).

Line 399 conversed domains -> conserved domains

Line 403 it still cannot be rule out -> it still cannot be ruled out

Line 403 may be assembly artifact -> either may be an assembly artifact OR may be assembly artifacts

Line 405 these are an important aspects -> these are important aspects OR these are an important aspect

Line 488 lack of power to positive -> lack of power to detect positive

Reviewer #4: The authors' answers to my comments were satisfactory. I am happy to recommend the current version of the manuscript for publication in PLOS ONE. (I did notice that the DOI for the Dryad dataset in the main text does not lead to any dataset, but I am sure this will have been fixed by the authors by the time of publication.)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: Yes: Christopher Toomajian

Reviewer #4: No

PLoS One. 2022 Feb 10;17(2):e0263496. doi: 10.1371/journal.pone.0263496.r006

Author response to Decision Letter 2

9 Nov 2021

We thank the reviewers for their time and input. We have edited the manuscript one more time for grammar and incorporated the recommendations of the reviewers. We reviewed the data again and determined there are 37 missing scaffolds. One of the scaffolds doesn't contain any genes, so we likely did not pick it up before. We added contig 181 to S6_Table and changed the 36's back to 37's in the text. The Dryad data has been submitted, is in progress, and will be available before publication.

Attachment

Submitted filename: Response to reviewers.docx


PLoS One. doi: 10.1371/journal.pone.0263496.r007

Decision Letter 3

Christopher Toomajian

11 Nov 2021

PONE-D-21-02186R3

A large accessory genome and high recombination rates may influence global distribution and broad host range of the fungal plant pathogen Claviceps purpurea

PLOS ONE

Dear Dr. Broders,

In making grammatical corrections, the authors have deleted 3 sentences in the Discussion (L485) that originally had been added in Revision 2 to address a point I had raised (as Reviewer 3 of Revision 1) – that the detection of natural selection in specific genes using a whole-gene dN/dS ratio of >1 as a criterion was a low power method, that in many cases natural selection may have acted but not caused this ratio to exceed 1. This deletion is not shown in the track-changes version of the manuscript, but can be seen by comparing this revision to the last. Please restore these sentences or something revised along the same lines, to make clear to the readers that the low power of the approach is likely a main reason that evidence of selection was detected only for a few genes (and no predicted effectors).

Additionally, the following grammatical errors (new and old) should be fixed at the same time:

L20 – subject-verb agreement. Change back to need. The subject is inferences, so the verb should remain plural.

L307 – Verb tense. contained -> contain

L416 – insert a ‘the’? – TEs were likely contributing to the Z. tritici accessory genome

L785 - spelling, orthrogroups -> orthogroups (Fig 1 caption)

Please submit your revised manuscript by Dec 26 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Christopher Toomajian

Academic Editor

PLOS ONE

Journal Requirements:


Reviewers' comments:

PLoS One. 2022 Feb 10;17(2):e0263496. doi: 10.1371/journal.pone.0263496.r008

Author response to Decision Letter 3

13 Nov 2021

Thank you for catching the last remaining error. I believe I was working off a previous version I had on my computer rather than downloading the file from the editorial center. I believe this was the reason the 3 sentences at L485 were omitted. They have been added and I have reviewed the entire document to ensure there were no other omissions. My apologies.

Attachment

Submitted filename: Response to reviewers.docx


PLoS One. doi: 10.1371/journal.pone.0263496.r009

Decision Letter 4

Christopher Toomajian

21 Jan 2022

A large accessory genome and high recombination rates may influence global distribution and broad host range of the fungal plant pathogen Claviceps purpurea

PONE-D-21-02186R4

Dear Dr. Broders,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Christopher Toomajian

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

PLoS One. doi: 10.1371/journal.pone.0263496.r010

Acceptance letter

Christopher Toomajian

28 Jan 2022

PONE-D-21-02186R4

A large accessory genome and high recombination rates may influence global distribution and broad host range of the fungal plant pathogen Claviceps purpurea

Dear Dr. Broders:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Christopher Toomajian

Academic Editor

PLOS ONE

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Fig. Genetic diversity of 24 Claviceps purpurea isolates. (TIF, 570.2 KB)

S2 Fig. Average protein lengths (aa) of all orthogroups in Claviceps purpurea pangenome. (TIF, 720.8 KB)

S3 Fig. Distributions of mean non-synonymous (dN) and synonymous (dS) substitution rates of core single-copy orthogroups in Claviceps purpurea. (TIF, 364.4 KB)

S4 Fig. Estimated population recombination rates of Claviceps purpurea scaffolds. (TIF, 1.4 MB)

S5 Fig. Distributions of genes and their association (distance and flanking counts) to LTR transposable elements. (TIF, 330.6 KB)

S6 Fig. Association of genes within recombination hotspots. (TIF, 366 KB)

S7 Fig. Correlation of recombination rates and omega ratios. (TIF, 770 KB)

S1 Table. Claviceps purpurea pangenome spreadsheet. (XLSX, 11 MB)

S2 Table. Enrichment of protein domains within pangenome. (XLSX, 12.6 KB)

S3 Table. Enrichment of protein domains within paralogous orthogroups. (XLSX, 12.9 KB)

S4 Table. PAML summarized results. (XLSX, 101.7 KB)

S5 Table. BLAST results of single-copy core orthologs with an ω (dN/dS) ≥ 1. (XLSX, 12.9 KB)

S6 Table. Annotation information of missing reference scaffolds from 24 isolate whole-genome alignment. (XLSX, 17.8 KB)

Attachment. Submitted filename: Reviewers_comments.docx (DOCX, 248.3 KB)

Attachment. Submitted filename: 2021-09-22_Reviewers_comments.docx (DOCX, 23.2 KB)

Attachment. Submitted filename: Response to reviewers.docx (DOCX, 11.9 KB)

Attachment. Submitted filename: Response to reviewers.docx (DOCX, 11.8 KB)

Data Availability Statement
