Skip to main content
EMBO Reports logoLink to EMBO Reports
. 2024 Mar 25;25(5):2306–2322. doi: 10.1038/s44319-024-00122-9

High allelic diversity in Arabidopsis NLRs is associated with distinct genomic features

Chandler A Sutherland 1, Daniil M Prigozhin 2, J Grey Monroe 3, Ksenia V Krasileva 1,
PMCID: PMC11093987  PMID: 38528170

Abstract

Plants rely on Nucleotide-binding, Leucine-rich repeat Receptors (NLRs) for pathogen recognition. Highly variable NLRs (hvNLRs) show remarkable intraspecies diversity, while their low-variability paralogs (non-hvNLRs) are conserved between ecotypes. At a population level, hvNLRs provide new pathogen-recognition specificities, but the association between allelic diversity and genomic and epigenomic features has not been established. Our investigation of NLRs in Arabidopsis Col-0 has revealed that hvNLRs show higher expression, less gene body cytosine methylation, and closer proximity to transposable elements than non-hvNLRs. hvNLRs show elevated synonymous and nonsynonymous nucleotide diversity and are in chromatin states associated with an increased probability of mutation. Diversifying selection maintains variability at a subset of codons of hvNLRs, while purifying selection maintains conservation at non-hvNLRs. How these features are established and maintained, and whether they contribute to the observed diversity of hvNLRs is key to understanding the evolution of plant innate immune receptors.

Keywords: Immunity, Nucleotide-binding Leucine-rich-repeat Receptors (NLRs), Genomic Features, Evolution

Subject terms: Chromatin, Transcription & Genomics; Evolution & Ecology; Immunology

Synopsis

graphic file with name 44319_2024_122_Figa_HTML.jpg

Rapid intraspecies diversification of Arabidopsis NLR immune receptors is associated with distinct genomic and epigenomic features. Both mutational likelihood and selection contribute to amino acid diversity at the pangenome level.

  • Rapidly evolving, highly variable NLRs are more expressed, less methylated, and closer to transposable elements than their conserved paralogs.

  • These genomic features are consistent across tissues and linked to gene cluster status, protein domain class, and phylogenetic distance.

  • Increased likelihood of mutation and diversifying selection generate and maintain diversity at highly variable NLRs.

  • Neighboring highly variable and non-highly variable NLR genes maintain distinct genomic features in close genomic proximity.


Rapid intraspecies diversification of Arabidopsis NLR immune receptors is associated with distinct genomic and epigenomic features. Both mutational likelihood and selection contribute to amino acid diversity at the pangenome level.

graphic file with name 44319_2024_122_Figb_HTML.jpg

Introduction

Plants, lacking the adaptive immune systems of vertebrates, use germline-encoded innate immune receptors to defend against rapidly evolving pathogens. Despite their inability to create antibodies through somatic hypermutation and recombination, plants are protected against pathogens due to population-level receptor diversity (Bakker et al, 2006; Van de Weyer et al, 2019; Karasov et al, 2020; Kahlon and Stam, 2021). Nucleotide-binding, Leucine-rich repeat Receptors (NLRs) are the intracellular sensors of the plant immune system, detecting pathogen-secreted, disease-promoting effector proteins (Jones et al, 2016). NLRs have a modular domain structure, with a variable N-terminal domain involved in downstream signaling, a central nucleotide-binding domain shared by APAF-1, various other plant immune proteins, and CED4 (NBARC), and a leucine-rich repeat (LRR) domain involved in direct or indirect recognition of pathogens (Ngou et al, 2022). NLRs are grouped into three anciently diverged classes based on their N-terminal domains: coiled-coil (CC) NLRs (CNL), RPW8-like coiled-coil NLRs (RNL), and Toll/Interleukin-1 receptor (TIR) NLRs (TNLs) (Shao et al, 2016; Tamborski and Krasileva, 2020). After pathogen recognition, NLRs initiate defense responses through oligomerization of the NBARC domain, leading to transcriptional reprogramming, hormone induction, and hypersensitive cell death response (Ngou et al, 2022).

NLRs exhibit remarkable levels of intraspecies allelic diversity (Van de Weyer et al, 2019), due to both the genomic processes that generate variation and selection that promotes its maintenance (Karasov et al, 2014a; Barragan and Weigel, 2021; Märkle et al, 2022). NLRs are in close proximity to each other in genomes and are organized into clusters more often than other genes. This proximity can asymmetrically drive NLR expansion and diversification through tandem duplication, unequal crossing over, and gene conversion (Parker et al, 1997; Michelmore and Meyers, 1998; Lee and Chae, 2020) as well as accumulation of point mutations (Kuang et al, 2004). Point mutations are a major source of within-species NLR diversity but have been difficult to fully resolve through short-read sequencing approaches. The NLR gene family includes the most polymorphic loci and contains the highest frequency of major effect mutations in the Arabidopsis genome (Clark et al, 2007; Gan et al, 2011). There is evidence for balancing selection maintaining polymorphisms and presence-absence variation at several NLR loci through frequency-dependent selection, spatial and temporal fluctuations in pathogen pressure, and heterozygote advantage (Thrall et al, 2012; Karasov et al, 2014b; MacQueen et al, 2019). Diversifying selection has also been observed at NLR loci as an excess of nonsynonymous to synonymous substitutions (Bakker et al, 2006). The NLR gene and protein sequences within a species represent a snapshot of the ongoing interplay between mutation and selection, but disentangling their relative contributions remains challenging.

Mutation rates are unlikely to evolve on a gene-by-gene basis in response to selection given the barrier imposed by genetic drift (Lynch, 2010). However, selection on genic mutation rates is sufficiently strong when acting on mechanisms that couple mutation rate to expression states and epigenomic features, affecting the mutation rates of many genes simultaneously (Martincorena and Luscombe, 2013). The mutation rate of Arabidopsis is heterogeneous across the genome, consistent with the expected effects of selection on mechanisms linking mutation rates to epigenomic features (Monroe et al, 2022a; Staunton et al, 2023). Several mechanisms have been described, including cytosine methylation, which is positively correlated with mutation probability and known to increase the likelihood of spontaneous deamination (Cao et al, 2011; Weng et al, 2019) and H3K4me1, which is negatively correlated with mutation probability and a target of several DNA repair proteins (preprint: Quiroz et al, 2022). Description of genomic features associated with diversity in NLRs will help to understand the role of mutation bias in NLR evolution.

Recent advances in enrichment-based long-read sequencing of NLRs (Jupe et al, 2013) as well as long-read pan-genomes (Jiao and Schneeberger, 2020) allowed for re-examination of NLR variation within species (Barragan and Weigel, 2021). In Arabidopsis datasets, it has been shown that NLRs are enriched in regions of synteny diversity and that NLR repertoires across species could not be easily anchored to a reference genome (Van de Weyer et al, 2019). Phylogenetic analysis independent of reference-based assignment of pan-NLRomes from 62 Arabidopsis thaliana accessions (Van de Weyer et al, 2019) and 54 Brachypodium distachyon (Gordon et al, 2017) lines allowed for amino acid diversity quantification and delineation of highly variable NLRs (hvNLRs) from their low-variability paralogs (non-hvNLRs) (Prigozhin and Krasileva, 2021). At the species level, hvNLRs show rapid rates of diversification and are hypothesized to act as reservoirs of diversity for recognition of pathogen effectors. Comparison of hv and non-hvNLR gene sets allows for investigation of epigenomic, sequence, and regulatory features (hereafter genomic features) and signatures of selection associated with NLR diversification.

In this paper, we report that hvNLRs show a higher transcription level, less gene body CG methylation, and closer proximity to transposable elements (TEs) than non-hvNLRs. Elevated gene-wide nucleotide diversity, a higher likelihood of mutation, and diversifying selection at a subset of sites promote the high amino acid diversity of hvNLRs, while non-hvNLRs are subject to purifying selection. These findings will serve as a starting point for the investigation of the mechanisms that generate and maintain diversity in a subset of plant immune receptors.

Results

Shannon entropy delineates highly variable NLRs in Arabidopsis Col-0

Shannon entropy, a measure of variability derived from information theory, provides an unbiased metric of amino acid diversity of a protein within a population (Asti et al, 2016; Wang et al, 2017). Here, the Shannon entropy is the sum of the frequency of each amino acid times the logarithm of that frequency at each position in a protein sequence alignment, so sites with low variability have low entropy and highly diverse sites have high entropy. When applied to NLRs, this measure is predictive of highly variable effector binding sites (Prigozhin and Krasileva, 2021) and can be used in NLR ligand binding site engineering (Tamborski et al, 2023). Based on the bimodal distribution of Shannon entropy in NLRome, hvNLRs are defined as proteins with 10 or more amino acid positions with Shannon entropy greater than 1.5 bits (Fig. EV1) (Prigozhin and Krasileva, 2021). To examine the relationships between population-level diversity and genomic features of a single accession, we plotted Shannon entropy in reference to each NLR in Col-0 (Fig. 1). As expected, there are functional hvNLRs and non-hvNLRs, with known direct recognition of effectors corresponding to hvNLRs and known indirect recognition to non-hvNLRs. In addition, hvNLRs include all currently known dangerous mix genes that are responsible for hybrid incompatibility across Arabidopsis accessions (Bomblies et al, 2007; Chae et al, 2014). Categorizing NLRs into low and high-entropy groups allows for pairwise comparison of features between groups and gene set enrichment analysis to compare NLRs to the rest of the genome.

Figure EV1. Arabidopsis Col-0 Shannon entropy at the tenth highest amino acid position.

Figure EV1

Distribution of NLR Shannon entropy per sequence at the tenth highest amino acid position as shown by a histogram with 30 bins. Named NLRs with previous functional characterization are labeled on the graph. The designation of hvNLR is entropy of >1.5 bits at the tenth highest position across the Arabidopsis NLRome, as shown by the dashed line. Source data are available online for this figure.

Figure 1. hvNLRs are defined by high amino acid diversity.

Figure 1

Distribution of mean Shannon entropy per gene calculated in reference to Col-0. NLRs shown by a histogram with 30 bins. Named NLRs with previous functional characterization are labeled. Source data are available online for this figure.

hvNLRs have distinct genomic features from non-hvNLRs and the rest of the genome

To compare the expression and methylation status of hv and non-hvNLR gene sets within an individual plant, we examined available paired whole genome bisulfite and RNA sequencing generated from the same rosette leaf of the reference accession Col-0 (Data ref: Williams et al, 2022a, 2022b). We found that the distribution of hvNLR expression is significantly higher than non-hvNLRs (Fig. 2A, unpaired Wilcoxon rank-sum test, P = 7.9e-05). When we ranked all protein-coding Arabidopsis genes based on their expression level, we observed that hvNLRs are enriched in the most expressed genes in each leaf sample (Fig. 2D, singscore rank-based sample scoring, P < 0.005 for hvNLRs in each biological replicate).

Figure 2. Expression, methylation, and proximity to transposable elements (TEs) distinguish hv and non-hvNLRs in Col-0 rosette tissue.

Figure 2

(AC) Average gene expression log2 (transcripts per million (TPM)), average % CG methylation per gene, and distance to the nearest TE (kbp) of hv and non-hvNLRs. (DF) Normalized mean percentile rank density plots of hv and non-hvNLRs. Data Information: (AC) significance shown is the result of unpaired Wilcoxon rank-sum tests with Benjamini–Hochberg correction for multiple testing. (DF) The significance shown is the result of singscore rank-based sample scoring or permutation test for difference in means. For description of the statistical test, see “Methods”. “ns“ indicates a P value > 0.05; * indicates a P value < 0.05 and ≥0.01; ** indicates a P value < 0.01 and ≥0.001; *** indicates a P value < 0.001. n refers to the number of NLR genes tested. Source data are available online for this figure.

In addition, the hvNLR gene set is significantly less CG gene body methylated than non-hvNLRs (Fig. 2B, unpaired Wilcoxon rank-sum test, P = 1.2e-04). We noticed two hvNLRs, RPP4 and RPP7, with higher CG methylation than the average for hvNLRs (Fig. 2B). Upon further inspection, we also found CHH and CHG context methylation within the gene bodies of RPP4 and RPP7, which we rarely observed in other NLRs (Fig. EV2A–C; median NLR CHH methylation = 0.29%, median NLR CHG methylation = 0.30%). Multi-context gene body methylation (CG, CHH, and CHG) is typically used to silence nearby or overlapping transposable elements (Quadrana et al, 2016). This indicates that the elevated CG methylation of RPP4 and RPP7 is likely due to multi-context silencing related to a recent TE insertion, which has been previously reported (Tsuchiya and Eulgem, 2013). hvNLRs are enriched in the CG hypomethylated genes across the genome (permutation test for difference in means, P = 0.002, n = 10,000 replicates). Gene set analysis of methylation can be biased due to the uneven distribution of CG sites within each gene (Geeleher et al, 2013). We repeated our permutation test to compare hvNLRs to a set of non-NLR genes with similar measured CG sites per gene to correct for this bias, and hvNLRs were significantly more hypomethylated than the rest of the genome (Fig. 2E, P < 0.05 for each biological replicate, n = 10,000).

Figure EV2. Multi-context methylation of two NLR clusters likely due to recent TE insertion.

Figure EV2

(A, B) Integrative Genomics Viewer screenshot of methylation, RNAseq coverage, and TE proximity of the RPP4 and RPP7 clusters in rosette leaf tissue. (C) % CHG and CHH gene body methylation of non-hv and hvNLRs in rosette leaf tissue. Named RPP4 and RPP7 cluster members are labeled. Data Information: (C) for both comparisons, n = 97 non-hvNLRs and n = 35 hvNLRs, with n referring to the number of genes tested. Significance shown is the result of unpaired Wilcoxon rank-sum tests with Benjamini–Hochberg correction for multiple testing. “n.s.” indicates a P value > 0.05. Source data are available online for this figure.

We also found that hvNLRs as a set are closer to TEs (Fig. 2C, unpaired Wilcoxon rank-sum test, P = 1.3e-05), and hvNLRs are enriched in the genes closest to TEs (Fig. 2F, permutation test for difference in medians, P = 0, n = 10,000 replicates). In Col-0, hvNLRs have a median TE distance of 0 kbp, meaning the TEs are within the UTR or intronic sequences, while non-hvNLRs have a median TE distance of 2.07 kbp. The highly variable status of NLRs is predictive of TEs within the genic sequence (Fisher’s exact test, P = 3.6e-05). It has been previously observed that TEs are associated with plant immune genes (Hosaka and Kakutani, 2018), but this analysis suggests that the signal is driven by hvNLRs.

Genomic features are robust to cluster status, domain class, and phylogenetic distance

NLRs are found in clusters more frequently than other genes, with clusters defined as a maximum distance of 50 kb between adjacent NLRs (Lee and Chae, 2020), and NLRs belong to three anciently diverged classes named for their N-terminal domains (Fig. 3B; Shao et al, 2016; Tamborski and Krasileva, 2020). While there is not a significant association between the highly variable status of NLRs with cluster membership (Fisher’s exact test, P = 1) or N-terminal domain clade (Fisher’s exact test, P = 0.21), we wanted to examine both factors as potential confounders of our feature associations. We therefore subset our data by cluster status and N-terminal domain and repeated hv and non-hvNLR pairwise comparisons (Fig. 3A) and examined the distribution of features across the phylogeny of Col-0 NLRs (Fig. 3B). The distribution of hvNLR expression is significantly higher than non-hvNLRs, and the distribution of hvNLR %CG methylation is significantly lower than non-hvNLRs within each subset (Fig. 3A, unpaired Wilcoxon rank-sum tests, corrected for multiple hypothesis testing). TE distance is significantly different in clustered NLRs, CNLs, and TNLs, but not singletons. hvNLRs are distributed over the phylogeny of CNLs and TNLs and maintain distinct genomic features despite close phylogenetic relationships with non-hvNLRs (Fig. 3B).

Figure 3. Cluster membership, NLR type, and phylogenetic distance do not explain genomic differences between hv and non-hvNLRs.

Figure 3

(A) Comparison of expression, distance to nearest TE, and CG gene body methylation of hv and non-hvNLRs subset by cluster membership and N-term domain type. (B) Features mapped onto a phylogeny of NLRs in A. thaliana Col-0. NLRs without log2(TPM) or %CG methylation data were determined to be unmappable (see “Methods”). Named NLRs with previous functional characterization are labeled. Data Information: (A) pairwise significance shown is the result of an unpaired Wilcoxon rank-sum test with Benjamini–Hochberg correction for multiple testing. Across subset significance shown is the result of a Kruskal–Wallis test with Benjamini–Hochberg correction for multiple testing. “ns” indicates a P value > 0.05; * indicates a P value <0.05 and ≥0.01; ** indicates a P value < 0.01 and ≥0.001; *** indicates a P value < 0.001. n refers to the number of NLR genes tested in each subset. Source data are available online for this figure.

If cluster status or domain class were confounding variables, we would expect statistically different distributions across hvNLR subsets and non-hvNLR subsets. However, the distributions are not significantly different across all features for both hvNLRs and non-hvNLRs (Fig. 3A; Kruskal–Wallis rank-sum tests corrected for multiple hypothesis testing). This analysis shows that the differences observed between hvNLRs and non-hvNLRs are consistent across the NLR phylogeny and are not driven by cluster membership.

hvNLRs are more expressed and less methylated than non-hvNLRs across tissues

Since our investigation of genomic features of NLRs is so far exclusive to rosette leaf tissue, we were curious if our observed trends held across other tissues. We employed an additional dataset that sampled RNA from 30 tissues of Arabidopsis Col-0 across several developmental stages for a total of 52 samples (Data ref: Mergner et al, 2020a, 2020b). As a gene set, hvNLRs are more highly expressed than non-hvNLRs in 46 of the 52 samples examined (Fig. 4A; unpaired Wilcoxon rank-sum tests corrected for multiple testing). We see the significantly greater expression of hvNLRs than non-hvNLRs in several reproductive tissues, including all sampled stages of flower development and silique development, fruit tissue, and the majority of embryo samples (Fig. 4A). As previously reported (Munch et al, 2018), we observe that NLRs are more expressed in vegetative tissues than reproductive tissues, and more expressed in shoot tissue than root tissue (Fig. 4B). There is no significant difference between the two groups in embryo developmental stage 6, dry or imbibed seed tissue, or any of the three root tissues.

Figure 4. Expression and methylation features are conserved across tissue types and developmental stages.

Figure 4

(A) Pairwise comparison of hv vs non-hvNLR average gene expression in 52 tissue and developmental stage samples. Each comparison includes n = 97 non-hvNLRs and n = 35 hvNLRs, with n referring to the number of genes tested. (B) Heatmap of hv and non-hvNLR expression in log2 (transcripts per million (TPM)). Dendrograms show the result of hierarchical clustering across tissues and within hv and non-hvNLR gene sets. (C) %CG gene body methylation of hv and non-hvNLRs in four tissue types. Each comparison shows n = 95 non-hvNLRs and n = 32 hvNLRs genes, with n referring to the number of genes tested. Data Information: (A, C) significance shown is the result of unpaired Wilcoxon rank-sum tests with Benjamini–Hochberg correction for multiple testing. “ns” indicates a P value > 0.05; * indicates a P value < 0.05 and ≥0.01; ** indicates a P value < 0.01 and ≥0.001; *** indicates a P value < 0.001. Source data are available online for this figure.

While CG methylation varies less between tissue types than expression (Seymour et al, 2014; Lloyd and Lister, 2022), we also compared hv and non-hvNLR methylation in cauline leaves, flower buds, embryos, and a distinct rosette leaf sample (Data ref: Williams et al, 2022a, 2022b). non-hvNLRs are more CG methylated than hvNLRs in all tissues examined (Fig. 4C, unpaired Wilcoxon rank-sum tests corrected for multiple testing). Therefore, we conclude that in Col-0, hv-associated genomic features are conserved across most tissue types, including in reproductive organs.

non-hvNLRs are subject to purifying selection

The high level of amino acid diversity in hvNLRs and associated differences in genomic features might be due to differences in mutational processes and/or selection. To investigate the contribution of balancing selection to the observed amino acid diversity at hvNLRs, we calculated Tajima’s D (D) and nucleotide diversity per site (π) in each domain and across the gene body of hv and non-hvNLRs. D is a site frequency spectrum-based statistic that tests for selection by comparing the difference between the average number of nucleotide differences and the total number of segregating sites to the neutral expectation, while π measures the degree of polymorphism within a population by the average pairwise differences per site. In comparison to the rest of the genome, these statistics can be used to test for balancing selection (Schmid et al, 2005). hvNLRs have higher D than non-hvNLRs across the coding sequence and all individual domains (Fig. 5A; unpaired Wilcoxon rank-sum test corrected for multiple testing). Reflecting their differences in amino acid diversity, hvNLRs have higher π than non-hvNLRs across all domains and the coding sequence (Fig. 5A; unpaired Wilcoxon rank-sum test corrected for multiple testing). The difference in π and D between the two groups is not driven exclusively by variation in the LRR region, with the highest values reported for the hvNLR NBARC domains.

Figure 5. hvNLRs have higher Tajima’s D and nucleotide diversity than non-hvNLRs.

Figure 5

(A) D and π calculated across the coding sequence (CDS), coiled-coil (CC), Toll/Interleukin-1 (TIR), nucleotide-binding (NBARC), and leucine-rich repeat (LRR) domains. (B) CDS π vs. D. Gray lines represent the kernel density estimation of statistics computed on all coding sequences of Arabidopsis. Dashed lines represent the 95th percentile of the empirical distribution. Data Information: (A) horizontal black lines denote median values within each box; boxes range from the 25th to 75th percentile of each group’s distribution of values; whiskers extend no further than 1.5× the interquantile range of the hinge. Data beyond the end of the whiskers are outlying points and are plotted individually. n refers to the number of genes tested. Significance shown is the result of unpaired Wilcoxon rank-sum tests with Benjamini–Hochberg correction for multiple testing. * Indicates a P value < 0.05 and ≥0.01; ** indicates a P value < 0.01 and ≥0.001; *** indicates a P value < 0.001. Source data are available online for this figure.

Due to the demographic history of Arabidopsis, the empirical distribution of summary statistics departs from the neutral model (Nordborg et al, 2005; Schmid et al, 2005; Alonso-Blanco et al, 2016). We calculated the genome-wide values of D and π to test for selection, using whole genome SNP information from the accessions used to create the pan-NLRome present in 1001 genomes (Fig. EV3A,B; Alonso-Blanco et al, 2016). Both hv and non-hvNLRs have higher average π than the empirical distribution (Fig. 5B; permutation test for difference in means, P = 0; P = 0, n = 10,000 replicates), and there are significantly more NLRs in the top 5% of the empirical distribution than expected by chance (Fig. EV3B; permutation test for number in the top 5%, P = 0, n = 10,000 replicates). This corroborates previously reported significantly high levels of nucleotide diversity of NLRs (Bakker et al, 2006; Van de Weyer et al, 2019).

Figure EV3. Comparison of NLR population genetics statistics to the empirical distribution; per NLR domain πNS.

Figure EV3

(A, B) Empirical distribution of Tajima’s D and π calculated on coding sequences of Arabidopsis shown as a histogram with 50 bins. The position of hv and non-hvNLRs shown via rug plot beneath the histogram, as well as the 5th and 95th percentiles of the distribution. (C) πNS calculated per domain. Data Information: (C) horizontal black lines denote median values within each box; boxes range from the 25th to 75th percentile of each group’s distribution of values; whiskers extend no further than 1.5× the interquantile range of the hinge. Data beyond the end of the whiskers are outlying points and are plotted individually. Significance shown is the result of an unpaired Wilcoxon rank-sum test with Benjamini–Hochberg correction for multiple testing. “n.s.” indicates a P value > 0.05. Source data are available online for this figure.

hvNLRs have a higher D, and non-hvNLRs have a lower D than the genome average (Figs. 5B and  EV3A; permutation test for difference in means, P = 9.0e-04; P = 0, n = 10,000 replicates). There are no hvNLRs in either tail of the empirical distribution of D, which is not significantly different from the 3.45 expected by chance. There is, however, an excess of non-hvNLRs in the bottom 5% of the distribution of D (permutation test for number in the bottom 5%, P = 0, n = 10,000 replicates), indicating that purifying selection may be reducing diversity at non-hvNLRs. Defining individual genes under balancing selection to be the top 5% of the empirical distribution of π and D values (Bakker et al, 2006; preprint: Gladieux et al, 2022), we identified one non-hvNLR under balancing selection, AT5G47260 (Fig. 5B). However, one gene is not significantly different from the number of NLRs expected to be in the top 5% of both distributions by random chance.

hvNLRs are subject to diversifying selection and an increased probability of mutation

To further investigate the nature of the high nucleotide diversity of NLRs, we compared nucleotide diversity at synonymous and nonsynonymous sites (πS; πN). hvNLRs have greater πS and πN than non-hvNLRs (Fig. 6A; unpaired Wilcoxon rank-sum tests corrected for multiple testing P = 1.2e-12, P = 3.3e-15). However, the ratio of nonsynonymous to synonymous nucleotide diversity (πNS), an intraspecies measurement of selection, is not significantly different between the two groups, indicating possible role of different mutational processes (Fig. 6B; unpaired Wilcoxon rank-sum test, P = 0.17). Average πNS is less than 1 for both hv and non-hvNLRs across the gene and in the LRR region, indicating purifying selection as an excess of synonymous polymorphisms relative to nonsynonymous polymorphisms (Figs. 6B and EV3C).

Figure 6. hvNLR nucleotide diversity is associated with a high likelihood of mutation and codons under diversifying selection.

Figure 6

(A) Average nonsynonymous pairwise nucleotide diversity per site (πN) and average synonymous pairwise nucleotide diversity per site (πS). (B) πS vs πN of the coding sequence of NLRs with per group linear regressions. The neutral expectation, πS = πN, is shown as a gray dotted line. (C) Mutation probability score of hv and non-hvNLRs. (D) Percentage of codons under positive selection determined by MEME (episodic), and FEL (pervasive). Data Information: For all comparisons, n = 97 for non-hvNLRs and n = 35 for hvNLRs, with n referring to the number of genes tested. The significance shown is the result of unpaired Wilcoxon rank-sum tests, with *** indicating a P value < 0.001. Source data are available online for this figure.

Since elevated πN and πS with no difference in πNS could be caused by an increase in the mutation rate of hvNLRs, we compared the predicted SNVs and indels per base pair based on epigenomic states (mutation probability score) (Data ref: Monroe et al, 2022a, 2022b). The median mutation probability is 35% higher for hvNLRs (Fig. 6C; unpaired Wilcoxon rank-sum test, P = 2.5e-05).

Gene-wide πNS is a conservative metric for testing positive selection because positive selection may only be acting at a few codon sites (Kosakovsky Pond and Frost, 2005). Therefore, we used maximum likelihood-based site models to test for positive, diversifying selection. The use of these dN/dS-based models on intraspecies data is problematic because the nucleotide differences do not represent substitutions fixed by selection, but rather polymorphisms segregating within a population (Kryazhimskiy and Plotkin, 2008). We mitigated this effect by restricting our analysis to internal branches of the protein phylogeny, which encompass at least one ancestral sequence that is visible to selection (Pond et al, 2006; Avanzato et al, 2019). hvNLRs have a higher proportion of codons under pervasive and episodic diversifying selection than non-hvNLRs, indicating that diversifying selection at a subset of sites is maintaining diversity at hvNLRs (Fig. 6D, unpaired Wilcoxon rank-sum tests). Given the polymorphism data, summary statistics, and mutational likelihood, hvNLR amino acid diversity appears to be driven by both a higher likelihood of mutation and positive, diversifying selection, while non-hvNLR conservation is maintained by purifying selection.

Distinct genomic features and signatures of selection can persist in close proximity

As described previously, hv and non-hvNLRs can co-exist within the same cluster and even as neighboring genes (Fig. EV2A,B). We chose to examine the RSG2 cluster, which includes neighboring non-hvNLR RSG2 and hvNLR AT5G43740, two CNLs of similar length 1.8 kb apart, as a representative example of neighboring NLRs. The hvNLR is highly expressed, hypomethylated, and has a TE within its 5’ UTR sequence (Fig. 7A). The non-hvNLR shows signatures of purifying selection with a gene-wide Tajima’s D value of −1.9, while the hvNLR has a gene-wide Tajima’s D of −0.24 (Fig. 7B,C). The hvNLR has higher π, πN, and πS, but the two genes have similar πNS values (0.48 and 0.41) (Fig. 7B,C). Despite neighboring genomic positions, RSG2 and AT5G43740 show distinct genomic features and signatures of selection reflective of their species-level amino acid diversity. There are two other examples of neighboring, paired hv and non-hvNLRs in Col-0, cAT1G63350, and cAT5G38340, and both broadly follow our observed differences between hv and non-hvNLRs (Fig. EV4). Therefore, we conclude that genomic features that distinguish hvNLR and non-hvNLRs can persist in close proximity, and likely are not driven by broader genome states but may instead be related to function and evolutionary speed.

Figure 7. An example of neighboring NLRs retaining distinct genomic and epigenomic features.

Figure 7

(A) Methylation, RNAseq coverage, and TE proximity of neighboring non-hvNLR RSG2 and hvNLR AT5G43740. Methylation and RNAseq coverage shown for rosette leaf tissue. (B, C) Tajima’s D and nucleotide diversity across the coding sequence of RSG2 and AT5G43740. Statistics were calculated on 300 bp windows with a step size of 75 bp and plotted at the nucleotide midpoint. Source data are available online for this figure.

Figure EV4. Intra-cluster comparison of neighboring hv and non-hvNLRs.

Figure EV4

Described genomic features and population genetics statistics are shown for hv and non-hvNLR pairs in three clusters: RSG2, cAT1G63350, and cAT5G38340. The median of all hv and non-hvNLR values is shown by a grey dotted line. Expression (log2 (transcripts per million (TPM))) and % CG methylation shown is from rosette leaf tissue. Source data are available online for this figure.

Discussion

The high allelic diversity of NLRs has long been appreciated, though the mechanisms that generate and maintain this diversity have remained difficult to disentangle. Taking advantage of Shannon entropy and available long-read sequencing datasets, we can delineate rapidly and slowly diversifying NLRs and begin to investigate these mechanisms through gene set comparison. Our results show that rapidly evolving NLRs have distinct genomic features from their conserved paralogs and the rest of the genome. Specifically, we found that hvNLRs are more expressed, less methylated, and closer to TEs than non-hvNLRs. Interestingly, hvNLRs are enriched across the genome in highly expressed genes, hypomethylated genes, and genes closest to TEs, while non-hvNLRs are uniformly dispersed among other genes. Differences in expression and methylation are observed across all tested shoot tissues and several important developmental and reproductive tissues, including embryo, flower, and shoot apical meristem.

Since we observed distinct genomic features between hv and non-hvNLRs, we investigated the possibility of increased mutation rate in hvNLRs through examination of nucleotide diversity and mutation probability. Synonymous substitutions are under reduced selection compared to nonsynonymous substitutions because they do not alter the amino acid sequence but are not invisible to selection due to codon bias, GC-biased gene conversion, and RNA folding stability (Martincorena et al, 2012; James et al, 2017; Wei, 2020). πS is therefore an imperfect measurement of mutation rate, but an elevated mutation rate of hvNLRs could result in increased πS and πN relative to non-hvNLRs, but not influence the πNS ratio, as we report here (Bromham et al, 2013). We also find that hvNLRs are maintained in chromatin states associated with a higher mutation probability per base pair relative to non-hvNLRs, leading to the hypothesis that locally high mutation rate at hvNLRs contributes to the observed amino acid diversity. However, high depth quantification of de novo mutations at NLRs before selection is required to evaluate this hypothesis.

The distinct genomic features between the two NLR groups may point to mechanisms of increased mutation rate. Transcription is a source of genomic instability through the exposure of vulnerable single-stranded DNA, which is countered by targeting DNA repair machinery to actively transcribed genes through the stalling RNA polymerase or histone marks associated with actively transcribed genes (Oztas et al, 2018; Preprint: Quiroz et al, 2022). If the high transcription of hvNLRs is not accompanied by targeted DNA repair, this would result in an increased probability of mutation (Staunton et al, 2023). Methylated cytosines increase the likelihood of mutation by increasing the frequency of spontaneous deamination of cytosines (Xia et al, 2012; Weng et al, 2019; Monroe et al, 2022a). However, in Arabidopsis, gene body CG methylation is found preferentially in the exons of conserved, constitutively transcribed housekeeping genes, and gene body CG methylation is associated with lower polymorphism than unmethylated genes across accessions (Gaut et al, 2011; He et al, 2022; Kenchanmane Raju et al, 2023). The CG gene body methylation of non-hvNLRs may therefore be related to their low diversity through some unknown mechanism. TEs generate large effect mutations (Quadrana et al, 2019) and alter the methylation and expression landscape of surrounding genes. hvNLRs are closer to TEs and more likely to have them within their genic sequence than non-hvNLRs, and this likely contributes to hvNLR diversification. Since highly variable-associated genomic features persist in germline tissues, any associated mutational likelihood is evolutionarily relevant.

Once generated, nucleotide diversity can be actively maintained by diversifying or balancing selection, or passively accumulate in the absence of selection. We do not observe any difference in diversifying selection between hv and non-hvNLRs using the πNS metric, but hvNLRs have a significantly higher proportion of codons under pervasive and episodic diversifying selection. While hvNLRs have higher Tajima’s D values than the genome average and non-hvNLRs, they are not present in the tails of the genome-wide distribution. The 5th and 95th percentiles of the empirical distribution are conservative cutoffs, and it is possible for a locus under weak selection to not be in the tail of the empirical distribution. Therefore, balancing selection may play a role in promoting hvNLR diversity but cannot be distinguished from evolution under relaxed selection using this criterion. non-hvNLRs, however, have a strong signature of purifying selection, which helps to explain their low amino acid diversity relative to hvNLRs.

Given the heterogeneous mutation rate across the Arabidopsis genome, it is tempting to speculate that the distinctive genomic features we observed in hvNLRs may be related to their allelic diversity. Alternatively, there might be a selection of specific features on non-hvNLRs to enhance DNA repair and inhibit other diversity-generation activities facilitating their maintenance. Our findings serve as a starting point for the investigation of the mechanisms that promote diversity generation in a subset of the plant immune receptors.

Methods

Reagents and tools table

Reagent/resource Reference or source Identifier or catalog number
Software
   entropy v1.3.1

https://strimmerlab.github.io/software/entropy/

Hausser and Strimmer, 2009

   Trim Galore! v0.6.6

https://github.com/FelixKrueger/TrimGalore

Babraham Bioinformatics

   Bismark v0.23.0

https://github.com/FelixKrueger/Bismark

Krueger and Andrews, 2011

   STAR v2.7.10a

https://github.com/alexdobin/STAR

Dobin et al, 2013

   HTSeq v2.0.2

https://github.com/htseq/htseq

Putri et al, 2022

   Polyester v1.2.0

https://github.com/alyssafrazee/polyester

Frazee et al, 2015

   singscore v1.22.0

https://bioconductor.org/packages/release/bioc/html/singscore.html

Foroutan et al, 2018

   PAL2NAL v14

https://www.bork.embl.de/pal2nal/

Suyama et al, 2006

   EggLib v3.1.0

https://www.egglib.org/

Siol et al, 2022

   LRRpredictor v1

https://github.com/eliza-m/LRRpredictor_v1

Martin et al, 2020

   FEL

https://hyphy.org/

Kosakovsky Pond and Frost, 2005

   MEME

https://hyphy.org/

Murrell et al, 2012

   vcftools v0.1.17

https://vcftools.github.io/index.html

Danecek et al, 2011

Shannon entropy

Entropy was calculated as described previously (Prigozhin and Krasileva, 2021) using the R package entropy v1.3.1 (Hausser and Strimmer, 2009), except entropy was calculated per Col-0 sequence as opposed to across the clade alignment. hvNLR and non-hvNLRs are defined as described previously (Prigozhin and Krasileva, 2021).

DNA methylation analysis

To examine the methylation and expression of NLRs in rosette leaf tissue, we used available matched bisulfite and RNA sequencing from split Col-0 leaves (Data ref: Williams et al, 2022a, 2022b). Reads were trimmed using Trim Galore! v0.6.6 with a Phred score cutoff of 20 and Illumina adapter sequences, with a maximum trimming error rate 0.1 (Babraham Bioinformatics). Using Bismark v0.23.0, reads were mapped to the Araport11 genome, PCR duplicates were removed, and percent methylation at each cytosine was determined using the methylation extraction function (Krueger and Andrews, 2011). Cytosines with at least five reads were used for analysis, and the symmetrical cytosines within CG base pairs were averaged (Williams et al, 2022a, 2022b). The percent methylation of each CG site was averaged across each NLR gene, and across four biological replicates weighted by number of cytosines with sufficient coverage. Five hvNLR genes did not have sufficient coverage at any cytosines and were excluded from rosette leaf analysis (AT1G58807, AT1G58848, AT1G59124, AT1G59218, and AT4G26090). Single sample methylation enrichment of hvNLRs and non-hvNLRs was performed using custom permutation tests weighted for similar CG sites per gene. RPP4 and RPP7 were excluded from enrichment analysis due to their multi-context methylation.

Gene expression analysis

RNAseq reads from four matched leaf samples (explained above) were mapped to the Araport11 genome using STAR v2.7.10a (Dobin et al, 2013) and were counted using HTSeq-count v2.0.2 (Putri et al, 2022). Counts were converted to transcripts per million (TPM) and averaged across four biological replicates, then log2(TPM + 1) transformed for visualization. NLRs are repetitive and often similar, making them difficult to sequence with short reads. To determine if any NLRs were unmappable, RNAseq reads were simulated using Polyester v1.2.0 (Frazee et al, 2015). Four NLRs were determined to be unmappable due to zero assigned read counts and were excluded from rosette tissue expression analysis (AT1G58807, AT1G58848, AT1G59124, and AT1G59218). Single sample gene set enrichment of hvNLRs and non-hvNLRs was performed on each replicate using singscore (Foroutan et al, 2018).

Multi-tissue analysis

To examine methylation across tissues, we used available bisulfite sequencing of cauline leaf, embryo, flower bud, and distinct rosette leaf samples and processed as described above (Data ref: Williams et al, 2022a, 2022b). Two non-hvNLRs (AT4G26090 and AT5G48770) and three hvNLRs (AT1G58807, AT1G59124, and AT1G59218) did not have sufficient coverage at any cytosines and were excluded from the analysis. To examine NLR expression across tissues, we downloaded per-gene count data for 52 tissues of Arabidopsis Col-0 (Data ref: Mergner et al, 2020a, 2020b). No NLRs were determined to be unmappable with the described RNA sequencing conditions. We converted per-gene counts to TPM and log2 (TPM + 1) transformed for visualization.

TE analysis

We determined distance to transposable elements based on the TE annotation file TAIR10_Transposable_Elements.txt and gene annotation file TAIR10_GFF3_genes.gff available from arabidopsis.org.

Subgroup comparison

Clustered NLRs are defined as NLRs within 50 kb of an adjacent NLR (Lee and Chae, 2020). Since we were focused on physical proximity, we did not include exclusively sequence similarity-based clusters. The phylogenetic tree of all NLRs in Col-0 was generated as described previously (Prigozhin and Krasileva, 2021) with feature annotations using iTOL. Paired hvNLR and non-hvNLR neighbors were identified by 1) clusters with two NLRs, one hv and one non-hv 2) NLRs were either directly next to each other or within 2 kb of each other.

Population genetics analysis

Protein alignments for each NLR clade were generated as described previously (Prigozhin and Krasileva, 2021) and converted to codon alignments using PAL2NAL v14 (Suyama et al, 2006). The population genetics statistics of NLRs were calculated using EggLib v3.1.0 (Siol et al, 2022). Domain-specific statistics were calculated on subsets of codon alignments using majority vote across annotations. NBARC, TIR, and CC annotations were collected from previous work (Van de Weyer et al, 2019), and LRR annotations were determined using LRRpredictor (Martin et al, 2020). Sliding window analysis was performed using 300 base pair windows with a 75 base pair step. Sites under pervasive diversifying selection were identified using FEL (Kosakovsky Pond and Frost, 2005), and sites under episodic diversifying selection were identified using MEME (Murrell et al, 2012) using the internal branches of the phylogeny (Pond et al, 2006; Avanzato et al, 2019). Empirical distributions of population genetics statistics of coding sequences were calculated from the all sites 1001 Genomes VCF subset to the accessions used to generate the NLRome long-read dataset using vcftools v0.1.17 (Danecek et al, 2011; Alonso-Blanco et al, 2016; Van de Weyer et al, 2019).

Supplementary information

Source data Fig. 1 (2.7KB, zip)
Source data Fig. 2 (9.7KB, zip)
Source data Fig. 3 (50.5KB, zip)
Source data Fig. 4 (175.9KB, zip)
Source data Fig. 5 (21.6KB, zip)
Source data Fig. 6 (11.2KB, zip)
Source data Fig. 7 (290.9MB, zip)
Figure EV Source Data (291.6MB, zip)
Peer Review File (725KB, pdf)
Expanded View Figures (2.2MB, pdf)

Acknowledgements

The authors are grateful to the Krasileva Lab for the critical reading of the manuscript, and Ben Williams for his advice and comments. The summary figure was created with BioRender.com. This research used the Savio computational cluster resource provided by the Berkeley Research Computing program at the University of California, Berkeley (supported by the UC Berkeley Chancellor, Vice Chancellor for Research, and Chief Information Officer). Chandler A Sutherland has been supported by the Grace Kase-Tsujimoto Graduate Fellowship. Ksenia V Krasileva is funded by NIH Director’s Award (1DP2AT011967-01), Gordon and Betty Moore Inventor Fellowship (grant number: 8802), and the Innovative Genomics Institute.

Expanded view

Author contributions

Chandler A Sutherland: Conceptualization; Data curation; Formal analysis; Validation; Visualization; Methodology; Writing—original draft. Daniil M Prigozhin: Conceptualization; Data curation; Formal analysis; Supervision; Investigation; Methodology; Writing—review and editing. J Grey Monroe: Data curation; Formal analysis; Writing—review and editing. Ksenia V Krasileva: Conceptualization; Resources; Supervision; Project administration; Writing—review and editing.

Data availability

No primary datasets have been generated and deposited. The intermediate datasets and computer code produced in this study are available in the following databases: Data processing pipelines and figure generation code: GitHub (https://github.com/chandlersutherland/nlr_features); Intermediate datasets: Zenodo Public Repository (10.5281/zenodo.10530531) and the accompanying source data files.

Disclosure and competing interests statement

The authors declare no competing interests.

Supplementary information

Expanded view data, supplementary information, appendices are available for this paper at 10.1038/s44319-024-00122-9.

References

  1. Alonso-Blanco C, Andrade J, Becker C, Bemm F, Bergelson J, Borgwardt KM, Cao J, Chae E, Dezwaan TM, Ding W, et al. 1135 Genomes Reveal the global pattern of polymorphism in Arabidopsis thaliana. Cell. 2016;166:481–491. doi: 10.1016/j.cell.2016.05.063. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Asti L, Uguzzoni G, Marcatili P, Pagnani A. Maximum-entropy models of sequenced immune repertoires predict antigen-antibody affinity. PLoS Comput Biol. 2016;12:e1004870. doi: 10.1371/journal.pcbi.1004870. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Avanzato VA, Oguntuyo KY, Escalera-Zamudio M, Gutierrez B, Golden M, Kosakovsky Pond SL, Pryce R, Walter TS, Seow J, Doores KJ, et al. A structural basis for antibody-mediated neutralization of Nipah virus reveals a site of vulnerability at the fusion glycoprotein apex. Proc Natl Acad Sci USA. 2019;116:25057–25067. doi: 10.1073/pnas.1912503116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Bakker EG, Toomajian C, Kreitman M, Bergelson J. A genome-wide survey of R gene polymorphisms in Arabidopsis. Plant Cell. 2006;18:1803–1818. doi: 10.1105/tpc.106.042614. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Barragan AC, Weigel D. Plant NLR diversity: the known unknowns of pan-NLRomes. Plant Cell. 2021;33:814–831. doi: 10.1093/plcell/koaa002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Bomblies K, Lempe J, Epple P, Warthmann N, Lanz C, Dangl JL, Weigel D. Autoimmune response as a mechanism for a Dobzhansky-Muller-type incompatibility syndrome in plants. PLoS Biol. 2007;5:e236. doi: 10.1371/journal.pbio.0050236. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Bromham L, Cowman PF, Lanfear R. Parasitic plants have increased rates of molecular evolution across all three genomes. BMC Evol Biol. 2013;13:126. doi: 10.1186/1471-2148-13-126. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Cao J, Schneeberger K, Ossowski S, Günther T, Bender S, Fitz J, Koenig D, Lanz C, Stegle O, Lippert C, et al. Whole-genome sequencing of multiple Arabidopsis thaliana populations. Nat Genet. 2011;43:956–963. doi: 10.1038/ng.911. [DOI] [PubMed] [Google Scholar]
  9. Chae E, Bomblies K, Kim S-T, Karelina D, Zaidem M, Ossowski S, Martín-Pizarro C, Laitinen RAE, Rowan BA, Tenenboim H, et al. Species-wide genetic incompatibility analysis identifies immune genes as hot spots of deleterious epistasis. Cell. 2014;159:1341–1351. doi: 10.1016/j.cell.2014.10.049. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Clark RM, Schweikert G, Toomajian C, Ossowski S, Zeller G, Shinn P, Warthmann N, Hu TT, Fu G, Hinds DA, et al. Common sequence polymorphisms shaping genetic diversity in Arabidopsis thaliana. Science. 2007;317:338–342. doi: 10.1126/science.1138632. [DOI] [PubMed] [Google Scholar]
  11. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Foroutan M, Bhuva DD, Lyu R, Horan K, Cursons J, Davis MJ. Single sample scoring of molecular phenotypes. BMC Bioinformatics. 2018;19:404. doi: 10.1186/s12859-018-2435-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Frazee AC, Jaffe AE, Langmead B, Leek JT. Polyester: simulating RNA-seq datasets with differential transcript expression. Bioinformatics. 2015;31:2778–2784. doi: 10.1093/bioinformatics/btv272. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Gan X, Stegle O, Behr J, Steffen JG, Drewe P, Hildebrand KL, Lyngsoe R, Schultheiss SJ, Osborne EJ, Sreedharan VT, et al. Multiple reference genomes and transcriptomes for Arabidopsis thaliana. Nature. 2011;477:419–423. doi: 10.1038/nature10414. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Gaut B, Yang L, Takuno S, Eguiarte LE. The patterns and causes of variation in plant nucleotide substitution rates. Annu Rev Ecol Evol Syst. 2011;42:245–266. doi: 10.1146/annurev-ecolsys-102710-145119. [DOI] [Google Scholar]
  17. Geeleher P, Hartnett L, Egan LJ, Golden A, Raja Ali RA, Seoighe C. Gene-set analysis is severely biased when applied to genome-wide methylation data. Bioinformatics. 2013;29:1851–1857. doi: 10.1093/bioinformatics/btt311. [DOI] [PubMed] [Google Scholar]
  18. Gladieux P, Oosterhout C van, Fairhead S, Jouet A, Ortiz D, Ravel S, Shrestha R-K, Frouin J, He X, Zhu Y et al (2022) Extensive immune receptor repertoire diversity in disease-resistant rice landraces. Preprint at Biorxiv 10.1101/2022.12.05.519081 [DOI] [PubMed]
  19. Gordon SP, Contreras-Moreira B, Woods DP, Des Marais DL, Burgess D, Shu S, Stritt C, Roulin AC, Schackwitz W, Tyler L, et al. Extensive gene content variation in the Brachypodium distachyon pan-genome correlates with population structure. Nat Commun. 2017;8:2184. doi: 10.1038/s41467-017-02292-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Hausser J & Strimmer K (2009) Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks. J Mach Learn Res 10:1469–1484
  21. He L, Huang H, Bradai M, Zhao C, You Y, Ma J, Zhao L, Lozano-Durán R, Zhu J-K. DNA methylation-free Arabidopsis reveals crucial roles of DNA methylation in regulating gene expression and development. Nat Commun. 2022;13:1335. doi: 10.1038/s41467-022-28940-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Hosaka A, Kakutani T. Transposable elements, genome evolution and transgenerational epigenetic variation. Curr Opin Genet Dev. 2018;49:43–48. doi: 10.1016/j.gde.2018.02.012. [DOI] [PubMed] [Google Scholar]
  23. James J, Castellano D, Eyre-Walker A. DNA sequence diversity and the efficiency of natural selection in animal mitochondrial DNA. Heredity. 2017;118:88–95. doi: 10.1038/hdy.2016.108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Jiao W-B, Schneeberger K. Chromosome-level assemblies of multiple Arabidopsis genomes reveal hotspots of rearrangements with altered evolutionary dynamics. Nat Commun. 2020;11:989. doi: 10.1038/s41467-020-14779-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Jones JDG, Vance RE, Dangl JL. Intracellular innate immune surveillance devices in plants and animals. Science. 2016;354:aaf6395. doi: 10.1126/science.aaf6395. [DOI] [PubMed] [Google Scholar]
  26. Jupe F, Witek K, Verweij W, Sliwka J, Pritchard L, Etherington GJ, Maclean D, Cock PJ, Leggett RM, Bryan GJ, et al. Resistance gene enrichment sequencing (RenSeq) enables reannotation of the NB-LRR gene family from sequenced plant genomes and rapid mapping of resistance loci in segregating populations. Plant J Cell Mol Biol. 2013;76:530–544. doi: 10.1111/tpj.12307. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Kahlon PS, Stam R. Polymorphisms in plants to restrict losses to pathogens: from gene family expansions to complex network evolution. Curr Opin Plant Biol. 2021;62:102040. doi: 10.1016/j.pbi.2021.102040. [DOI] [PubMed] [Google Scholar]
  28. Karasov TL, Horton MW, Bergelson J. Genomic variability as a driver of plant–pathogen coevolution? Curr Opin Plant Biol. 2014;18:24–30. doi: 10.1016/j.pbi.2013.12.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Karasov TL, Kniskern JM, Gao L, DeYoung BJ, Ding J, Dubiella U, Lastra RO, Nallu S, Roux F, Innes RW, et al. The long-term maintenance of a resistance polymorphism through diffuse interactions. Nature. 2014;512:436–440. doi: 10.1038/nature13439. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Karasov TL, Shirsekar G, Schwab R, Weigel D. What natural variation can teach us about resistance durability. Curr Opin Plant Biol. 2020;56:89–98. doi: 10.1016/j.pbi.2020.04.010. [DOI] [PubMed] [Google Scholar]
  31. Kenchanmane Raju SK, Lensink M, Kliebenstein DJ, Niederhuth C, Monroe G. Epigenomic divergence correlates with sequence polymorphism in Arabidopsis paralogs. New Phytol. 2023;240:1292–1304. doi: 10.1111/nph.19227. [DOI] [PubMed] [Google Scholar]
  32. Kosakovsky Pond SL, Frost SDW. Not so different after all: a comparison of methods for detecting amino acid sites under selection. Mol Biol Evol. 2005;22:1208–1222. doi: 10.1093/molbev/msi105. [DOI] [PubMed] [Google Scholar]
  33. Krueger F, Andrews SR. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics. 2011;27:1571–1572. doi: 10.1093/bioinformatics/btr167. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Kryazhimskiy S, Plotkin JB. The population genetics of dN/dS. PLoS Genet. 2008;4:e1000304. doi: 10.1371/journal.pgen.1000304. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Kuang H, Woo S-S, Meyers BC, Nevo E, Michelmore RW. Multiple genetic processes result in heterogeneous rates of evolution within the major cluster disease resistance genes in lettuce. Plant Cell. 2004;16:2870–2894. doi: 10.1105/tpc.104.025502. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Lee RRQ, Chae E. Variation patterns of NLR clusters in Arabidopsis thaliana genomes. Plant Commun. 2020;1:100089. doi: 10.1016/j.xplc.2020.100089. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Lloyd JPB, Lister R. Epigenome plasticity in plants. Nat Rev Genet. 2022;23:55–68. doi: 10.1038/s41576-021-00407-y. [DOI] [PubMed] [Google Scholar]
  38. Lynch M. Evolution of the mutation rate. Trends Genet TIG. 2010;26:345–352. doi: 10.1016/j.tig.2010.05.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. MacQueen A, Tian D, Chang W, Holub E, Kreitman M, Bergelson J. Population genetics of the highly polymorphic RPP8 gene family. Genes. 2019;10:691. doi: 10.3390/genes10090691. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Märkle H, Saur IML, Stam R. Evolution of resistance (R) gene specificity. Essays Biochem. 2022;66:551–560. doi: 10.1042/EBC20210077. [DOI] [PubMed] [Google Scholar]
  41. Martin EC, Sukarta OCA, Spiridon L, Grigore LG, Constantinescu V, Tacutu R, Goverse A, Petrescu A-J. LRRpredictor—a new LRR motif detection method for irregular motifs of plant NLR proteins using an ensemble of classifiers. Genes. 2020;11:286. doi: 10.3390/genes11030286. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Martincorena I, Luscombe NM. Non-random mutation: the evolution of targeted hypermutation and hypomutation. BioEssays. 2013;35:123–130. doi: 10.1002/bies.201200150. [DOI] [PubMed] [Google Scholar]
  43. Martincorena I, Seshasayee ASN, Luscombe NM. Evidence of non-random mutation rates suggests an evolutionary risk management strategy. Nature. 2012;485:95–98. doi: 10.1038/nature10995. [DOI] [PubMed] [Google Scholar]
  44. Mergner J, Frejno M, List M, Papacek M, Chen X, Chaudhary A, Samaras P, Richter S, Shikata H, Messerer M, et al. Mass-spectrometry-based draft of the Arabidopsis proteome. Nature. 2020;579:409–414. doi: 10.1038/s41586-020-2094-2. [DOI] [PubMed] [Google Scholar]
  45. Mergner J, Frejno M, List M, Papacek M, Chen X, Chaudhary A, Samaras P, Richter S, Shikata H, Messerer M et al (2020b) ArrayExpress E-MTAB-7978. https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-7978 [DOI] [PubMed]
  46. Michelmore RW, Meyers BC. Clusters of resistance genes in plants evolve by divergent selection and a birth-and-death process. Genome Res. 1998;8:1113–1130. doi: 10.1101/gr.8.11.1113. [DOI] [PubMed] [Google Scholar]
  47. Monroe JG, Srikant T, Carbonell-Bejerano P, Becker C, Lensink M, Exposito-Alonso M, Klein M, Hildebrandt J, Neumann M, Kliebenstein D et al (2022a) Mutation bias reflects natural selection in Arabidopsis thaliana. Nature 602:101–105 [DOI] [PMC free article] [PubMed]
  48. Monroe JG, Srikant T, Carbonell-Bejerano P, Becker C, Lensink M, Exposito-Alonso M, Klein M, Hildebrandt J, Neumann M, Kliebenstein D et al (2022b) Arabidopsis Col-0 gene-level mutation probability score. https://github.com/greymonroe/mutation_bias_analysis/blob/main/data/gene_level_data.csv
  49. Munch D, Gupta V, Bachmann A, Busch W, Kelly S, Mun T, Andersen SU. The Brassicaceae family displays divergent, shoot-skewed NLR resistance gene expression. Plant Physiol. 2018;176:1598–1609. doi: 10.1104/pp.17.01606. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Murrell B, Wertheim JO, Moola S, Weighill T, Scheffler K, Pond SLK. Detecting individual sites subject to episodic diversifying selection. PLoS Genet. 2012;8:e1002764. doi: 10.1371/journal.pgen.1002764. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Ngou BPM, Ding P, Jones JDG. Thirty years of resistance: Zig-zag through the plant immune system. Plant Cell. 2022;34:1447–1478. doi: 10.1093/plcell/koac041. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Nordborg M, Hu TT, Ishino Y, Jhaveri J, Toomajian C, Zheng H, Bakker E, Calabrese P, Gladstone J, Goyal R, et al. The pattern of polymorphism in Arabidopsis thaliana. PLoS Biol. 2005;3:e196. doi: 10.1371/journal.pbio.0030196. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Oztas O, Selby CP, Sancar A, Adebali O. Genome-wide excision repair in Arabidopsis is coupled to transcription and reflects circadian gene expression patterns. Nat Commun. 2018;9:1503. doi: 10.1038/s41467-018-03922-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Parker JE, Coleman MJ, Szabò V, Frost LN, Schmidt R, van der Biezen EA, Moores T, Dean C, Daniels MJ, Jones JD. The Arabidopsis downy mildew resistance gene RPP5 shares similarity to the toll and interleukin-1 receptors with N and L6. Plant Cell. 1997;9:879–894. doi: 10.1105/tpc.9.6.879. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Pond SLK, Frost SDW, Grossman Z, Gravenor MB, Richman DD, Brown AJL. Adaptation to different human populations by HIV-1 revealed by codon-based analyses. PLOS Comput Biol. 2006;2:e62. doi: 10.1371/journal.pcbi.0020062. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Prigozhin DM, Krasileva KV. Analysis of intraspecies diversity reveals a subset of highly variable plant immune receptors and predicts their binding sites. Plant Cell. 2021;33:998–1015. doi: 10.1093/plcell/koab013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Putri GH, Anders S, Pyl PT, Pimanda JE, Zanini F. Analysing high-throughput sequencing data in Python with HTSeq 2.0. Bioinformatics. 2022;38:2943–2945. doi: 10.1093/bioinformatics/btac166. [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Quadrana L, Bortolini Silveira A, Mayhew GF, LeBlanc C, Martienssen RA, Jeddeloh JA, Colot V. The Arabidopsis thaliana mobilome and its impact at the species level. eLife. 2016;5:e15716. doi: 10.7554/eLife.15716. [DOI] [PMC free article] [PubMed] [Google Scholar]
  59. Quadrana L, Etcheverry M, Gilly A, Caillieux E, Madoui M-A, Guy J, Bortolini Silveira A, Engelen S, Baillet V, Wincker P, et al. Transposition favors the generation of large effect mutations that may facilitate rapid adaption. Nat Commun. 2019;10:3421. doi: 10.1038/s41467-019-11385-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  60. Quiroz D, Lopez-Mateos D, Zhao K, Pierce A, Ortega L, Ali A, Carbonell-Bejerano P, Yarov-Yarovoy V, Monroe JG (2022) The H3K4me1 histone mark recruits DNA repair to functionally constrained genomic regions in plants. Preprint at Biorxiv 10.1101/2022.05.28.493846
  61. Schmid KJ, Ramos-Onsins S, Ringys-Beckstein H, Weisshaar B, Mitchell-Olds T. A multilocus sequence survey in Arabidopsis thaliana reveals a genome-wide departure from a neutral model of DNA sequence polymorphism. Genetics. 2005;169:1601–1615. doi: 10.1534/genetics.104.033795. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Seymour DK, Koenig D, Hagmann J, Becker C, Weigel D. Evolution of DNA methylation patterns in the Brassicaceae is driven by differences in genome organization. PLoS Genet. 2014;10:e1004785. doi: 10.1371/journal.pgen.1004785. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Shao Z-Q, Xue J-Y, Wu P, Zhang Y-M, Wu Y, Hang Y-Y, Wang B, Chen J-Q. Large-scale analyses of angiosperm nucleotide-binding site-leucine-rich repeat genes reveal three anciently diverged classes with distinct evolutionary patterns. Plant Physiol. 2016;170:2095–2109. doi: 10.1104/pp.15.01487. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Siol M, Coudoux T, Ravel S, De Mita S. EggLib 3: a python package for population genetics and genomics. Mol Ecol Resour. 2022;22:3176–3187. doi: 10.1111/1755-0998.13672. [DOI] [PubMed] [Google Scholar]
  65. Staunton PM, Peters AJ, Seoighe C. Somatic mutations inferred from RNA-seq data highlight the contribution of replication timing to mutation rate variation in a model plant. Genetics. 2023;225:iyad128. doi: 10.1093/genetics/iyad128. [DOI] [PMC free article] [PubMed] [Google Scholar]
  66. Suyama M, Torrents D, Bork P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 2006;34:W609–W612. doi: 10.1093/nar/gkl315. [DOI] [PMC free article] [PubMed] [Google Scholar]
  67. Tamborski J, Krasileva KV. Evolution of plant NLRs: from natural history to precise modifications. Annu Rev Plant Biol. 2020;71:355–378. doi: 10.1146/annurev-arplant-081519-035901. [DOI] [PubMed] [Google Scholar]
  68. Tamborski J, Seong K, Liu F, Staskawicz BJ, Krasileva KV. Altering specificity and autoactivity of plant immune receptors Sr33 and Sr50 via a rational engineering approach. Mol Plant-Microbe Interactions®. 2023;36:434–446. doi: 10.1094/MPMI-07-22-0154-R. [DOI] [PMC free article] [PubMed] [Google Scholar]
  69. Thrall PH, Laine A-L, Ravensdale M, Nemri A, Dodds PN, Barrett LG, Burdon JJ. Rapid genetic change underpins antagonistic coevolution in a natural host-pathogen metapopulation. Ecol Lett. 2012;15:425–435. doi: 10.1111/j.1461-0248.2012.01749.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. Tsuchiya T, Eulgem T. An alternative polyadenylation mechanism coopted to the Arabidopsis RPP7 gene through intronic retrotransposon domestication. Proc Natl Acad Sci USA. 2013;110:E3535–E3543. doi: 10.1073/pnas.1312545110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  71. Van de Weyer A-L, Monteiro F, Furzer OJ, Nishimura MT, Cevik V, Witek K, Jones JDG, Dangl JL, Weigel D, Bemm F. A species-wide inventory of NLR genes and alleles in Arabidopsis thaliana. Cell. 2019;178:1260–1272.e14. doi: 10.1016/j.cell.2019.07.038. [DOI] [PMC free article] [PubMed] [Google Scholar]
  72. Wang L, Whittemore K, Johnston SA, Stafford P. Entropy is a simple measure of the antibody profile and is an indicator of health status: a proof of concept. Sci Rep. 2017;7:18060. doi: 10.1038/s41598-017-18469-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  73. Wei L. Selection on synonymous mutations revealed by 1135 genomes of Arabidopsis thaliana. Evol Bioinforma Online. 2020;16:1176934320916794. doi: 10.1177/1176934320916794. [DOI] [PMC free article] [PubMed] [Google Scholar]
  74. Weng M-L, Becker C, Hildebrandt J, Neumann M, Rutter MT, Shaw RG, Weigel D, Fenster CB. Fine-Grained analysis of spontaneous mutation spectrum and frequency in Arabidopsis thaliana. Genetics. 2019;211:703–714. doi: 10.1534/genetics.118.301721. [DOI] [PMC free article] [PubMed] [Google Scholar]
  75. Williams BP, Bechen LL, Pohlmann DA, Gehring M. Somatic DNA demethylation generates tissue-specific methylation states and impacts flowering time. Plant Cell. 2022;34:1189–1206. doi: 10.1093/plcell/koab319. [DOI] [PMC free article] [PubMed] [Google Scholar]
  76. Williams BP, Bechen LL, Pohlmann DA, Gehring M (2022b) Gene Expression Omnibus GSE191307. https://www-ncbi-nlm-nih-gov.libproxy.berkeley.edu/geo/query/acc.cgi?acc=GSE191307
  77. Xia J, Han L, Zhao Z. Investigating the relationship of DNA methylation with mutation rate and allele frequency in the human genome. BMC Genomics. 2012;13:S7. doi: 10.1186/1471-2164-13-S8-S7. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Source data Fig. 1 (2.7KB, zip)
Source data Fig. 2 (9.7KB, zip)
Source data Fig. 3 (50.5KB, zip)
Source data Fig. 4 (175.9KB, zip)
Source data Fig. 5 (21.6KB, zip)
Source data Fig. 6 (11.2KB, zip)
Source data Fig. 7 (290.9MB, zip)
Figure EV Source Data (291.6MB, zip)
Peer Review File (725KB, pdf)
Expanded View Figures (2.2MB, pdf)

Data Availability Statement

No primary datasets have been generated and deposited. The intermediate datasets and computer code produced in this study are available in the following databases: Data processing pipelines and figure generation code: GitHub (https://github.com/chandlersutherland/nlr_features); Intermediate datasets: Zenodo Public Repository (10.5281/zenodo.10530531) and the accompanying source data files.


Articles from EMBO Reports are provided here courtesy of Nature Publishing Group

RESOURCES