eLife. 2019 Jul 29;8:e47891. doi: 10.7554/eLife.47891

Epimutations are associated with CHROMOMETHYLASE 3-induced de novo DNA methylation

Jered M Wendte 1, Yinwen Zhang 2, Lexiang Ji 2, Xiuling Shi 1, Rashmi R Hazarika 3, Yadollah Shahryary 3, Frank Johannes 3,4, Robert J Schmitz 1
Editors: David Baulcombe 5, Detlef Weigel 6
PMCID: PMC6663294  PMID: 31356150

Abstract

In many plant species, a subset of transcribed genes are characterized by strictly CG-context DNA methylation, referred to as gene body methylation (gbM). The mechanisms that establish gbM are unclear, yet flowering plant species naturally without gbM lack the DNA methyltransferase, CMT3, which maintains CHG (H = A, C, or T) and not CG methylation at constitutive heterochromatin. Here, we identify the mechanistic basis for gbM establishment by expressing CMT3 in a species naturally lacking CMT3. CMT3 expression reconstituted gbM through a progression of de novo CHG methylation on expressed genes, followed by the accumulation of CG methylation that could be inherited even following loss of the CMT3 transgene. Thus, gbM likely originates from the simultaneous targeting of loci by pathways that promote euchromatin and heterochromatin, which primes genes for the formation of stably inherited epimutations in the form of CG DNA methylation.

Research organism: A. thaliana

Introduction

Heritable gains or losses of DNA methylation, or epimutations, can have important phenotypic consequences. Examples include peloric mutants of toadflax, where differential methylation of a single transcription factor can change flowers from bilaterally to radially symmetrical, and the colorless non-ripening locus of tomato, where differential methylation affects fruit ripening (Cubas et al., 1999; Manning et al., 2006). Despite the potential implications for phenotypic change, little is known about how epimutations form.

One form of genic DNA methylation that may provide clues to the mechanisms of epimutation is gene body methylation (gbM), which is found on a subset of expressed genes in many eukaryotic genomes, including most flowering plants (Bewick et al., 2017; Bewick and Schmitz, 2017; Cokus et al., 2008; Feng et al., 2010; Huff and Zilberman, 2014; Lister et al., 2008; Niederhuth et al., 2016; Regulski et al., 2013; Seymour et al., 2014; Takuno and Gaut, 2013; Takuno et al., 2016; Tran et al., 2005; Wang et al., 2015; Zemach et al., 2010; Zhang et al., 2006; Zilberman et al., 2007). GbM has been functionally implicated in certain processes, including transcriptional regulation, transcript processing, and suppression of intragenic transcripts (Bewick and Schmitz, 2017; Choi et al., 2019; Lorincz et al., 2004; Maunakea et al., 2010; Regulski et al., 2013; Vial-Pradel et al., 2018; Zilberman et al., 2008; Zilberman et al., 2007). However, natural and experimental losses of gbM have also been documented with no obvious effects on expression or chromatin structure (Bewick et al., 2016; Bewick et al., 2019; Niederhuth et al., 2016; Zhang et al., 2006). One hypothesis that may reconcile these seemingly conflicting results explains the formation of gbM as a passive process resulting from transient localization of the proteins that maintain heterochromatin to genic space (i.e. euchromatin) (Bewick et al., 2016; Bewick and Schmitz, 2017; Inagaki and Kakutani, 2012; Teixeira and Colot, 2009; Wendte and Schmitz, 2018). Here, gbM is proposed to arise passively, but, once established, is maintained due to the preferential recruitment of maintenance methyltransferases to previously methylated sites. Under this line of reasoning, gbM may have functional consequences in some cases, but can also be non-functional, which could explain experimental findings. 
A better understanding of the mechanistic basis for the establishment of gbM will provide important insights into its possible functions, the mechanisms underlying epimutations, and the factors that might influence DNA methyltransferase mistargeting in disease states (Wendte and Schmitz, 2018).

In plants, most knowledge of DNA methylation is derived from studies of Arabidopsis thaliana, where cytosines in different sequence contexts are the preferred substrates of distinct pathways. The RNA-directed DNA methylation (RdDM) pathway acts to de novo methylate cytosines in all sequence contexts (Matzke and Mosher, 2014). In RdDM, DOMAINS REARRANGED METHYLTRANSFERASE 2 (DRM2) is targeted to chromatin via non-coding RNAs produced by two plant-specific multi-subunit RNA polymerases (Haag and Pikaard, 2011; Wendte and Pikaard, 2017; Zhou and Law, 2015). Following de novo establishment, methylation of symmetrical CG cytosines is maintained by METHYLTRANSFERASE 1 (MET1), which is recruited to hemi-methylated CG sites to methylate the complementary, unmethylated CG sites (Finnegan et al., 1996; Ronemus et al., 1996; Woo et al., 2008; Woo et al., 2007). Symmetrical CHG (H = A, T, or C) cytosine methylation is maintained by CHROMOMETHYLASE 3 (CMT3) (Bartee et al., 2001; Lindroth et al., 2001; Papa et al., 2001). Among CHG sites, CMT3 shows a marked preference for CWG (W = A or T) cytosines relative to CCG cytosines (Gouil and Baulcombe, 2016; Stoddard et al., 2019). An additional CHROMOMETHYLASE, CMT2, targets the methylation of CHH cytosines (Stroud et al., 2014; Zemach et al., 2013). In A. thaliana, CHH sites targeted by CMT2 can be distinguished from those targeted by RdDM, as they show an enrichment in CWA methylation relative to other CHH contexts, in contrast to RdDM target regions, which show no preferred site enrichment (Gouil and Baulcombe, 2016).
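The sequence contexts described above can be illustrated with a minimal sketch. It classifies a cytosine on a single strand (read 5' to 3') as CG, CHG, or CHH, and splits CHG sites into the CWG and CCG sub-contexts. The function names are hypothetical and are not taken from the study's analysis pipeline.

```python
def cytosine_context(seq: str, pos: int) -> str:
    """Classify the cytosine at seq[pos] by the two bases 3' of it.

    Returns "CG", "CHG", "CHH", or "unknown" when the flanking bases
    are missing or ambiguous (for example N, or the end of the sequence).
    """
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError("position does not hold a cytosine")
    first = seq[pos + 1 : pos + 2]   # base immediately 3' of the cytosine
    second = seq[pos + 2 : pos + 3]  # second base 3' of the cytosine
    if first == "G":
        return "CG"
    # H = A, C, or T; the second base decides between CHG and CHH
    if first not in ("A", "C", "T") or second not in ("A", "C", "G", "T"):
        return "unknown"
    return "CHG" if second == "G" else "CHH"


def chg_subcontext(seq: str, pos: int):
    """Return "CWG" (W = A or T) or "CCG" for a CHG site, else None."""
    if cytosine_context(seq, pos) != "CHG":
        return None
    return "CCG" if seq[pos + 1].upper() == "C" else "CWG"
```

For the bottom strand, the same functions would be applied to the reverse complement of the sequence.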

CMT2 and CMT3 participate in a self-reinforcing feedback loop with an additional heterochromatin modification, histone H3 lysine nine di-methylation (H3K9me2) (Du et al., 2015; Du et al., 2012; Stoddard et al., 2019; Stroud et al., 2014). CMT2 and CMT3 are both thought to depend on direct physical binding to H3K9me2 for targeting to chromatin and methylation (Du et al., 2012; Stoddard et al., 2019; Stroud et al., 2014). DNA methylation can be physically bound by the H3K9 methyltransferases, which reinforces the co-localization of H3K9me2 and CMT-dependent DNA methylation (Bernatavichute et al., 2008; Du et al., 2014; Du et al., 2015; Johnson et al., 2007; Li et al., 2018). This co-dependency results in losses of CMT-dependent DNA methylation in H3K9 methyltransferase mutants, as well as losses of H3K9me2 in cmt mutants (Du et al., 2015; Jackson et al., 2002; Malagnac et al., 2002; Mathieu et al., 2005; Soppe et al., 2002; Stroud et al., 2014; Stroud et al., 2013; Tariq et al., 2003).

GbM is restricted to CG context cytosines (Bewick and Schmitz, 2017; Cokus et al., 2008; Lister et al., 2008; Niederhuth et al., 2016; Takuno et al., 2016; Tran et al., 2005), and therefore dependent on MET1 (Cokus et al., 2008; Lister et al., 2008). However, complementation of met1 mutants with wild-type MET1 fails to restore gbM, presumably because hemi-methylated CG cytosines required for MET1 recruitment have been lost (Bewick et al., 2016; Reinders et al., 2009). Thus, the mechanisms required for the establishment of gbM are unclear.

Recent comparative analyses have identified angiosperm species lacking gbM (Bewick et al., 2016; Niederhuth et al., 2016). Concurrent with the loss of gbM is the loss of the gene encoding CMT3, which has led to the hypothesis that CMT3 is required for the initial establishment of gbM (Bewick et al., 2016; Bewick et al., 2017; Bewick and Schmitz, 2017). The lack of immediate restoration of gbM in MET1 complemented met1 mutants has led to the proposition that CMT3 is only rarely localized to gene bodies such that gbM only accumulates slowly over evolutionary time scales (Bewick et al., 2016; Bewick and Schmitz, 2017; Inagaki and Kakutani, 2012; Wendte and Schmitz, 2018). Direct experimental testing of this model is difficult, and it is also currently unclear whether CMT3 can be active at regions with no pre-existing DNA or H3K9 methylation. Furthermore, ZMET2, the maize ortholog of CMT3, acts preferentially as a maintenance methyltransferase on hemi-methylated cytosines in the CHG sequence context (Stoddard et al., 2019), and, thus, a de novo activity that results in methylation of CG cytosines in vivo is unprecedented.

To gain insights into this process and provide a direct test of a role for CMT3 in gbM, we heterologously expressed A. thaliana CMT3 (AtCMT3) in a plant species that has lost both CMT3 and gbM, Eutrema salsugineum (Bewick et al., 2016). Eutrema salsugineum, like A. thaliana, is a member of the Brassicaceae family and diverged from a common ancestor with A. thaliana ~47 million years ago (Arias et al., 2014). Expression of AtCMT3 in E. salsugineum resulted in gains of CHG methylation over repetitive sequences characterized by the presence of H3K9me2, as predicted based on the known mechanism of CMT3 targeting. However, AtCMT3-expressing lines also exhibited ectopic CHG methylation over a subset of genes in an AtCMT3 expression-dependent manner. Genes that gained CHG methylation were orthologs of A. thaliana gbM genes and had no prior CHG methylation or detectable H3K9 methylation, suggesting de novo methylation activity of CMT3 in vivo. Unexpectedly, gains of CHG methylation did not result in stable accumulation of H3K9 methylation over gene bodies or expression changes, showing that the genic CHG methylation was uncoupled from transcriptional silencing, similar to gbM, likely due to transcription-coupled de-methylases. Gains in CHG methylation were also associated with gains in CG and CHH methylation, and removal of the transgene via crossing to non-transgenic parents, or progressive AtCMT3 transgene silencing over six generations of propagation, revealed that ectopic genic CG methylation was preferentially maintained relative to genic CHG or CHH methylation following the loss of AtCMT3 expression. The results provide new insights into the mechanism of CMT3-initiation of gbM by demonstrating that CMT3 promotes the establishment of genic CG epimutations, which can be maintained even in the absence of CMT3.

Results

Expression of AtCMT3 in E. salsugineum results in increased CHG methylation

To gain insights into the mechanisms of CMT3-targeting of DNA methylation, full-length genomic CMT3 from A. thaliana (AtCMT3) was expressed in E. salsugineum under the native A. thaliana promoter. In total, we assessed plants derived from six transformation events. Two of these lineages, referred to as AtCMT3-L1 and AtCMT3-L2, were propagated by single seed descent for six generations following transformation and serve as the main focus of this study (Figure 1A; see Figure 1—figure supplement 1 and Supplementary file 1 for a complete description of all plant lines and associated data described in this study). Whole genome bisulfite sequencing was completed on individual plants for each generation (numbered T1-T6) of each line to assess the impact of AtCMT3 expression on DNA methylation. Plants expressing AtCMT3 showed increased levels of CHG methylation in intergenic regions, associated with preexisting methylation, as well as over some gene bodies with no prior methylation (Figure 1B and Figure 1—figure supplement 2).

Figure 1. Expression of AtCMT3 in E. salsugineum results in increased CHG methylation.

(A) Schematic of the experiment. Two E. salsugineum lines derived from transformations with genomic A. thaliana CMT3 were propagated by single seed descent for six generations (T1–T6). The two lines are referred to as AtCMT3-L1 and AtCMT3-L2, followed by the generation (T1–T6). For additional lines analyzed in this study see Figure 1—figure supplement 1 and Supplementary file 1. (B) Genome browser view of CHG methylation levels derived from whole genome bisulfite sequencing. The image illustrates the gains in CHG methylation that occur in regions that are methylated in wild type, as well as over gene bodies with no pre-existing DNA methylation (boxed in red). Scales on tracks designate the weighted percent methylation, with 1 = 100% on the top strand and −1 = 100% on the bottom strand. Only CHG methylation is shown. For methylation in all contexts see Figure 1—figure supplement 2.

Figure 1—figure supplement 1. Diagram of all experimental E. salsugineum lines analyzed in this study with associated data collected.

Figure 1—figure supplement 2. CHG methylation increases in AtCMT3-expressing E. salsugineum lines.

(A) Genome browser view of CHG methylation levels derived from whole genome bisulfite sequencing. The image illustrates the gains in CHG methylation that occur in regions that are methylated in wild type, as well as over gene bodies with no pre-existing DNA methylation (boxed in red). Scales on tracks designate the weighted percent methylation, with 1 = 100% on the top strand and −1 = 100% on the bottom strand. Cytosine methylation is divided into CG (red), CHG (blue), and CHH (yellow) sequence contexts. Corresponds to the region shown in Figure 1B.

Figure 1—figure supplement 3. CG, CHG, and CHH DMRs co-localize.

(A–L) Upset plots showing the number and overlap between hyper DMRs identified for each sequence context, CG, CHG, and CHH, within each generation of each lineage, AtCMT3-L1 and AtCMT3-L2: (A) AtCMT3-L1T1; (B) AtCMT3-L1T2; (C) AtCMT3-L1T3; (D) AtCMT3-L1T4; (E) AtCMT3-L1T5; (F) AtCMT3-L1T6; (G) AtCMT3-L2T1; (H) AtCMT3-L2T2; (I) AtCMT3-L2T3; (J) AtCMT3-L2T4; (K) AtCMT3-L2T5; (L) AtCMT3-L2T6.

Figure 1—figure supplement 4. AtCMT3 expression results in gains in CHG methylation over similar regions between lineages and across generations.

Matrix showing the number and overlap between hyper-CHG DMRs identified in each individual in AtCMT3-L1 and AtCMT3-L2. The top half of the matrix shows the number of overlapping DMRs and the bottom half shows the p-value calculated based on Fisher’s exact test (***p significant <0.0004, Bonferroni correction). See also Figure 2A and Supplementary file 3.

Figure 1—figure supplement 5. Genome wide percent CHG methylation is correlated with AtCMT3 transgene expression levels.

AtCMT3 expression was determined by RNA-seq (first panel) or qRT-PCR (second panel). For qRT-PCR AtCMT3 expression was normalized to TUB4 expression (Thhalv10003210m). See also Supplementary file 4.

Differentially methylated regions (DMRs) were identified in each sequence context (CG, CHG, and CHH) by comparing each generation of AtCMT3-L1 and AtCMT3-L2 with wild type (Supplementary file 2 and Supplementary file 3). Consistent with AtCMT3 activity, the majority of DMRs identified were CHG DMRs, with 25,096 identified that were characterized by a median size of 407 base pairs (bp) (Supplementary file 2). Smaller numbers of CG (3,502 DMRs, median size: 189 bp) and CHH (1,763 DMRs, median size: 243 bp) DMRs were also identified (Supplementary file 2). The majority of CHG DMRs (>99%, in each lineage) were hypermethylated CHG DMRs in AtCMT3-expressing lineages relative to wild type (Supplementary file 3). Many of the CG and CHH DMRs (45–72% of CG and 86–98% of CHH DMRs, depending on the line) were also hypermethylated DMRs in AtCMT3-expressing lines relative to wild type, and overlapped with CHG DMRs, suggesting they result from cross-talk between CMT3-mediated CHG methylation and other pathways (Figure 1—figure supplement 3, Supplementary file 3). There was also a significant overlap of hypermethylated CHG DMRs between all individuals, suggesting that AtCMT3 was not methylating DNA randomly (Figure 1—figure supplement 4).

The levels of CHG methylation varied between individuals expressing AtCMT3 (Figure 1B, Supplementary file 2 and Supplementary file 3). Plants were transformed using the floral dip method (Clough and Bent, 1998), which results in random and potentially multiple transgene insertions that can segregate out over generations. Therefore, we predicted that AtCMT3 transgene expression likely varied between individuals, which could explain differences in CHG methylation levels. To test this, we performed RNA-sequencing in the T3–T5 generations of AtCMT3-L1, the T3–T6 generations of AtCMT3-L2, as well as two T2 generation plants from an additional, independently transformed lineage (AtCMT3-L3) (Figure 1—figure supplement 1). The results demonstrated a significant correlation between the levels of AtCMT3 expression and genome-wide CHG methylation levels (R² = 0.8828, p = 1.669 × 10⁻⁴, Figure 1—figure supplement 5, see Supplementary file 4 for FPKM values). This was especially notable for two T2 generation plants of AtCMT3-L3, which had the highest AtCMT3 expression levels (233.7 and 372.5 FPKM for AtCMT3-L3T2 and T2b, respectively) and the highest genome-wide percent CHG methylation (27% and 32%, for T2 and T2b, respectively). These results also revealed that expression of the AtCMT3 transgene was progressively lost in the AtCMT3-L2 lineage following the T4 generation (Supplementary file 4). We also examined the relationship between genome-wide CHG methylation and AtCMT3 expression using qRT-PCR, including plants from additional lineages, and found a weaker although significant relationship (R² = 0.5932, p = 0.025, Figure 1—figure supplement 1 and Figure 1—figure supplement 5). Taken together, these results indicate that AtCMT3 expression is likely one factor contributing to genome-wide CHG levels in these lines.
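The coefficient of determination reported above can be computed, for a simple linear relationship between two variables, as the squared Pearson correlation. The sketch below is a minimal illustration under that assumption; the function name is hypothetical, the study's actual statistical software is not specified in this section, and any inputs would have to come from Supplementary file 4 rather than from this example.

```python
from math import fsum


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences.

    For a simple linear regression of y on x this equals the R^2 of the fit.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = fsum(x) / n
    mean_y = fsum(y) / n
    sxx = fsum((a - mean_x) ** 2 for a in x)
    syy = fsum((b - mean_y) ** 2 for b in y)
    sxy = fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either variable is constant")
    return (sxy * sxy) / (sxx * syy)
```

Here `x` would hold per-plant AtCMT3 expression (for example FPKM) and `y` the matching genome-wide percent CHG methylation.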

Despite lacking CMT3, wild-type E. salsugineum does exhibit residual levels of CHG methylation, likely deposited by other DNA methylation pathways such as RdDM (Bewick et al., 2016) (Figure 1B and Figure 2A). As the maize ortholog of CMT3 has been shown to preferentially act on hemi-methylated CHG cytosines (Stoddard et al., 2019), we predicted that AtCMT3 would be preferentially targeted to regions with pre-existing CHG methylation. To test this, CHG DMRs were ranked based on wild-type CHG methylation levels. Results showed that the majority of CHG DMRs (18,812/25,096 or 75%) were characterized by the presence of low levels of CHG methylation in wild type that increased upon the expression of AtCMT3 (Figure 2A). Characterizing the DMRs based on overlap with genomic features revealed that 19,576 out of 25,096 (78%) total CHG DMRs overlapped with repetitive elements or intergenic regions, which are expected targets of all DNA methylation pathways (Figure 2A, Supplementary file 2). Summarizing CHG methylation over all repetitive elements confirmed that plants expressing AtCMT3 exhibited increased CHG methylation over these regions from ~15% methylation in wild type to ~30–70% methylation in AtCMT3 lines (Figure 2B).
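Region-level methylation percentages such as those above are commonly summarized as weighted methylation levels, in which methylated read counts are summed over all cytosines in a region and divided by the summed read coverage. The sketch below assumes that definition and a hypothetical input of per-site `(methylated_reads, total_reads)` pairs; it is not the study's own code.

```python
def weighted_methylation(sites):
    """Weighted percent methylation over a region.

    `sites` is an iterable of (methylated_reads, total_reads) pairs, one per
    cytosine in the context of interest. Returns None if there is no coverage.
    """
    methylated = 0
    covered = 0
    for m, n in sites:
        if m < 0 or n < m:
            raise ValueError("invalid read counts")
        methylated += m
        covered += n
    if covered == 0:
        return None
    return 100.0 * methylated / covered
```

Because reads are pooled before dividing, deeply covered sites carry more weight than sparsely covered ones, which differs from averaging per-site fractions.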

Figure 2. AtCMT3 expression results in CHG methylation over repeats and a subset of gene bodies.

(A) Heatmap of % CHG methylation over hyper-CHG differentially methylated regions (DMRs) defined by comparing all AtCMT3-L1 and L2 generations (T1–T6) to wild type. DMRs are ranked by % CHG methylation levels in wild type plants showing that the majority of gains in CHG methylation occurred over regions with pre-existing CHG methylation classified as repeats or intergenic regions. A subset of regions with no pre-existing CHG methylation, mainly classified as genes, also showed gains in CHG methylation, especially in AtCMT3-L2 individuals. See also Supplementary file 2 and Supplementary file 3. (B) Metaplot summarizing % CHG methylation over repetitive sequences for each Line. (C) Metaplot summarizing % CHG methylation over gene bodies for each Line. (D) The number of genes gaining a minimum of 5% CHG methylation (CHG-gain genes) in each lineage. See Supplementary file 5 for lists of CHG-gain genes.

Figure 2—figure supplement 1. AtCMT3 expression results in gains in CHG methylation over similar genes between lineages and across generations.

Matrix showing the number and overlap between genes gaining a minimum of 5% CHG methylation identified in each individual in AtCMT3-L1 and AtCMT3-L2. The top half of the matrix shows the number of overlapping genes and the bottom half shows the p-value calculated based on a hypergeometric test (***p significant <0.0004, Bonferroni correction). See also Figure 2D and Supplementary file 5.

Figure 2—figure supplement 2. The number of genes gaining a minimum of 5% CHG methylation (CHG-gain genes) is correlated with AtCMT3 transgene expression levels.

AtCMT3 expression was determined by RNA-seq (first panel) or qRT-PCR (second panel). For qRT-PCR AtCMT3 expression was normalized to TUB4 expression (Thhalv10003210m). See also Figure 2D and Supplementary file 4.

Expression of AtCMT3 in E. salsugineum results in increased CHG methylation in a subset of gene bodies

In addition to methylated regions, a subset of regions (6,284 out of 25,096 (25%) CHG DMRs) that gained CHG methylation in AtCMT3-expressing lines had no pre-existing CHG methylation, and the majority (4,725/6,284 or 75%) of these regions overlapped annotated genes (Figure 1B, Figure 2A, and Figure 1—figure supplement 2, Supplementary file 2). Examination of percent CHG methylation over all annotated genes revealed increases in CHG methylation over gene bodies, ranging from ~2–10% CHG methylation on average, compared to no CHG methylation in wild-type plants (Figure 2C). The patterns of CHG methylation in several of the generations of the AtCMT3-L2 lineage, which exhibited higher levels of AtCMT3 transgene expression (Supplementary file 4), were reminiscent of that seen for gbM (Bewick and Schmitz, 2017), in that the percent methylation levels were highest towards the center of the gene bodies, decreasing towards the transcription start sites (TSS) and transcription termination sites (TTS) (Figure 2C).
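As a quick consistency check, the CHG DMR counts reported in this and the preceding section can be recombined to confirm that the two classes (with and without pre-existing CHG methylation) partition the full set and that the stated percentages follow from the counts. All numbers below are taken from the text; the variable and function names are illustrative only.

```python
total_chg_dmrs = 25_096
with_preexisting_chg = 18_812     # low-level CHG methylation in wild type
without_preexisting_chg = 6_284   # no CHG methylation in wild type
repeat_or_intergenic = 19_576     # overlap repeats or intergenic regions
genic_without_preexisting = 4_725 # overlap annotated genes


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


# The two wild-type classes account for every CHG DMR.
partition_ok = with_preexisting_chg + without_preexisting_chg == total_chg_dmrs

summary = {
    "with pre-existing CHG": percent(with_preexisting_chg, total_chg_dmrs),
    "without pre-existing CHG": percent(without_preexisting_chg, total_chg_dmrs),
    "repeat or intergenic": percent(repeat_or_intergenic, total_chg_dmrs),
    "genic among those without": percent(
        genic_without_preexisting, without_preexisting_chg
    ),
}
```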

To further characterize the genes that gained CHG methylation, genes that had less than 1% total DNA methylation in any context in wild type and that gained a minimum of 5% CHG methylation were identified in each line (CHG-gain genes) (Figure 2D, Supplementary file 5). The number of CHG-gain genes varied between individuals and lineages. In AtCMT3-L1, the number of CHG-gain genes generally increased over generational time from 99 genes in T1 to 583 genes in T6 (Figure 2D, Supplementary file 5). AtCMT3-L2 individuals were characterized by higher numbers of CHG-gain genes relative to AtCMT3-L1, with the T1 generation showing gains in 718 genes, 2,996 genes in T2, 3,553 genes in T3, and 4,055 genes in T4 (Figure 2D, Supplementary file 5). Following the T4 generation, the T5 and T6 generations of AtCMT3-L2 showed a decline in the number of CHG-gain genes, with 1,937 in T5 and only 88 genes in T6 (Figure 2D, Supplementary file 5). Despite the variation in the number of CHG-gain genes, the overlap of CHG-gain genes between all individuals was significantly higher than expected by chance (Figure 2—figure supplement 1). As CHG-gain genes had no prior CHG methylation, these results demonstrate a de novo methyltransferase activity of CMT3 in vivo.
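The CHG-gain gene definition above can be written as a simple filter. This sketch rests on two assumptions that the text does not spell out: that "less than 1% total DNA methylation in any context" means each of the CG, CHG, and CHH levels is below 1% in wild type, and that "a minimum of 5% CHG methylation" refers to a gain of at least 5 percentage points over wild type. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneMethylation:
    gene_id: str
    wt_cg: float     # wild-type percent methylation, CG context
    wt_chg: float    # wild-type percent methylation, CHG context
    wt_chh: float    # wild-type percent methylation, CHH context
    line_chg: float  # percent CHG methylation in the AtCMT3-expressing plant


def is_chg_gain(gene: GeneMethylation,
                wt_max: float = 1.0,
                min_gain: float = 5.0) -> bool:
    """True if the gene is unmethylated in wild type and gains CHG methylation."""
    unmethylated_in_wt = max(gene.wt_cg, gene.wt_chg, gene.wt_chh) < wt_max
    gain = gene.line_chg - gene.wt_chg
    return unmethylated_in_wt and gain >= min_gain


def chg_gain_genes(genes):
    """Return the IDs of all genes that pass the CHG-gain filter."""
    return [g.gene_id for g in genes if is_chg_gain(g)]
```

Counting the output of `chg_gain_genes` per plant would give per-individual totals of the kind plotted in Figure 2D.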

We hypothesized that the variation in the number of CHG-gain genes between individuals could be related to variation in the expression levels of the AtCMT3 transgene, similar to genome-wide CHG methylation levels. Indeed, based on RNA-seq assessment of AtCMT3 expression, the levels of AtCMT3 expression and the number of CHG-gain genes were correlated (R² = 0.7951, p = 0.001) (Figure 2—figure supplement 2, Supplementary file 4). Again, two T2 generation individuals of the AtCMT3-L3 lineage, which had the highest AtCMT3 expression (Supplementary file 4), also had the highest number of CHG-gain genes (5,566 and 6,346 CHG-gain genes for AtCMT3-L3T2 and T2b, respectively) (Supplementary file 5). This result was also confirmed utilizing qRT-PCR (R² = 0.9503, p = 3.91 × 10⁻⁵) (Figure 2—figure supplement 2). Therefore, the decline in the number of CHG-gain genes in the AtCMT3-L2T5 and T6 generations is correlated with the progressive loss of AtCMT3 transgene expression.

CHG methylation in gene bodies is not associated with stable H3K9 methylation

CMT3 can directly bind to H3K9me2 (Du et al., 2012). To determine if increases of CHG methylation detected in plants expressing AtCMT3 were correlated with H3K9me2, we conducted H3K9me2 chromatin immunoprecipitation and sequencing (ChIP-seq) in wild-type E. salsugineum. Consistent with CMT3 binding of H3K9me2, regions enriched for H3K9me2 in wild type showed increased CHG methylation in all lines expressing AtCMT3 (Figure 3A–B and Figure 3—figure supplement 1).

Figure 3. Gains in CHG methylation do not alter H3K9me2 levels or distribution.

(A) Genome browser view of CHG methylation levels and H3K9me2 ChIP sequencing levels in the T3 and T5 generations of the AtCMT3-L1 and L2 lineages. Arrows indicate gains of CHG methylation over gene bodies in the AtCMT3-L2 generations that do not show H3K9me2 enrichment. Scales on methylation tracks designate the weighted percent methylation, with 1 = 100% on the top strand and −1 = 100% on the bottom strand. Scales on the H3K9me2 tracks indicate the number of mapped reads and are not adjusted for library size (See C-F for comparison of normalized reads). DNA methylation is only shown in the CHG context. For DNA methylation in all contexts see Figure 3—figure supplement 1. (B) Metaplot of % CHG methylation over H3K9me2 ChIP peaks identified in wild type plants. (C–F) Metaplot of H3K9me2 ChIP-sequencing enrichment over H3K9me2 ChIP peaks defined in wild type plants and over CHG-gain genes in AtCMT3-L1T3 (C), AtCMT3-L1T5 (D), AtCMT3-L2T3 (E), and AtCMT3-L2T5 (F). Reads were normalized to library size. See Supplementary file 5 for lists of CHG-gain genes in each lineage.

Figure 3—figure supplement 1. Gains in CHG methylation do not alter H3K9me2 levels or distribution.

(A) Genome browser view of CHG methylation levels and H3K9me2 ChIP sequencing levels in the T3 and T5 generations of the AtCMT3-L1 and AtCMT3-L2 lineages. Arrows indicate gene bodies that gain CHG methylation in the AtCMT3-L2 generations but do not show H3K9me2 enrichment. Scales on methylation tracks designate the weighted percent methylation, with 1 = 100% on the top strand and −1 = 100% on the bottom strand. Scales on the H3K9me2 tracks indicate the number of mapped reads and are not normalized. Cytosine methylation is divided into CG (red), CHG (blue), and CHH (yellow) sequence contexts. Corresponds to the region shown in Figure 3A.

Figure 3—figure supplement 2. Gains in CHG methylation do not alter H3K9me1 levels or distribution.

(A) Genome browser view of CHG methylation levels and H3K9me1 ChIP sequencing levels in the T2c generation of the AtCMT3-L3 lineage. In wild type and AtCMT3-L3T2c, H3K9me1 can be found enriched in regions also enriched for H3K9me2. The arrow indicates a gene body that gains CHG methylation in AtCMT3-L3 but does not show H3K9me1 enrichment. Scales on methylation tracks designate the weighted percent methylation, with 1 = 100% on the top strand and −1 = 100% on the bottom strand. Scales on the H3K9me1 tracks indicate the number of mapped reads and are not normalized. Cytosine methylation is divided into CG (red), CHG (blue), and CHH (yellow) sequence contexts. (B) Metaplot of H3K9me1 ChIP-sequencing enrichment over H3K9me2 ChIP peaks defined in wild type plants and over CHG-gain genes in AtCMT3-L3T2c. (C) Metaplot of H3K9me1 ChIP-sequencing enrichment comparing enrichment over CHG-gain genes in AtCMT3-L3T2c to genes that do not gain CHG methylation (UM genes).

We next sought to determine how gains in CHG methylation affected the distribution of H3K9me2 by conducting H3K9me2 ChIP-seq in the T3 and T5 generation plants for AtCMT3-L1 and AtCMT3-L2. We found plants expressing AtCMT3 also showed enrichment for H3K9me2 across H3K9me2 ChIP peaks identified in wild type, but there were no further increases of H3K9me2 in heterochromatin, despite the increase in CHG methylation (Figure 3A,C–F, and Figure 3—figure supplement 1). In contrast to repeat regions, gains in CHG methylation over gene bodies in AtCMT3-expressing lines were not associated with pre-existing H3K9me2 in wild-type plants (Figure 3A,C–F, and Figure 3—figure supplement 1). Also unexpected, the establishment of CHG methylation following AtCMT3 expression did not result in detectable H3K9me2 across CHG-gain genes (Figure 3A,C–F, and Figure 3—figure supplement 1).

CMT3 can also directly bind to H3K9me1, which distinguishes it from CMT2 (Du et al., 2012; Stroud et al., 2014). Therefore, we also conducted H3K9me1 ChIP-seq in transgenic and non-transgenic lines. Similar to findings in A. thaliana (Jackson et al., 2004), H3K9me1 was enriched over regions characterized by H3K9me2 (Figure 3—figure supplement 2A–B). H3K9me1 was preferentially enriched in heterochromatin relative to CHG-gain genes before or after introduction of the transgene and the H3K9me1 signal over CHG-gain genes was indistinguishable from genes that remained unmethylated (Figure 3—figure supplement 2B–C). Overall, we conclude that gains of CHG methylation at genic loci resulting from expression of AtCMT3 are not associated with stable H3K9 methylation.

CHG methylation in gene bodies is not associated with transcriptional silencing

CMT3-mediated CHG methylation in heterochromatin is associated with transcriptional silencing (Bartee et al., 2001; Lindroth et al., 2001; Stroud et al., 2014). To determine if gains in CHG methylation over gene bodies resulted in transcriptional changes, we compared expression of CHG-gain genes between wild-type and transgenic plants. In each line, only a small proportion of CHG-gain genes (less than or equal to 10%) showed a greater than two logfold change, and similar numbers were down- and up-regulated (Figure 4A and Figure 4—figure supplement 1, Supplementary file 5). There was also no relationship between the levels of CHG methylation gain and expression changes (Figure 4A and Figure 4—figure supplement 1). Additionally, we defined up- and down-regulated genes genome wide in AtCMT3-expressing lines (defined as greater than a two logfold change relative to wild type), and CHG-gain genes were not significantly enriched in either up- or down-regulated genes (p>0.05, Fisher's exact test) (Supplementary file 6 and Supplementary file 7). We also considered that the changes in gene expression may be indirect effects of AtCMT3 expression and assessed up- and down-regulated genes for significant enrichments in Gene Ontology (GO) biological processes. We found that many of the enriched terms for up- and down-regulated genes were involved in abiotic stress responses (Supplementary file 8). However, there were no consistently identified GO-term enrichments across lineages, suggesting that AtCMT3 expression was unlikely to be the direct or indirect cause of these changes (Supplementary file 8). Lack of transcriptional changes directly related to CHG methylation is consistent with the lack of stable H3K9me2 at these loci (Figure 3C–F). We conclude that genic CHG methylation in AtCMT3-expressing lines is uncoupled from heterochromatin formation and transcriptional silencing, similar to gbM.
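The expression comparison above can be sketched as a threshold on the log fold change between a transgenic plant and wild type. The exact threshold convention is not fully specified in this section, so the sketch assumes an absolute log2 fold change greater than 2 and, following the Figure 4 legend, drops genes with zero FPKM in either sample. The function name and default threshold are illustrative assumptions.

```python
from math import log2


def classify_expression(wt_fpkm: float, line_fpkm: float,
                        threshold: float = 2.0):
    """Label a gene "up", "down", or "unchanged" relative to wild type.

    Returns None for genes with zero (or negative) FPKM in either sample,
    since the log fold change is undefined there.
    """
    if wt_fpkm <= 0 or line_fpkm <= 0:
        return None
    log_fc = log2(line_fpkm / wt_fpkm)
    if log_fc > threshold:
        return "up"
    if log_fc < -threshold:
        return "down"
    return "unchanged"
```

Tallying the "up" and "down" labels among CHG-gain genes, and among all other genes, would give the 2×2 table needed for an enrichment test such as Fisher's exact test.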

Figure 4. Genes that gain CHG methylation have A. thaliana gbM gene characteristics.

(A) Relationship between levels of CHG methylation gain and logfold change (FC) in expression for CHG-gain genes in AtCMT3-L2T4 relative to wild type. Genes with zero FPKM values were removed from the analysis. See Figure 4—figure supplement 1 for additional individuals analyzed. (B–E) Comparison of the (B) distribution of gene lengths, (C) number of exons, (D) expression levels, and (E) frequency of CHG sites between CHG-gain genes and unmethylated genes (UM). P-values were calculated using a Wilcoxon rank-sum test. Boxes indicate the first and third quartiles, with the center line indicating the median and notches the 95% confidence interval of the median. Whiskers show 1.0 times the interquartile range and outliers beyond this range were excluded for visualization purposes, but included in all calculations.

Figure 4—figure supplement 1. Changes in expression are not related to CHG methylation levels.

Relationship between fold change in expression relative to wild type and CHG methylation levels for genes identified as gaining a minimum of 5% CHG methylation in each respective line. Lines analyzed include those with both RNA sequencing and whole genome bisulfite sequencing data: (A) AtCMT3-L1T3; (B) AtCMT3-L1T4; (C) AtCMT3-L1T5; (D) AtCMT3-L2T3; (E) AtCMT3-L2T5; (F) AtCMT3-L2T6; (G) AtCMT3-L3T2; (H) AtCMT3-L3T2b. Genes with zero FPKM values were removed from the analysis.

Genes that gain CHG methylation in AtCMT3-expressing lines are orthologs of A. thaliana gbM genes and possess similar characteristics

GbM is present on conserved orthologous genes across diverse plant species (Bewick et al., 2017; Niederhuth et al., 2016; Seymour et al., 2014; Takuno and Gaut, 2013; Takuno et al., 2016). To determine if genes that gain CHG methylation in AtCMT3-expressing lineages are genes that would be predicted to have gbM based on orthology, orthologs of CHG-gain genes in E. salsugineum were identified in A. thaliana. Among the generations of AtCMT3-L1 and L2, there was a total of 4,769 CHG-gain genes identified in at least one individual (Supplementary file 5). Among these genes, 4,104 have an orthologous gene encoded in the A. thaliana genome, and, out of those, 1,526 were classified as gbM in the A. thaliana Col-0 accession, which is significantly more than expected by chance (p=2.83×10^−222, hypergeometric test) (Supplementary file 9).
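
A hypergeometric test of this kind can be sketched as follows. The snippet is illustrative only: the text reports the number of CHG-gain genes with orthologs (4,104) and the number of those classified as gbM (1,526), but not the background totals, so `n_orthologs_total` and `n_gbm_total` below are hypothetical placeholders and the printed value is not expected to match the reported p-value. The tail probability is computed in log space because it is far smaller than typical floating-point sums can resolve.

```python
from math import exp, lgamma, log


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def hypergeom_upper_tail_log10(population, successes, draws, observed):
    """log10 of P(X >= observed) for a hypergeometric random variable X.

    population: total genes considered; successes: gbM genes among them;
    draws: genes in the tested set; observed: gbM genes in the tested set.
    """
    lo = max(observed, draws - (population - successes), 0)
    hi = min(successes, draws)
    if lo > hi:
        return float("-inf")
    log_total = log_comb(population, draws)
    log_terms = [
        log_comb(successes, k)
        + log_comb(population - successes, draws - k)
        - log_total
        for k in range(lo, hi + 1)
    ]
    # Log-sum-exp keeps very small probabilities from underflowing.
    peak = max(log_terms)
    log_p = peak + log(sum(exp(t - peak) for t in log_terms))
    return log_p / log(10)


# Reported in the text:
n_chg_gain_with_ortholog = 4104
n_chg_gain_gbm = 1526
# Hypothetical background values (NOT from the paper):
n_orthologs_total = 18000
n_gbm_total = 4000

log10_p = hypergeom_upper_tail_log10(
    n_orthologs_total, n_gbm_total, n_chg_gain_with_ortholog, n_chg_gain_gbm
)
print(f"log10(p) = {log10_p:.1f}")
```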

Relative to unmethylated genes, gbM genes are generally characterized as being constitutively expressed at moderate levels, they tend to be longer and have more exons, and have a higher frequency of CMT3-preferred CHG sites (Bewick et al., 2016; Bewick et al., 2017; Lister et al., 2008; Niederhuth et al., 2016; Takuno and Gaut, 2012; Takuno and Gaut, 2013; Takuno et al., 2016; Zhang et al., 2006). To determine whether the CHG-gain genes had similar characteristics, we compared the 4,769 CHG-gain genes identified to the remaining unmethylated E. salsugineum genes (UM genes). On average, CHG-gain genes were longer, contained more exons, exhibited a more moderate, but on average higher, range of expression, and had a higher frequency of CHG cytosines relative to gene length compared to unmethylated genes (Figure 4B–E). Therefore, AtCMT3 in E. salsugineum methylates genes that are orthologs and/or characterized by similar properties of gbM loci in A. thaliana.
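
For readers unfamiliar with the Wilcoxon rank-sum test used in Figure 4B–E, the following is a minimal sketch of the statistic. It is an assumption-laden illustration, not the authors' code: it uses the normal approximation with a tie correction and no continuity correction, and the gene lengths in the usage example are made up.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Returns (U statistic for sample x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Pool the samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the mean of ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2
        ties = j - i
        tie_term += ties**3 - ties
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / sqrt(var_u)
    return u, erfc(abs(z) / sqrt(2))


# Made-up gene lengths (bp), for illustration only.
chg_gain_lengths = [3200, 4100, 2800, 5300, 3900]
um_lengths = [1200, 1500, 900, 2100, 1800]
u_stat, p_value = rank_sum_test(chg_gain_lengths, um_lengths)
print(u_stat, p_value)
```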

Gains in CHG methylation over gene bodies are associated with gains in non-CHG methylation

GbM is defined as strictly CG context methylation (Bewick and Schmitz, 2017), which contrasts with the predominantly CHG methylation observed over gene bodies in the AtCMT3-Lines (Figure 1B, Figure 3A, Figure 1—figure supplement 2, and Figure 3—figure supplement 1). To determine if AtCMT3 expression resulted in non-CHG methylation over gene bodies as well, we focused on the AtCMT3-L2 lineage, which showed higher AtCMT3 expression levels and CHG-gain genes relative to AtCMT3-L1. We analyzed all CHG DMRs that overlapped CHG-gain genes identified in the T4 generation (4,312 CHG DMRs overlapping 4,055 CHG-gain genes) (Supplementary file 2 and Supplementary file 5), and determined the levels of CG, CHG, and CHH methylation over these regions (Figure 5A). Re-focusing the analysis to the level of DMRs over genes, rather than whole genes, revealed some residual methylation present on genes below our original cutoff of 1% gene-wide methylation. This residual methylation was almost exclusively in the CG context and is consistent with CMT3 preferentially localizing to methylated DNA (Figure 5A). Importantly, 2,094 of the 4,312 regions had no detectable CHG methylation in wild type and 597 of the 4,312 had no methylation in any sequence context (see also Figure 5A, Figure 1—figure supplement 2, and Figure 3—figure supplement 1).

Figure 5. CHG methylation over gene bodies is associated with gains in non-CHG methylation.

(A) Heatmap of % methylation levels across CHG DMRs overlapping the CHG-gain genes in AtCMT3-L2T4 divided into selected trinucleotide contexts. See Figure 5—figure supplement 1A for further parsing of the data into all 16 possible trinucleotide contexts. (B) Assessment of the relationship of genic cytosine methylation to AtCMT3 expression across AtCMT3-L2 generations. Line plots show the number of methylated cytosines in each context relative to AtCMT3-L2T4 across CHG DMRs overlapping CHG-gain genes identified in AtCMT3-L2T4. Bar plots show the expression of the AtCMT3 transgene. For further parsing of the CHG and CHH contexts into CWG vs. CCG and CWA vs. other CHH, see Figure 5—figure supplement 1B–C. (C) Assessment of the relationship of repeat and intergenic cytosine methylation to AtCMT3 expression across AtCMT3-L2 generations. Line plots show the number of methylated cytosines relative to AtCMT3-L2T4 across hyper CHG DMRs identified in AtCMT3-L2T4 that overlap repeats or intergenic regions. Bar plots show the expression of the AtCMT3 transgene.

Figure 5—figure supplement 1. CHG methylation over gene bodies is associated with gains in non-CHG methylation.

(A) Heatmap of % methylation levels across CHG hyper-DMRs overlapping the CHG-gain genes in AtCMT3-L2T4 divided into all 16 possible trinucleotide contexts. (B–C) Assessment of the relationship of genic cytosine methylation to AtCMT3 expression across AtCMT3-L2 generations. Line plots show the number of methylated cytosines in (B) CWG vs. CCG contexts and (C) CWA vs. other CHH contexts relative to AtCMT3-L2T4 across CHG DMRs overlapping CHG-gain genes defined in AtCMT3-L2T4. Bar plots show the expression of the AtCMT3 transgene. See also Figure 5A–B. (D) Assessment of the relationship of genic cytosine methylation to AtCMT3 expression across AtCMT3-L1 generations. Line plots show the number of methylated cytosines in each context relative to AtCMT3-L1T4 across CHG DMRs overlapping CHG-gain genes identified in AtCMT3-L1T4. Bar plots show the expression of the AtCMT3 transgene. (E–F) Assessment of the number of methylated cytosines relative to AtCMT3-L1T5 across hyper CHG DMRs identified in AtCMT3-L1T5 that overlap CHG-gain genes identified in AtCMT3-L1T5 (E) or repeats or intergenic regions (F). AtCMT3-L1T5 was crossed to wild type (non-transgenic) and three F2 progeny were assessed: one progeny that retained the transgene (L1T5XWT F2 (+CMT3)) and two where the transgene segregated out (L1T5XWT F2 (-CMT3) #1 and #2). The gel image below the line plot is the result of PCR conducted on genomic DNA from each line with primers to detect the AtCMT3 transgene.
Figure 5—figure supplement 2. AtCMT3-induced genic CG methylation is maintained at higher levels than background following loss of AtCMT3 expression.

(A) Comparison of the % methylation of CG sites that were not found to be methylated in the non-transgenic wild type (Shandong ecotype) parent of AtCMT3 transgenic lines. Shown is the % CG methylation calculated in each lineage across hyper CHG DMRs identified in AtCMT3-L2T4 that overlap CHG-gain genes identified in AtCMT3-L2T4 (black bars; same regions assessed in Figure 5B) compared to the average % CG methylation of an equal amount of sequence space extracted from five randomly chosen sets of unmethylated genes that did not gain CHG methylation in AtCMT3-L2T4. The number of genes chosen in each set was equal to the number of AtCMT3-L2T4 CHG-gain genes. Note that AtCMT3-L2T5 and T6 exhibited silencing of the AtCMT3 transgene, yet still maintained CG methylation levels higher than those detected on unmethylated genes. Methylation over the same regions was also assessed in an additional, non-transgenic E. salsugineum accession (Yukon) to demonstrate that the levels of CG methylation over CHG-gain genes in transgenic lines were unlikely to have occurred independently of AtCMT3. (B) Comparison of the % methylation of CG sites that were not found to be methylated in the non-transgenic wild type (Shandong ecotype) parent of AtCMT3 transgenic lines. Shown is the % CG methylation calculated in each lineage across hyper CHG DMRs identified in AtCMT3-L1T5 that overlap CHG-gain genes identified in AtCMT3-L1T5 (black bars; same regions assessed in Figure 5—figure supplement 1E) compared to the average % CG methylation of an equal amount of sequence space extracted from five randomly chosen sets of unmethylated genes that did not gain CHG methylation in AtCMT3-L1T5. The number of genes chosen in each set was equal to the number of AtCMT3-L1T5 CHG-gain genes. AtCMT3-L1T5 was crossed to wild type (non-transgenic) and three F2 progeny were assessed: one progeny that retained the transgene (L1T5XWT F2 (+CMT3)) and two where the transgene segregated out (L1T5XWT F2 (-CMT3) #1 and #2). Note that the F2 progeny where the transgene segregated out still maintained CG methylation levels higher than those detected on unmethylated genes. As in (A), methylation over the same regions was also assessed in an additional, non-transgenic E. salsugineum accession (Yukon) to demonstrate that the levels of CG methylation over CHG-gain genes in transgenic lines were unlikely to have occurred independently of AtCMT3. Error bars are ± one standard deviation of the mean.

As observed previously, the major gains in genic methylation in AtCMT3-Lines occurred in the CHG contexts (Figure 5A). Further dividing CHG methylation into CWG and CCG contexts revealed that the CHG methylation was highly enriched for CWG methylation, consistent with the preferred substrates of CMT3 (Gouil and Baulcombe, 2016; Stoddard et al., 2019) (Figure 5A). Widespread, but lesser gains were also identified for CHH cytosines (Figure 5A). The CHH methylation was almost exclusively in the CWA context, which is indicative of CMT2 activity and not RdDM (Gouil and Baulcombe, 2016), suggesting that CMT3 activity likely leads to the recruitment of CMT2. Gains in genic CG methylation were comparatively lower than the other sequence contexts, but did occur appreciably over most loci, including regions with no prior CG methylation (Figure 5A). Further parsing of the data into all 16 trinucleotide contexts did not reveal any additional trends in the patterns of methylation (Figure 5—figure supplement 1A).
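
The sequence contexts discussed above follow directly from the two bases downstream of each cytosine (H = A, C, or T; W = A or T). As a small illustrative sketch, with a hypothetical function name and not taken from the authors' pipeline, the classification into the five groups used in Figure 5A can be written as:

```python
from collections import Counter


def classify_context(trinucleotide):
    """Assign a cytosine to CG, CWG, CCG, CWA, or other CHH.

    The input is the cytosine plus the two bases that follow it (5'->3').
    """
    tri = trinucleotide.upper()
    if len(tri) != 3 or tri[0] != "C" or any(b not in "ACGT" for b in tri):
        raise ValueError(f"expected a C followed by two bases, got {trinucleotide!r}")
    second, third = tri[1], tri[2]
    if second == "G":
        return "CG"
    if third == "G":
        # CHG context: W (A or T) in the middle gives CWG, C gives CCG.
        return "CWG" if second in "AT" else "CCG"
    # Remaining cases are CHH; CWA is the subset associated with CMT2.
    return "CWA" if second in "AT" and third == "A" else "other CHH"


# Tally how the 16 possible trinucleotide contexts fall into each group.
tally = Counter(classify_context("C" + a + b) for a in "ACGT" for b in "ACGT")
print(dict(tally))
```

Grouping the 16 trinucleotides this way shows why CWG and CWA are small, specific subsets of the broader CHG and CHH classes.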

Ectopic genic CG methylation is preferentially maintained following loss of AtCMT3 expression

Establishment of methylation in non-CHG sequence contexts over CHG-gain genes is consistent with CMT3 activity recruiting additional methyltransferase pathways. The natural transgene silencing that occurred in the AtCMT3-L2 lineage following the T4 generation provided an opportunity to test this hypothesis and determine whether non-CHG methylation remained following loss of AtCMT3 expression. To do so, we analyzed AtCMT3-L2 individuals for which we had both RNA-seq and whole genome bisulfite sequencing data (wild type, AtCMT3-L2 T3-T6) relative to the T4 generation, which showed the highest number of CHG-gain genes. The number of newly methylated cytosines in each sequence context in the CHG DMRs overlapping the 4,055 CHG-gain genes identified in AtCMT3-L2T4 was determined for each individual (same regions as Figure 5A; cytosines analyzed were corrected for coverage in all lines). The ratio of methylated cytosines was then calculated relative to the T4 generation.

Results showed that the proportional CHG methylation levels on genes mirrored AtCMT3 expression levels, with a slight increase in proportional methylation from 92% in AtCMT3-L2T3 to 100% (by definition) in AtCMT3-L2T4 (Figure 5B). Following the T4 generation, and consistent with progressive silencing of the AtCMT3 transgene, the number of methylated CHG cytosines showed a marked decrease, with only 62% of T4 levels remaining methylated in the T5 generation and 4% remaining methylated in the T6 generation, where AtCMT3 expression was greatly reduced (Figure 5B). CHH context cytosines showed a similar trend, with 97% of cytosines methylated in the T3 relative to the T4 generation, followed by a steep decline, with only 24% and 5% remaining methylated in the T5 and T6 generations, respectively, relative to the T4 (Figure 5B). Further dividing CHG and CHH cytosines into CWG vs. CCG or CWA vs. other CHH contexts did not reveal any differences in these trends (Figure 5—figure supplement 1B–C).

CG context cytosines showed a more substantial increase from the T3 to T4 generation, with CG cytosines in the T3 generation at 54% of the levels measured in T4 (Figure 5B). Also in contrast to CHG and CHH methylation, the proportion of CG sites that remained methylated after the T4 generation, despite progressive AtCMT3 silencing, was relatively high and much more stable, with 68% in T5 and 51% in T6 of the CG cytosines remaining methylated, relative to T4 (Figure 5B).
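
The relative counts reported in Figure 5B can be thought of as the outcome of a simple bookkeeping procedure. The sketch below is one possible reading of that procedure and not the authors' implementation: it assumes that "corrected for coverage in all lines" means restricting the analysis to cytosines covered in every line, and that "newly methylated" means not methylated in wild type. The function name, line labels, and site sets are all hypothetical.

```python
def relative_methylation(methylated, covered, reference="T4", baseline="WT"):
    """Ratio of newly methylated cytosines in each line relative to a reference line.

    methylated: dict mapping line name -> set of methylated cytosine positions
    covered:    dict mapping line name -> set of positions with sequencing coverage
    """
    # Assumption: only positions covered in every line are comparable.
    shared = set.intersection(*covered.values())
    baseline_sites = methylated[baseline] & shared
    new_counts = {
        line: len((sites & shared) - baseline_sites)
        for line, sites in methylated.items()
        if line != baseline
    }
    ref_count = new_counts[reference]
    if ref_count == 0:
        raise ValueError("reference line has no newly methylated cytosines")
    return {line: count / ref_count for line, count in new_counts.items()}


# Toy positions, for illustration only.
all_sites = set(range(1, 9))
covered = {line: all_sites for line in ("WT", "T3", "T4", "T5")}
methylated = {
    "WT": {1},
    "T3": {1, 2, 3},
    "T4": {1, 2, 3, 4, 5},
    "T5": {2, 3, 4, 9},  # position 9 lacks coverage in all lines and is ignored
}
ratios = relative_methylation(methylated, covered)
print(ratios)
```

By construction, the reference line always has a ratio of 1, which is why the T4 value is 100% "by definition" in the text.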

The results of the analysis of genic methylation for AtCMT3-L2 contrast with the identical analysis conducted on AtCMT3-L1, which did not show evidence of silencing of the AtCMT3 transgene. In this case, the relative methylation levels were maintained in all sequence contexts across generations (Figure 5—figure supplement 1D). Furthermore, to verify that the remaining genic CG methylation in AtCMT3-L2T6 did not result from residual AtCMT3 activity, we also completed an alternative approach where we crossed the AtCMT3-L1T5 transgene-expressing line to wild type (non-transgenic) to segregate out the transgene. We analyzed two F2 progeny from this cross in which the transgene was segregated out and one F2 progeny that still contained the transgene (Figure 1—figure supplement 1). In the F2 progeny that did not encode the transgene, the ectopic genic CHG and CHH methylation that was present in the transgenic parent was lost but the genic CG methylation was maintained, similar to the result from AtCMT3-L2 when the transgene was silenced (Figure 5—figure supplement 1E). In contrast, the F2 progeny that still encoded the transgene maintained similar levels of genic methylation in all contexts as the transgenic parent (Figure 5—figure supplement 1E).

Importantly, the levels of newly methylated CG cytosines on CHG-gain genes in the lines that lost AtCMT3, whether through silencing or crossing out, were higher than the background attributable to bisulfite non-conversion, which was estimated from randomly sampled, unmethylated genes that did not gain CHG methylation (Figure 5—figure supplement 2A–B). The genic CG methylation is also unlikely to have occurred independently of AtCMT3, as the gains in genic CG methylation at these loci are also higher than those found by comparing an additional, non-transgenic E. salsugineum accession (Yukon) (Figure 5—figure supplement 2A–B). Thus, genic CG methylation showed a lesser dependency on AtCMT3 expression and was preferentially maintained following loss of transgene expression, either through natural silencing or crossing out the transgene, consistent with maintenance of CG methylation by other methyltransferases recruited by initial AtCMT3 activity.
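
The background comparison described above can be sketched as a resampling step. This is a simplified illustration rather than the authors' method: it averages a per-gene % CG methylation value instead of matching sequence space, and the function name, seed, and input values are hypothetical. Five random sets are drawn, matching the description in Figure 5—figure supplement 2.

```python
import random
from statistics import mean, stdev


def background_methylation(gene_levels, set_size, n_sets=5, seed=0):
    """Mean and standard deviation of % CG methylation across random gene sets.

    gene_levels: dict mapping unmethylated gene ID -> % CG methylation
    set_size:    genes per random set (e.g. the number of CHG-gain genes)
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    genes = sorted(gene_levels)
    set_means = [
        mean(gene_levels[g] for g in rng.sample(genes, set_size))
        for _ in range(n_sets)
    ]
    return mean(set_means), stdev(set_means)


# Made-up values, for illustration only.
toy_rng = random.Random(1)
toy_levels = {f"gene{i}": toy_rng.uniform(0.0, 0.5) for i in range(200)}
bg_mean, bg_sd = background_methylation(toy_levels, set_size=40)
print(f"background: {bg_mean:.3f} +/- {bg_sd:.3f} % CG methylation")
```

An observed value on CHG-gain genes well above this background range would be interpreted, as in the text, as exceeding what bisulfite non-conversion alone could explain.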

AtCMT3 preferentially methylates heterochromatin relative to genes

The majority of CHG DMRs resulting from AtCMT3 expression were annotated as repeats or intergenic regions with prior methylation in wild type (~75% compared to ~25% genic loci with no prior methylation) (Figure 2A). Furthermore, the levels of CHG methylation in AtCMT3-expressing lines were much higher on repeats relative to genes (30–70% CHG methylation on repeats compared to 2–10% on genes) (Figure 2B–C). Therefore, we next sought to determine whether this difference was a result of AtCMT3 preferentially methylating heterochromatic loci relative to genic loci. We reasoned that favorable activity of AtCMT3 on heterochromatin could be revealed in lines where AtCMT3 transgene expression was reduced, as they would be expected to show a greater proportional loss of CHG methylation over genic regions relative to heterochromatic loci when compared to lines with high AtCMT3 expression.

To test this, we performed the same analysis reported in Figure 5B, except instead of genic loci, we focused on regions defined as hyper-CHG DMRs in AtCMT3-L2T4 that did not overlap genes (i.e. repeats and intergenic loci). Results showed that the relative number of methylated cytosines was much more robustly maintained in the AtCMT3-L2 T5 and T6 generations at these loci relative to genic loci following loss of AtCMT3 transgene expression (compare Figure 5B and C). This was especially evident in the T6 generation, which showed a −4.6 logfold change in AtCMT3 expression relative to AtCMT3-L2T4 (Figure 5C). Despite substantial loss of transgene expression, 55% and 57% of CHG and CHH cytosines, respectively, remained methylated in the T6 generation relative to T4 across repeats and intergenic loci (Figure 5C). In contrast, only 4% of CHG and 5% of CHH cytosines remained methylated across genes (Figure 5B). Methylated CG cytosines were also slightly more robust to loss of AtCMT3 transgene expression, dropping to 66% of T4 levels over repeats compared to 51% over genes in the T6 generation (Figure 5B–C).

An alternative explanation for these results is that methylation induced by AtCMT3 expression is more readily maintained by other methylation pathways in heterochromatin relative to genes, as other methyltransferases preferentially target these regions. To consider this possibility, we also analyzed the relative maintenance of DNA methylation in heterochromatin in lines in which the transgene was removed via crossing to wild type. In the F2 progeny of the AtCMT3-L1T5 X wild type cross that no longer encoded the transgene, both CG and CHH methylation were maintained at relatively similar levels as the transgenic parent, in contrast to CHG methylation that was only maintained at ~25% of the levels of the transgenic parent (Figure 5—figure supplement 1F). This is consistent with the possibility that AtCMT3-induced CG and CHH methylation, and to a lesser extent CHG methylation, can be perpetuated following loss of AtCMT3 in heterochromatin in preference to genes by other methylation pathways. However, the relative levels of CHG methylation in particular are lower than those detected over heterochromatin for lines in which the transgene was silenced, consistent with residual AtCMT3 preferentially targeting heterochromatin in these lines (Figure 5C). We conclude that AtCMT3 preferentially targets heterochromatin and does not readily methylate genic loci until expressed at high levels.

Discussion

We have provided experimental evidence that CMT3 can initiate epimutations in the form of gene body CG methylation, which are maintained over generational time, even after loss of AtCMT3 expression. This finding has provided new insights into CMT3 localization and function by showing CMT3 is associated with de novo DNA methylation activity in vivo at genic loci lacking prior DNA methylation. The results also revealed a mechanism for the establishment of gbM that is consistent with the hypothesis that gbM is a passive effect of self-reinforcing positive feedback loops inherent to the heterochromatin machinery.

The natural loss of CMT3 in E. salsugineum and other species associated with the loss of gbM (Bewick et al., 2016; Niederhuth et al., 2016), combined with prior work demonstrating that CMT3 targeting and activity requires binding to H3K9me (Du et al., 2012; Stoddard et al., 2019), suggests that both CMT3 and histone methyltransferases are important in the establishment of gbM. Furthermore, a plausible means for CMT3 activity to recruit additional methyltransferase pathways that deposit methylation in additional sequence contexts, such as CMT2, is indirectly through the promotion of H3K9me2 (Du et al., 2015). We therefore propose that both enzymes work in concert to provide de novo methylation of transcribed loci to initially establish gbM through the model described in Figure 6, which expands on prior models (Inagaki and Kakutani, 2012).

Figure 6. Hypothetical model for CMT3 establishment of gbM.

The activity of CMT3 and histone methyltransferases (HMTs) maintains CWG methylation and H3K9me2, respectively, and is most readily detected at silenced loci. At silenced loci, methylation by CMT3 and HMTs is reinforced by other methyltransferase pathways, including MET1 (CG methylation), CMT2 (CWA methylation), and DRM2 (methylation in all contexts), which maintain constitutive heterochromatin. In contrast to silenced loci, CMT3 and HMTs can only transiently establish de novo CWG methylation and H3K9me at transcribed loci characterized by gbM. This process may be initiated by incorporation of H3K9me1 nucleosomes, which are bound exclusively by CMT3 and not CMT2 and are normally removed by the H3K9 de-methylase IBM1 in a transcription-coupled mechanism. However, on rare occasions CMT3 may bind H3K9me1 located in genes and de novo methylate CWG cytosines. De novo methylation is not a favored activity of CMT3, so this happens only very rarely or when CMT3 is expressed at high levels. This temporarily stabilizes H3K9me due to the self-reinforcing feedback loop between histone and DNA methyltransferases. Transient stabilization of H3K9me promotes H3K9me2 that can then recruit additional methyltransferases including MET1 and CMT2 to methylate CG and CWA cytosines, respectively. Heterochromatin formation is inhibited, however, through eventual removal of H3K9me by IBM1. Loss of H3K9me and/or loss of available CMT3 results in the failure of maintenance of DNA methylation in all contexts except CG following DNA replication. CG methylation is maintained due to the preferential targeting of the CG maintenance methyltransferase, MET1, to hemi-methylated cytosines following replication.

In this model, nucleosomes possessing H3K9me are on rare occasions incorporated into transcribed genes. Initially, genic histone methylation could be restricted to H3K9me1, which is bound by CMT3 and not CMT2 (Stroud et al., 2014), and could potentially explain the phylogenetic correlation between encoding CMT3 and the presence of gbM across plant species (Bewick et al., 2016; Niederhuth et al., 2016). Most H3K9me associated nucleosomes are efficiently removed via the histone de-methylase, INCREASED BONSAI METHYLATION 1 (IBM1), in a transcription coupled mechanism. In A. thaliana, gbM loci are devoid of H3K9me2 due to the activity of IBM1, which prevents the establishment of H3K9me2 at gbM loci through a mechanism that requires active transcription (Inagaki et al., 2010; Saze et al., 2008). It is notable that encoded in the E. salsugineum genome are several expressed orthologs of the A. thaliana IBM1 (Supplementary file 10), which could destabilize H3K9me and plausibly explain the lack of detection of H3K9me over genes that gain CHG methylation (Figure 3, Figure 3—figure supplement 1 and Figure 3—figure supplement 2). However, CMT3 binding to H3K9me may transiently stabilize H3K9me2 through the establishment of de novo CHG methylation, which activates the feedback loop between DNA and histone methyltransferases. This contrasts with CMT3 activity at heterochromatin, where CMT3 preferentially methylates hemi-methylated CWG sites and is re-enforced by additional methylation pathways. As de novo methylation is a less favored activity of CMT3, this process is predicted to occur rarely, but can be promoted with high levels of CMT3 expression. Transient stabilization of H3K9me2 recruits additional methyltransferases, including CMT2, which establish DNA methylation in additional sequence contexts. Finally, removal of H3K9me2 by IBM1 disrupts the feedback loop between H3K9me2 and CMTs resulting in the loss of non-CG methylation following DNA replication. 
CG methylation is maintained, however, due to the preferential recruitment of CG maintenance methyltransferases to hemi-methylated sites following DNA replication (Figure 6).

From a mechanistic standpoint, it is most parsimonious to conclude that gbM is a passive byproduct inherent to the function of CMT3 in the maintenance of heterochromatin. The presence of pathways that work to uncouple gbM from transcriptional silencing, such as the IBM1 pathway, further supports this line of reasoning, as they may have evolved to counteract negative consequences of ‘spillage’ of the heterochromatin machinery into genic space. Why, then, are some genes consistently characterized by body methylation across species and others not? It is telling that the genes that gain methylation in E. salsugineum are homologs of gbM genes in A. thaliana and/or retain similar characteristics including gene length, expression profile, and relative frequency of CHG sites. Rather than being exact determinants of gbM status, gene length and constitutive expression likely contribute to the exposure of a locus to incorporation of H3K9me1/2 nucleosomes, which, combined with the frequency of CMT3-preferred CWG sites and CMT3 levels, makes a gene susceptible to methylation by CMT3 in a probabilistic manner. Under this model, gbM can be thought of as an epigenetic scar resulting from transient localization of the heterochromatin machinery that is likely to be present on a given gene as a function of these factors. This model does not exclude functional consequences of gbM, but it also does not require them.

Materials and methods

Key resources table.

Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Strain, strain background (Agrobacterium tumefaciens) | C58C1 | other | | Dr. Robert Schmitz (University of Georgia)
Strain, strain background (Eutrema salsugineum) | Shandong | https://www.arabidopsis.org | CS22504 |
Strain, strain background (E. salsugineum) | Shandong AtCMT3 lineages | This paper | | Dr. Robert Schmitz (University of Georgia)
Strain, strain background (E. salsugineum) | Yukon | https://www.arabidopsis.org | CS22664 |
Antibody | anti-H3K9me2 | Cell Signaling Technology | Cat# 9753 s; RRID: AB_659848 | Polyclonal, 5 μg
Antibody | anti-H3K9me1 | Abcam | Cat# 8896; RRID: AB_732929 | Polyclonal, 5 μg
Recombinant DNA reagent | pEarleyGate 302 pAtCMT3::gAtCMT3 | PMID: 23021223 | |
Peptide, recombinant protein | T4 DNA Ligase | NEB | Cat# M0202 |
Peptide, recombinant protein | Klenow Fragment | NEB | Cat# M0210 |
Peptide, recombinant protein | Phusion DNA Polymerase | NEB | Cat# M0530 |
Peptide, recombinant protein | SuperScript III Reverse Transcriptase | Invitrogen | Cat# 18080044 |
Commercial assay or kit | Qiagen DNeasy Plant Mini Kit | Qiagen | Cat# 69106 |
Commercial assay or kit | EZ DNA-methylation Gold Kit | Zymogen | Cat# D5006 |
Commercial assay or kit | AMPure beads | Beckman Coulter | Cat# A63880 |
Commercial assay or kit | NEXTFLEX Bisulfite Sequencing Library Prep Kit | Bioo Scientific | Cat# NOVA-5119–01 |
Commercial assay or kit | KAPA HiFi Uracil+ | Roche | Cat# 07959079001 |
Commercial assay or kit | Direct-Zol RNA Mini-prep plus | Zymogen | Cat# R2071 |
Commercial assay or kit | Illumina TruSeq mRNA Stranded Library Kit | Illumina | Cat# 20020594 |
Commercial assay or kit | Protein A Dynabeads | Invitrogen | Cat# 10001D |
Commercial assay or kit | LightCycler 480 SYBR green master mix | Roche | Cat# 04707516001 |
Chemical compound, drug | Silwet L-77 | Phyto Technology Laboratories | Cat# S7777 |
Chemical compound, drug | Pierce Protease Inhibitors | ThermoFisher | Cat# A32963 |
Chemical compound, drug | NEBNext dA-Tailing Reaction Buffer | NEB | Cat# B6059 |
Chemical compound, drug | proteinase K | ThermoFisher | Cat# 26160 |
Software, algorithm | methylpy | PMID: 26030523 | | https://github.com/yupenghe/methylpy
Software, algorithm | cutadapt v1.9.dev1 | DOI: https://doi.org/10.14806/ej.17.1.200 | RRID:SCR_011841 | https://cutadapt.readthedocs.io/en/stable/
Software, algorithm | bowtie 2.2.4 | PMID: 22388286 | RRID:SCR_005476 | http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
Software, algorithm | Intervene v.0.6.1 | PMID: 28569135 | | https://intervene.readthedocs.io/en/latest/
Software, algorithm | Bedtools v2.27.1 | PMID: 20110278 | RRID:SCR_006646 | https://bedtools.readthedocs.io/en/latest/
Software, algorithm | HISAT2 v2.0.5 | PMID: 25751142 | RRID:SCR_015530 | https://ccb.jhu.edu/software/hisat2/index.shtml
Software, algorithm | StringTie v1.3.3b | PMID: 25690850 | RRID:SCR_016323 | https://ccb.jhu.edu/software/stringtie/#pub
Software, algorithm | HOMER 4.10 | PMID: 20513432 | RRID:SCR_010881 | http://homer.ucsd.edu/homer/
Software, algorithm | Trimmomatic v0.33 | PMID: 24695404 | RRID:SCR_011848 | http://www.usadellab.org/cms/?page=trimmomatic
Software, algorithm | Bowtie v1.1.1 | PMID: 19261174 | RRID:SCR_005476 | http://bowtie-bio.sourceforge.net/index.shtml
Software, algorithm | SAMtools v1.2 and v0.1.19 | PMID: 19505943 | RRID:SCR_00210 | http://samtools.sourceforge.net
Software, algorithm | R v3.44 | other | RRID:SCR_001905 | https://www.r-project.org

Contact for reagent and resource sharing

Further information and requests for resources and reagents should be directed to and will be fulfilled by Robert J. Schmitz (schmitz@uga.edu).

Experimental models and subject details

Eutrema salsugineum Shandong ecotype was grown on soil at 21°C in long day conditions (16 hr light, 8 hr dark). Plant transgenesis was conducted using the floral dip method (Clough and Bent, 1998). The pEarleyGate 302 vector containing genomic sequence of Arabidopsis thaliana CMT3, including the native promoter, published by Du et al. (2012) was transformed into Agrobacterium tumefaciens strain C58C1. Bacteria were grown for 2 days at 30°C in 200 ml cultures containing gentamicin (25 µg/mL), kanamycin (50 µg/mL), and rifampicin (50 µg/mL) and pelleted by centrifugation at 4°C. Bacterial pellets were resuspended in 5% sucrose and 0.03% Silwet L-77 (Phyto Technology Laboratories) and used to dip open E. salsugineum inflorescences. Transgenic plants were selected for using Finale (BASTA, Bayer).

Whole genome bisulfite sequencing

Whole genome bisulfite sequencing libraries were generated based on methods described in Urich et al. (2015). DNA was extracted from cauline leaves of individual plants flash-frozen in liquid nitrogen using the Qiagen DNeasy Kit according to the manufacturer’s instructions. DNA was fragmented via sonication to a peak size of ~200 bp and further size selected using AMPure beads (Beckman Coulter) to between 150 bp and 500 bp. Fragment end repair was performed using End-It from Lucigen incubated at room temperature for 45 min, followed by purification using AMPure beads. Next, A-tailing was conducted using Klenow Fragment from NEB in NEBNext dA-tailing reaction buffer at 37°C for 30 min, followed by purification with AMPure beads and Illumina indexed adapter (NEXTFLEX Bisulfite-Seq Barcodes) ligation using T4 DNA ligase from NEB. Ligation was conducted at 16°C for 16 hr. Ligation products were purified twice using AMPure beads and bisulfite converted using the EZ DNA Methylation-Gold Kit from Zymogen. Bisulfite converted DNA was then amplified using KAPA HiFi Uracil+ and universal primers with the following parameters: 95°C for 2 min, 98°C for 30 s, 8 cycles of 98°C for 15 s, 60°C for 30 s, 72°C for 4 min, and a final extension time of 72°C for 10 min. PCR products were purified using AMPure beads and sequenced with an Illumina NextSeq500 instrument by the Georgia Genomics and Bioinformatics Core. Adapters and primers used were those provided in the NEXTFLEX Bisulfite Sequencing Library Prep Kit (Bioo Scientific).

RNA-sequencing

Total RNA was extracted from cauline leaves of individual plants using the Direct-Zol RNA Mini-prep plus kit from Zymogen according to the manufacturer’s instructions. Sequencing libraries were prepared from 1.3 μg input RNA with the Illumina TruSeq mRNA Stranded Library Kit according to the manufacturer’s instructions, except all volumes were reduced to 1/3 of the recommended quantity. Sequencing was completed using an Illumina NextSeq500 instrument by the Georgia Genomics and Bioinformatics Core.

Chromatin immunoprecipitation and sequencing (ChIP-seq)

ChIP was conducted based on the protocol described in Schubert et al. (2006). Cauline leaves were harvested from individual plants and submerged in cross-linking buffer (10 mM Tris-HCl pH 8, 1 mM EDTA, 0.4 M sucrose, 100 mM PMSF, 1% formaldehyde). Tissues were vacuum infiltrated for 5 min at 85 kPa, followed by release of vacuum and 10 additional minutes at 85 kPa. Crosslinking was quenched with the addition of glycine to a concentration of 100 mM followed by 5 min under vacuum at 85 kPa. Tissues were then washed five times in water, flash frozen in liquid nitrogen, and ground to a fine powder with mortar and pestle. Powder was suspended in 10 ml extraction buffer 1 (0.4 M sucrose, 10 mM Tris-HCl pH 8, 10 mM MgCl2, 5 mM BME, 0.1 mM PMSF, 1 mM EDTA, one tab/10 ml Pierce Protease Inhibitors (ThermoFisher)). The suspension was filtered through two layers of Miracloth to enrich for nuclei, which were pelleted by centrifugation for 20 min at 4000 rpm at 4°C. Pellets were resuspended in 1 ml extraction buffer 2 (0.25 M sucrose, 10 mM Tris-HCl pH 8, 10 mM MgCl2, 1% Triton X-100, 1 mM EDTA, 5 mM BME, 0.1 mM PMSF, one tab/10 ml protease inhibitors) and centrifuged for 10 min at 12,000 g at 4°C. Pellets were then suspended in 300 μl extraction buffer 3 (1.7 M sucrose, 10 mM Tris-HCl pH 8, 0.15% Triton X-100, 2 mM MgCl2, 1 mM EDTA, 5 mM BME, 0.1 mM PMSF, one tab/10 ml protease inhibitors) and layered on top of 300 μl extraction buffer 3. Samples were then centrifuged for 1 hr at 16,000 g at 4°C, supernatant was removed, and chromatin pellets were resuspended in 100 μl nuclei lysis buffer (50 mM Tris-HCl pH 8, 10 mM EDTA, 1% SDS, 0.1 mM PMSF, one tab/10 ml protease inhibitors). Chromatin was fragmented via sonication to a fragment size of ~200 base pairs and debris was removed by centrifugation at 16,000 g for 5 min at 4°C. A 10 μl aliquot of the supernatant was removed for input controls and the remaining supernatant was diluted 1:10 in ChIP dilution buffer (1.1% Triton X-100, 1.2 mM EDTA, 16.7 mM Tris-HCl pH 8, 167 mM NaCl, 0.1 M PMSF, one tab/10 ml protease inhibitors).

Protein A Dynabeads (Invitrogen) were prepared by washing 25 μl beads three times with 1 ml ChIP dilution buffer (1.1% Triton X-100, 1.2 mM EDTA, 16.7 mM Tris-HCl pH 8, 167 mM NaCl). Beads were then resuspended in 100 μl ChIP dilution buffer with 5 μg anti-H3K9me2 (Cell Signaling Technology, Cat. # 9753 s) or anti-H3K9me1 (Abcam # 8896) added. Antibodies were bound to beads at 4°C with end-over-end rotation for 3 hr. Beads were then washed three times with ChIP dilution buffer (1.1% Triton X-100, 1.2 mM EDTA, 16.7 mM Tris-HCl pH 8, 167 mM NaCl, 0.1 M PMSF, one tab/10 ml protease inhibitors) and resuspended in the diluted chromatin samples from above. Samples were incubated with rotation overnight at 4°C.

Following incubation, beads were washed twice with 1 ml each of low salt wash buffer (150 mM NaCl, 0.1% SDS, 1% TritonX-100, 2 mM EDTA, 20 mM Tris-HCl pH 8), high salt wash buffer (500 mM NaCl, 0.1% SDS, 1% TritonX-100, 2 mM EDTA, 20 mM Tris-HCl pH 8), and LiCl wash buffer (0.25 M LiCl, 1% NP40, 1% sodium deoxycholate, 1 mM EDTA, 10 mM Tris-HCl pH 8). Beads were then washed one time with 1 ml TE buffer (10 mM Tris-HCl pH 8, 1 mM EDTA) and resuspended in 250 μl elution buffer (1% SDS, 0.1 M NaHCO3). Samples were eluted by incubation at 65°C for 15 min with gentle agitation. The supernatant was removed and saved and the elution was repeated with an additional 250 μl elution buffer. Supernatants were then combined and 20 μl 5 M NaCl was added. 500 μl elution buffer and 20 μl 5 M NaCl were also added to the input controls. Crosslinks were reversed overnight at 65°C. Following crosslink reversal, 10 μl of 0.5 M EDTA, 20 μl 1 M Tris-HCl (pH 6.5), and 2 μl of 10 mg/ml proteinase K (ChIP grade, Thermo Fisher Scientific) were added to each sample, and samples were incubated at 45°C for 1 hr. DNA was extracted with phenol/chloroform/isoamyl alcohol (25:24:1) and resuspended in water.

ChIP sequencing libraries were prepared by conducting end repair, A-tailing, and adaptor ligation steps identical to those described for bisulfite sequencing library preparation, except with the substitution of Illumina TruSeq adaptors and indexed primers. Libraries were amplified with Phusion DNA Polymerase (NEB) with the following parameters: 95°C for 2 min, 98°C for 30 s, 15 cycles of 98°C for 15 s, 60°C for 30 s, 72°C for 4 min, and a final extension step of 72°C for 10 min. Libraries were sequenced on an Illumina NextSeq500 instrument by the Georgia Genomics and Bioinformatics Core.

qRT-PCR

RNA was extracted from cauline leaves of individual plants using the Direct-Zol RNA Mini-prep plus kit from Zymogen according to the manufacturer’s instructions. Synthesis of cDNA was completed using SuperScript III with random hexamers (Invitrogen) according to the manufacturer’s instructions. Real time qRT-PCR was conducted using LightCycler 480 SYBR green master mix in a LightCycler 480 instrument (Roche). Primers used were: AtCMT3 qRT-PCR FP: TGGTTTGAACCTCGTCACTAAA; AtCMT3 qRT-PCR RP: CGTTTGTCTCTGGGTGGTTAT; EsTUB4 qRT-PCR FP: CCTCCATATCCAAGGCGGTC; EsTUB4 qRT-PCR RP: GTACTGGCCGGTGTGATCAA.

Whole genome bisulfite sequencing mapping and analyses

WGBS data were processed using the ‘single-end-pipeline’ function from Methylpy as described in Schultz et al. (2015). Briefly, quality-filtering and adapter-trimming were performed using cutadapt v1.9.dev1 (Martin, 2011). Qualified reads were aligned to the E. salsugineum v1.0 reference genome (Yang et al., 2013) (downloaded from: https://phytozome.jgi.doe.gov) using bowtie 2.2.4 (Langmead and Salzberg, 2012). Only uniquely aligned and non-clonal reads were retained. Chloroplast DNA (which is fully unmethylated) was used as a control to calculate the sodium bisulfite reaction non-conversion rate of unmodified cytosines. A binomial test was used to determine the methylation status of cytosines with a minimum coverage of three reads.
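The per-cytosine binomial test described above can be illustrated with a minimal Python sketch. This is not the Methylpy implementation: the function names, the significance threshold `alpha`, and the absence of any multiple-testing correction are assumptions made only for illustration. The null hypothesis is that every methylated-looking read at a site arises from bisulfite non-conversion at the rate estimated from chloroplast reads.

```python
from math import comb

def nonconversion_rate(unconverted_reads, total_reads):
    """Fraction of reads at chloroplast cytosines that still read as C."""
    return unconverted_reads / total_reads

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def call_methylated(meth_reads, total_reads, rate, alpha=0.01, min_cov=3):
    """Return True/False for covered sites, None below the coverage minimum.

    alpha is a hypothetical threshold; min_cov=3 follows the text.
    """
    if total_reads < min_cov:
        return None
    return binomial_upper_tail(meth_reads, total_reads, rate) < alpha
```

With a hypothetical non-conversion rate of 0.5%, a site with 2 of 3 reads methylated would be called methylated under these assumptions, whereas a site with 1 of 3 would not.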

Identification of DMRs (Differentially Methylated Regions) was performed using the ‘DMRfind’ function from the Methylpy pipeline as described in Schultz et al. (2015). Default parameters were adopted and only DMRs with at least 5 DMSs (Differentially Methylated Sites) were reported and used for subsequent analysis.

To produce metaplots, 1 kb regions upstream and downstream of features of interest were divided into 20 bins each. Features of interest were also divided into a total of 20 bins. Weighted methylation levels were computed as the number of methylated reads divided by the total reads for each bin.
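The binning and weighted methylation calculation can be sketched as follows. The input layout (a list of position, methylated reads, total reads tuples) and the function name are assumptions for illustration, not the format used in the study. The same function would be applied separately to the upstream region, the feature body, and the downstream region.

```python
def weighted_methylation_by_bin(sites, start, end, n_bins=20):
    """Weighted methylation per bin over the half-open region [start, end).

    sites: iterable of (position, methylated_reads, total_reads).
    Returns one value per bin: methylated reads / total reads,
    or None for bins without any covered cytosine.
    """
    meth = [0] * n_bins
    total = [0] * n_bins
    width = (end - start) / n_bins
    for pos, m, t in sites:
        if not (start <= pos < end):
            continue  # site lies outside the region being binned
        b = min(int((pos - start) / width), n_bins - 1)
        meth[b] += m
        total[b] += t
    return [m / t if t else None for m, t in zip(meth, total)]
```

Summing reads before dividing, rather than averaging per-site fractions, is what makes the level "weighted": deeply covered cytosines contribute proportionally more to the bin value.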

To generate the heatmap shown in Figure 2A, weighted percent CHG methylation was calculated for all significant CHG DMRs with a minimum of 5 cytosines with three read coverage each in all lines (listed in Supplementary file 2). DMRs were then ranked by weighted %CHG methylation levels in wild type (vertical axis) and arranged by lineage (horizontal axis). To identify called DMRs reported in Supplementary file 3, all significant DMRs with minimum coverage requirements were filtered by cutoffs of a minimum of 10% change, relative to wild type, to be considered a hypo- or hyper- CHH or CHG DMR, and a minimum of 20% change, relative to WT, to be considered a hypo- or hyper- CG DMR. Overlaps of hyper-CG, CHG, and CHH DMRs and generation of upset plots, shown in Figure 1—figure supplement 3, were calculated using Intervene v.0.6.1 (Khan and Mathelier, 2017). To calculate the overlap of hyper-CHG DMRs between all individuals of the AtCMT3-L1 and AtCMT3-L2 lineages reported in Figure 1—figure supplement 4, coordinates of the called hyper-CHG DMRs reported in Supplementary file 3 were input to the bedtools v2.27.1 fisher command (Quinlan and Hall, 2010). P values reported were calculated from Fisher’s Exact Test with significance set at p<0.0004 based on the Bonferroni correction for multiple testing.
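The cutoff-based calling of hyper- and hypo-DMRs can be sketched as below. The sketch assumes that "change relative to wild type" means an absolute difference in weighted percent methylation (percentage points); the text does not state this explicitly, and the function and constant names are hypothetical.

```python
# Minimum change relative to wild type, per sequence context (from the text).
CUTOFFS = {"CG": 20.0, "CHG": 10.0, "CHH": 10.0}

def classify_dmr(context, wt_pct, line_pct):
    """Label a DMR 'hyper', 'hypo', or None if it does not meet the cutoff.

    wt_pct and line_pct are weighted percent methylation values (0-100).
    Assumes the cutoff is an absolute difference in percentage points.
    """
    diff = line_pct - wt_pct
    cutoff = CUTOFFS[context]
    if diff >= cutoff:
        return "hyper"
    if diff <= -cutoff:
        return "hypo"
    return None
```

Under this reading, a CHG DMR rising from 2% in wild type to 15% in a transgenic line would be called hyper-CHG, while a CG DMR changing by 15 percentage points would not be called.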

To identify CHG-gain genes, genes were first filtered for those that had less than 1% cytosine methylation in any sequence context in wild-type. Genes were then filtered for coverage and only those with at least 10 informative CHG cytosines (min. five read coverage) in each line were assessed. Coverage corrections were separately done for: 1. wild type and all individuals of the AtCMT3-L1 and AtCMT3-L2 lineages described in Figure 1A; 2. wild type and all other individuals besides AtCMT3-L3T2c and the individuals of the AtCMT3-L1 and AtCMT3-L2 described in Figure 1A; and 3. wild type and AtCMT3-L3T2c. Genes were then called as CHG-gain genes if they showed a minimum of 5% increase in CHG methylation in AtCMT3 expressing lines relative to wild-type. Percent CHG methylation was calculated as the number of methylated reads mapping to CHG sites at the gene of interest divided by the total number of reads mapping to CHG sites. CHG-gain genes are listed in Supplementary file 5. The significance of the overlap of CHG-gain genes between individuals shown in Figure 2—figure supplement 1 was calculated with a hypergeometric test with significance set at p<0.0004 based on the Bonferroni correction for multiple testing.
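The filtering steps for CHG-gain genes can be summarized in a short sketch. It assumes per-cytosine read counts are available as (methylated reads, total reads) pairs and that the 5% increase is measured in percentage points; both the data layout and the function names are illustrative assumptions rather than the code used in the study.

```python
def percent_methylation(sites, min_cov=5):
    """Return (percent methylation, number of informative sites).

    sites: list of (methylated_reads, total_reads) for CHG cytosines.
    Only sites with at least min_cov reads are counted as informative.
    """
    kept = [(m, t) for m, t in sites if t >= min_cov]
    total = sum(t for _, t in kept)
    pct = 100.0 * sum(m for m, _ in kept) / total if total else None
    return pct, len(kept)

def is_chg_gain_gene(wt_pct_by_context, wt_chg_sites, line_chg_sites,
                     min_sites=10, min_gain=5.0, wt_max=1.0):
    """Apply the wild-type, coverage, and gain filters in order."""
    # 1. Wild type must be below 1% methylation in every sequence context.
    if any(p >= wt_max for p in wt_pct_by_context.values()):
        return False
    # 2. Both samples need at least 10 informative CHG cytosines.
    wt_pct, wt_n = percent_methylation(wt_chg_sites)
    line_pct, line_n = percent_methylation(line_chg_sites)
    if wt_n < min_sites or line_n < min_sites:
        return False
    # 3. The gain in CHG methylation must be at least 5 percentage points.
    return (line_pct - wt_pct) >= min_gain
```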

To generate the heatmap shown in Figure 5A and Figure 5—figure supplement 1A, all CHG DMRs identified in Supplementary file 2 that overlapped the AtCMT3-L2T4 CHG-gain genes were analyzed. Weighted methylation levels of trinucleotide sub-contexts (CNN, N = A/T/C/G) were calculated from WGBS data for each qualified DMR in wild type and AtCMT3-L2, T1-T6. The methylation levels of sub-contexts in wild type were used to determine their hierarchical clustering relationships. Corresponding methylation levels in AtCMT3-L2, T1-T6 were plotted in the same order.

To determine the relative number of methylated cytosines over CHG-gain genes in individuals of the AtCMT3-L2 lineage relative to AtCMT3-L2T4 (shown in Figure 5B, Figure 5—figure supplement 1B–C), the analysis was limited to cytosines located within the regions analyzed in Figure 5A that had a minimum of 3 read coverage in each line and were not methylated in wild type. Methylation status was determined with a binomial test. The number of methylated cytosines in each sequence context was reported as a ratio relative to the number of methylated cytosines in AtCMT3-L2T4. In Figure 5C, the same analysis was performed as described in Figure 5B, except cytosines analyzed were those found in hyper-CHG DMRs defined in AtCMT3-L2T4 that did not overlap genes and were thus annotated as repeats or intergenic regions. In Figure 5—figure supplement 1D, the same analysis was conducted except the regions analyzed were the hyper-CHG DMRs identified in AtCMT3-L1T4 that overlapped AtCMT3-L1T4 CHG-gain genes and the ratios of methylated cytosines are relative to AtCMT3-L1T4. The same analysis was also conducted to produce Figure 5—figure supplement 1E–F, except the regions analyzed were the hyper CHG DMRs identified in AtCMT3-L1T5 that overlapped AtCMT3-L1T5 CHG gain genes (Figure 5—figure supplement 1E) or did not overlap genes (Figure 5—figure supplement 1F) and the ratios of methylated cytosines are relative to AtCMT3-L1T5.
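The ratio reported in these panels can be expressed compactly. The sketch below assumes that methylated cytosines have already been counted per line and per sequence context after the coverage and wild-type filters; the dictionary layout and the line labels in the usage note are hypothetical.

```python
def methylated_ratios(counts_by_line, reference_line):
    """Express methylated-cytosine counts relative to a reference line.

    counts_by_line: {line: {context: number of methylated cytosines}}.
    Returns the same structure with each count divided by the count in
    reference_line for that context (None if the reference count is 0).
    """
    ref = counts_by_line[reference_line]
    return {
        line: {ctx: (n / ref[ctx] if ref[ctx] else None)
               for ctx, n in contexts.items()}
        for line, contexts in counts_by_line.items()
    }
```

For example, if a reference individual had 200 methylated CHG cytosines and a descendant had 100 at the same set of sites, the descendant's CHG ratio would be 0.5, and the reference individual's own ratio is 1.0 by construction.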

To identify the background levels of CG methylation shown in Figure 5—figure supplement 2, an equal number of genes as the number of CHG-gain genes identified in AtCMT3-L2T4 (for Figure 5—figure supplement 2A) or AtCMT3-L1T5 (for Figure 5—figure supplement 2B) was randomly selected from all genes that were classified as unmethylated in wild type and did not gain CHG methylation in AtCMT3-expressing lineages using the R command sample. Then an equal amount of sequence as analyzed in Figure 5B (for Figure 5—figure supplement 2A) or Figure 5—figure supplement 1E (for Figure 5—figure supplement 2B) was extracted from the randomly chosen genes, with the total number of nucleotides distributed evenly across each gene. All CG cytosines with less than three read coverage in each line assessed and cytosines found to be methylated in wild type (Shandong accession) were removed from the analysis. Coverage filtering was completed individually for the lineages assessed in each panel of Figure 5—figure supplement 2. Methylated CG cytosines were then identified in each lineage using a binomial test and percent CG methylation was calculated as the number of methylated CG cytosines divided by the total number of CG cytosines. This was completed for five randomly chosen sets of unmethylated genes and compared to the identical analysis completed on the CHG gain gene regions assessed in Figure 5B (for Figure 5—figure supplement 2A) or Figure 5—figure supplement 1E (for Figure 5—figure supplement 2B). Bisulfite sequencing data for E. salsugineum Yukon accession were from Bewick et al. (2016).
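The random selection of background gene sets was done with the R command sample; an equivalent procedure in Python might look like the sketch below. The function name, the fixed seed, and the use of gene identifiers as plain strings are assumptions for illustration, and the subsequent extraction of an equal amount of sequence per gene is not shown.

```python
import random

def draw_background_sets(unmethylated_genes, chg_gain_genes, n_sets=5, seed=0):
    """Draw n_sets random gene sets, each as large as the CHG-gain set.

    Genes are sampled without replacement from unmethylated genes that
    did not gain CHG methylation. The seed is fixed only so that the
    illustration is reproducible.
    """
    rng = random.Random(seed)
    excluded = set(chg_gain_genes)
    pool = sorted(g for g in set(unmethylated_genes) if g not in excluded)
    k = len(excluded)
    return [rng.sample(pool, k) for _ in range(n_sets)]

def percent_cg_methylation(n_methylated_cg, n_total_cg):
    """Methylated CG cytosines as a percentage of all covered CG cytosines."""
    return 100.0 * n_methylated_cg / n_total_cg
```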

RNA sequencing mapping and analyses

Quality-filtering and adapter-trimming were performed using Trimmomatic v0.33 with default parameters (Bolger et al., 2014). Qualified reads were aligned to the E. salsugineum v1.0 reference genome using HISAT2 v2.0.5 (Kim et al., 2015). Gene expression values (calculated as fragments per kilobase of transcript per million mapped reads; FPKM) were computed using StringTie v1.3.3b (Pertea et al., 2016). To compare expression between wild type and AtCMT3-expressing lines, Log2 fold change values were calculated as Log2 (FPKM AtCMT3 line/FPKM wild-type). Genes with zero FPKM values were removed from expression analyses. A cutoff of ±2 Log2 fold change was used to identify genes undergoing substantial changes in expression.
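The fold-change calculation and cutoff can be written as a short sketch. It assumes FPKM values are held in dictionaries keyed by gene identifier, that a gene is dropped if its FPKM is zero in either sample, and that the ±2 cutoff is strict (greater than 2 in absolute value, as worded for Supplementary file 6); these are interpretations, not details confirmed by the text.

```python
from math import log2

def log2_fold_changes(fpkm_wt, fpkm_line, cutoff=2.0):
    """Return {gene: (log2 fold change, status)} for expressed genes.

    Genes with zero FPKM in either sample are removed, since the ratio
    or its logarithm would be undefined.
    """
    result = {}
    for gene, wt in fpkm_wt.items():
        line = fpkm_line.get(gene, 0.0)
        if wt == 0 or line == 0:
            continue
        lfc = log2(line / wt)
        if lfc > cutoff:
            status = "up"
        elif lfc < -cutoff:
            status = "down"
        else:
            status = "unchanged"
        result[gene] = (lfc, status)
    return result
```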

To conduct Gene Ontology enrichment analyses of up- and down-regulated genes, A. thaliana orthologs of E. salsugineum genes identified in Niederhuth et al. (2016) were utilized to extract GO annotations from TAIR (www.arabidopsis.org). These annotations were then used to identify significantly enriched GO terms with an elim Fisher’s exact test (p-value<0.01) using the R package topGO (https://bioconductor.org/packages/release/bioc/html/topGO.html).

ChIP-sequencing mapping and analyses

Quality-filtering and adapter-trimming were performed using Trimmomatic v0.33 (Bolger et al., 2014) with default parameters. The remaining reads were aligned to the E. salsugineum v1.0 reference genome using Bowtie v1.1.1 (Langmead et al., 2009) with the following parameters: ‘bowtie -m 1 -v 2 --best --strata --chunkmbs 1024 -S’. Aligned reads were sorted using SAMtools v1.2 and duplicated reads were removed using SAMtools v0.1.19 (Li et al., 2009). ChIP-peaks were defined in wild type relative to input using HOMER 4.10 (Heinz et al., 2010) with the following parameters: ‘-region -tagThreshold 10 -size 1000 -minDist 2500 -tbp 0’. Identified peaks that were directly connected together were merged into a single region. Peaks were then further filtered by read density. The density for each merged region was defined as the number of aligned reads divided by the region length. Only merged regions with density greater than 0.05 were output as peaks and used for subsequent analysis.
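The peak merging and density filter can be illustrated as follows. The sketch assumes that "directly connected" means peaks that abut or overlap on the same chromosome, and that read counts of merged peaks can simply be summed; in practice reads would be recounted over the merged region to avoid counting reads in overlaps twice. The tuple layout and function names are hypothetical.

```python
def merge_adjacent(peaks):
    """Merge peaks that touch or overlap on the same chromosome.

    peaks: list of (chrom, start, end, aligned_reads).
    """
    merged = []
    for chrom, start, end, reads in sorted(peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            c, s, e, r = merged[-1]
            merged[-1] = (c, s, max(e, end), r + reads)
        else:
            merged.append((chrom, start, end, reads))
    return merged

def filter_by_density(regions, min_density=0.05):
    """Keep regions whose reads / length exceeds min_density."""
    return [r for r in regions if r[3] / (r[2] - r[1]) > min_density]
```

Two abutting 1 kb peaks with 100 and 20 reads would merge into a 2 kb region with a density of 0.06 and be retained, whereas an isolated 1 kb peak with 10 reads (density 0.01) would be dropped.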

We used these ChIP-peak regions and the coordinates of CHG-gain genes to create metaplots to compare enrichment between samples. In the metaplots, mapped reads were normalized to total mapped reads for each locus of interest, and were averaged over 4 bins representing 2 kb upstream, 2 kb from the transcription start site into the gene, 2 kb from the transcription stop site into the gene, and 2 kb downstream. Finally, the average bin values were normalized to account for the number of loci.

qRT-PCR analyses

Relative expression of the AtCMT3 transgene to TUB4 (Thhalv10003210m) was calculated using the double delta threshold cycle (Ct) method as 2 ^ -((Average AtCMT3 Ct) – (Average TUB4 Ct)) (Livak and Schmittgen, 2001). Average Ct values were calculated from three technical replicates.
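As a worked illustration of the formula above, the calculation can be written in a few lines of Python. The function name and the Ct values in the usage note are hypothetical and are not measurements from this study.

```python
from statistics import mean

def relative_expression(target_cts, reference_cts):
    """Compute 2^-(mean target Ct - mean reference Ct).

    target_cts: technical-replicate Ct values for the transgene (AtCMT3).
    reference_cts: technical-replicate Ct values for the reference (TUB4).
    """
    delta_ct = mean(target_cts) - mean(reference_cts)
    return 2 ** (-delta_ct)
```

For example, if the transgene crossed threshold two cycles later than the reference in all three technical replicates, `relative_expression([22, 22, 22], [20, 20, 20])` gives 0.25, that is, one quarter of the reference transcript level.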

Characterization of CHG-gain genes

Gene length and exon number for CHG-gain genes and UM genes were derived from E. salsugineum reference annotation files. CHG site number was calculated by scanning the whole gene sequence with a three base window and step size of 1 base. Both the positive strand and the negative strand were considered. The CHG frequency was calculated by normalizing CHG site number to gene length. Significance for differences in characteristics between CHG-gain and UM genes was calculated with a Wilcoxon rank-sum test in the stats package of R v3.44.
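The sliding-window count of CHG sites on both strands can be sketched as below. The function names are illustrative; the negative strand is handled by scanning the reverse complement of the gene sequence with the same three-base window, and ambiguous bases (such as N) simply never match.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def count_chg(seq):
    """Count CHG sites (H = A, C, or T) with a 3-base window, step of 1."""
    seq = seq.upper()
    n = 0
    for i in range(len(seq) - 2):
        window = seq[i:i + 3]
        if window[0] == "C" and window[1] in "ACT" and window[2] == "G":
            n += 1
    return n

def chg_frequency(gene_seq):
    """Return (CHG sites on both strands, sites per base of gene length)."""
    total = count_chg(gene_seq) + count_chg(reverse_complement(gene_seq))
    return total, total / len(gene_seq)
```

For the toy sequence CAGCCGCGG, the positive strand contains two CHG sites (CAG, CCG) and the negative strand two more (CCG, CTG), giving 4 sites over 9 bases.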

Orthologs of CHG-gain genes in A. thaliana and their gbM status reported in Supplementary file 9 were those identified in Niederhuth et al. (2016). To determine if gbM-gain genes had more A. thaliana orthologs that were classified as gbM than expected by chance, a hypergeometric test was conducted using the values of 20,211 total A. thaliana genes with an E. salsugineum ortholog, with 4532 of those being classified as gbM in A. thaliana by Niederhuth et al. (2016).
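The hypergeometric test for enrichment of gbM orthologs can be sketched with the standard library alone. The population values (20,211 orthologous genes, 4532 of them gbM) come from the text; the function name and the counts used in the usage note are hypothetical placeholders, because the number of CHG-gain genes with an ortholog and the number classified as gbM are not restated here.

```python
from math import comb

def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when drawing `draws` genes without replacement.

    population: total genes with an ortholog (20,211 in the text).
    successes: genes classified as gbM in A. thaliana (4532 in the text).
    draws: CHG-gain genes with an ortholog; k: those that are gbM.
    """
    upper = min(successes, draws)
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(population, draws)
```

A call such as `hypergeom_upper_tail(k, 20211, 4532, n)` would then give the probability of observing at least `k` gbM orthologs among `n` CHG-gain genes by chance.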

Data availability

All sequencing data generated have been deposited into NCBI Gene Expression Omnibus under accession number: GSE128687.

Acknowledgements

The authors would like to thank Javier Gallego-Bartolomé and Steve Jacobsen for providing the pEG302 AtCMT3 construct and Karen Schumaker for providing E. salsugineum seeds. RJS and FJ acknowledge support from the Technical University of Munich-Institute for Advanced Study funded by the German Excellence Initiative and the European Seventh Framework Programme under grant agreement no. 291763. RJS is supported by NSF MCB-1856143 and is a Pew Scholar in the Biomedical Sciences, supported by The Pew Charitable Trusts. JMW is supported by an NSF NPGI Postdoctoral Fellowship (IOS-1811694).

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Robert J Schmitz, Email: schmitz@uga.edu.

David Baulcombe, University of Cambridge, United Kingdom.

Detlef Weigel, Max Planck Institute for Developmental Biology, Germany.

Funding Information

This paper was supported by the following grants:

  • Pew Charitable Trusts Pew Scholar in the Biomedical Sciences to Robert J Schmitz.

  • National Science Foundation NSF NPGI Postdoctoral Fellowship IOS-1811694 to Jered M Wendte.

  • German Excellence Initiative and the European Seventh Framework Programme Grant agreement no. 291763 to Frank Johannes, Robert J Schmitz.

  • National Science Foundation NSF MCB-1856143 to Robert J Schmitz.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Data curation, Software, Formal analysis, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing.

Data curation, Software, Formal analysis, Visualization, Methodology, Writing—review and editing.

Data curation, Software, Formal analysis, Visualization, Methodology, Writing—review and editing.

Investigation, Methodology, Writing—review and editing.

Conceptualization, Software, Formal analysis, Methodology, Writing—review and editing.

Conceptualization, Software, Formal analysis, Methodology, Writing—review and editing.

Conceptualization, Software, Formal analysis, Supervision, Funding acquisition, Methodology, Writing—review and editing.

Conceptualization, Resources, Formal analysis, Supervision, Funding acquisition, Methodology, Project administration, Writing—review and editing.

Additional files

Supplementary file 1. Sequencing statistics for all next generation sequencing data analyzed in this study.
elife-47891-supp1.xlsx (16KB, xlsx)
DOI: 10.7554/eLife.47891.020
Supplementary file 2. Differentially methylated regions identified by methylpy.
elife-47891-supp2.xlsx (8.4MB, xlsx)
DOI: 10.7554/eLife.47891.021
Supplementary file 3. Called hyper- and hypo-DMRs in each lineage.
elife-47891-supp3.xlsx (5.7MB, xlsx)
DOI: 10.7554/eLife.47891.022
Supplementary file 4. FPKM values for all genes determined by RNA-seq.
elife-47891-supp4.xlsx (2.7MB, xlsx)
DOI: 10.7554/eLife.47891.023
Supplementary file 5. Lists of genes that gained a minimum of 5% CHG methylation in each line.
elife-47891-supp5.xlsx (589.9KB, xlsx)
DOI: 10.7554/eLife.47891.024
Supplementary file 6. Genes with greater than (+/-) two log2 fold change in expression identified in each line.
elife-47891-supp6.xlsx (59.9KB, xlsx)
DOI: 10.7554/eLife.47891.025
Supplementary file 7. P-values for Fisher's Exact tests of enrichment of CHG-gain genes in up- or down-regulated genes.
elife-47891-supp7.xlsx (9.5KB, xlsx)
DOI: 10.7554/eLife.47891.026
Supplementary file 8. Gene Ontology analysis for biologic processes enriched in up- or down-regulated genes.
elife-47891-supp8.xlsx (27.6KB, xlsx)
DOI: 10.7554/eLife.47891.027
Supplementary file 9. List of CHG-gain genes with closest A. thaliana ortholog and A. thaliana gbM status.
elife-47891-supp9.xlsx (108.9KB, xlsx)
DOI: 10.7554/eLife.47891.028
Supplementary file 10. List of E. salsugineum IBM1-like genes and expression status.
DOI: 10.7554/eLife.47891.029
Transparent reporting form
DOI: 10.7554/eLife.47891.030

Data availability

The following dataset was generated:

Wendte JM, Zhang Y, Ji L, Shi X, Hazarika RR, Shahryary Y, Johannes F, Schmitz RJ. 2019. Epimutations are associated with CHROMOMETHYLASE 3-induced de novo DNA methylation. NCBI Gene Expression Omnibus. GSE128687

The following previously published dataset was used:

Bewick AJ, Ji L, Niederhuth CE, Willing EM, Hofmeister BT, Shi X, Wang L, Lu Z, Rohr NA, Hartwig B, Kiefer C, Deal RB, Schmutz J, Grimwood J, Stroud H, Jacobsen SE, Schneeberger K, Zhang X, Schmitz RJ. 2016. On the origin and evolutionary consequences of gene body DNA methylation. NCBI Gene Expression Omnibus. GSE75071

References

  1. Arias T, Beilstein MA, Tang M, McKain MR, Pires JC. Diversification times among Brassica (Brassicaceae) crops suggest hybrid formation after 20 million years of divergence. American Journal of Botany. 2014;101:86–91. doi: 10.3732/ajb.1300312.
  2. Bartee L, Malagnac F, Bender J. Arabidopsis cmt3 chromomethylase mutations block non-CG methylation and silencing of an endogenous gene. Genes & Development. 2001;15:1753–1758. doi: 10.1101/gad.905701.
  3. Bernatavichute YV, Zhang X, Cokus S, Pellegrini M, Jacobsen SE. Genome-wide association of histone H3 lysine nine methylation with CHG DNA methylation in Arabidopsis thaliana. PLOS ONE. 2008;3:e3156. doi: 10.1371/journal.pone.0003156.
  4. Bewick AJ, Ji L, Niederhuth CE, Willing EM, Hofmeister BT, Shi X, Wang L, Lu Z, Rohr NA, Hartwig B, Kiefer C, Deal RB, Schmutz J, Grimwood J, Stroud H, Jacobsen SE, Schneeberger K, Zhang X, Schmitz RJ. On the origin and evolutionary consequences of gene body DNA methylation. PNAS. 2016;113:9111–9116. doi: 10.1073/pnas.1604666113.
  5. Bewick AJ, Niederhuth CE, Ji L, Rohr NA, Griffin PT, Leebens-Mack J, Schmitz RJ. The evolution of CHROMOMETHYLASES and gene body DNA methylation in plants. Genome Biology. 2017;18:65. doi: 10.1186/s13059-017-1195-1.
  6. Bewick AJ, Zhang Y, Wendte JM, Zhang X, Schmitz RJ. Evolutionary and experimental loss of gene body methylation and its consequence to gene expression. G3: Genes|Genomes|Genetics. 2019. doi: 10.1534/g3.119.400365.
  7. Bewick AJ, Schmitz RJ. Gene body DNA methylation in plants. Current Opinion in Plant Biology. 2017;36:103–110. doi: 10.1016/j.pbi.2016.12.007.
  8. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
  9. Choi J, Lyons DB, Kim Y, Moore JD, Zilberman D. DNA methylation and histone H1 cooperatively repress transposable elements and aberrant intragenic transcripts. bioRxiv. 2019. doi: 10.1101/527523.
  10. Clough SJ, Bent AF. Floral dip: a simplified method for Agrobacterium-mediated transformation of Arabidopsis thaliana. The Plant Journal. 1998;16:735–743. doi: 10.1046/j.1365-313x.1998.00343.x.
  11. Cokus SJ, Feng S, Zhang X, Chen Z, Merriman B, Haudenschild CD, Pradhan S, Nelson SF, Pellegrini M, Jacobsen SE. Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature. 2008;452:215–219. doi: 10.1038/nature06745.
  12. Cubas P, Vincent C, Coen E. An epigenetic mutation responsible for natural variation in floral symmetry. Nature. 1999;401:157–161. doi: 10.1038/43657.
  13. Du J, Zhong X, Bernatavichute YV, Stroud H, Feng S, Caro E, Vashisht AA, Terragni J, Chin HG, Tu A, Hetzel J, Wohlschlegel JA, Pradhan S, Patel DJ, Jacobsen SE. Dual binding of chromomethylase domains to H3K9me2-containing nucleosomes directs DNA methylation in plants. Cell. 2012;151:167–180. doi: 10.1016/j.cell.2012.07.034.
  14. Du J, Johnson LM, Groth M, Feng S, Hale CJ, Li S, Vashisht AA, Wohlschlegel JA, Patel DJ, Jacobsen SE. Mechanism of DNA methylation-directed histone methylation by KRYPTONITE. Molecular Cell. 2014;55:495–504. doi: 10.1016/j.molcel.2014.06.009.
  15. Du J, Johnson LM, Jacobsen SE, Patel DJ. DNA methylation pathways and their crosstalk with histone methylation. Nature Reviews Molecular Cell Biology. 2015;16:519–532. doi: 10.1038/nrm4043.
  16. Feng S, Cokus SJ, Zhang X, Chen PY, Bostick M, Goll MG, Hetzel J, Jain J, Strauss SH, Halpern ME, Ukomadu C, Sadler KC, Pradhan S, Pellegrini M, Jacobsen SE. Conservation and divergence of methylation patterning in plants and animals. PNAS. 2010;107:8689–8694. doi: 10.1073/pnas.1002720107.
  17. Finnegan EJ, Peacock WJ, Dennis ES. Reduced DNA methylation in Arabidopsis thaliana results in abnormal plant development. PNAS. 1996;93:8449–8454. doi: 10.1073/pnas.93.16.8449.
  18. Gouil Q, Baulcombe DC. DNA methylation signatures of the plant chromomethyltransferases. PLOS Genetics. 2016;12:e1006526. doi: 10.1371/journal.pgen.1006526.
  19. Haag JR, Pikaard CS. Multisubunit RNA polymerases IV and V: purveyors of non-coding RNA for plant gene silencing. Nature Reviews Molecular Cell Biology. 2011;12:483–492. doi: 10.1038/nrm3152.
  20. Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, Glass CK. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Molecular Cell. 2010;38:576–589. doi: 10.1016/j.molcel.2010.05.004.
  21. Huff JT, Zilberman D. Dnmt1-independent CG methylation contributes to nucleosome positioning in diverse eukaryotes. Cell. 2014;156:1286–1297. doi: 10.1016/j.cell.2014.01.029.
  22. Inagaki S, Miura-Kamio A, Nakamura Y, Lu F, Cui X, Cao X, Kimura H, Saze H, Kakutani T. Autocatalytic differentiation of epigenetic modifications within the Arabidopsis genome. The EMBO Journal. 2010;29:3496–3506. doi: 10.1038/emboj.2010.227.
  23. Inagaki S, Kakutani T. What triggers differential DNA methylation of genes and TEs: contribution of body methylation? Cold Spring Harbor Symposia on Quantitative Biology. 2012;77:155–160. doi: 10.1101/sqb.2013.77.016212.
  24. Jackson JP, Lindroth AM, Cao X, Jacobsen SE. Control of CpNpG DNA methylation by the KRYPTONITE histone H3 methyltransferase. Nature. 2002;416:556–560. doi: 10.1038/nature731.
  25. Jackson JP, Johnson L, Jasencakova Z, Zhang X, PerezBurgos L, Singh PB, Cheng X, Schubert I, Jenuwein T, Jacobsen SE. Dimethylation of histone H3 lysine 9 is a critical mark for DNA methylation and gene silencing in Arabidopsis thaliana. Chromosoma. 2004;112:308–315. doi: 10.1007/s00412-004-0275-7.
  26. Johnson LM, Bostick M, Zhang X, Kraft E, Henderson I, Callis J, Jacobsen SE. The SRA methyl-cytosine-binding domain links DNA and histone methylation. Current Biology. 2007;17:379–384. doi: 10.1016/j.cub.2007.01.009.
  27. Khan A, Mathelier A. Intervene: a tool for intersection and visualization of multiple gene or genomic region sets. BMC Bioinformatics. 2017;18:287. doi: 10.1186/s12859-017-1708-7.
  28. Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nature Methods. 2015;12:357–360. doi: 10.1038/nmeth.3317.
  29. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology. 2009;10:R25. doi: 10.1186/gb-2009-10-3-r25.
  30. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923.
  31. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. The sequence alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
  32. Li X, Harris CJ, Zhong Z, Chen W, Liu R, Jia B, Wang Z, Li S, Jacobsen SE, Du J. Mechanistic insights into plant SUVH family H3K9 methyltransferases and their binding to context-biased non-CG DNA methylation. PNAS. 2018;115:E8793–E8802. doi: 10.1073/pnas.1809841115.
  33. Lindroth AM, Cao X, Jackson JP, Zilberman D, McCallum CM, Henikoff S, Jacobsen SE. Requirement of CHROMOMETHYLASE3 for maintenance of CpXpG methylation. Science. 2001;292:2077–2080. doi: 10.1126/science.1059745.
  34. Lister R, O'Malley RC, Tonti-Filippini J, Gregory BD, Berry CC, Millar AH, Ecker JR. Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell. 2008;133:523–536. doi: 10.1016/j.cell.2008.03.029.
  35. Livak KJ, Schmittgen TD. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta delta C(T)) Method. Methods. 2001;25:402–408. doi: 10.1006/meth.2001.1262.
  36. Lorincz MC, Dickerson DR, Schmitt M, Groudine M. Intragenic DNA methylation alters chromatin structure and elongation efficiency in mammalian cells. Nature Structural & Molecular Biology. 2004;11:1068–1075. doi: 10.1038/nsmb840.
  37. Malagnac F, Bartee L, Bender J. An Arabidopsis SET domain protein required for maintenance but not establishment of DNA methylation. The EMBO Journal. 2002;21:6842–6852. doi: 10.1093/emboj/cdf687.
  38. Manning K, Tör M, Poole M, Hong Y, Thompson AJ, King GJ, Giovannoni JJ, Seymour GB. A naturally occurring epigenetic mutation in a gene encoding an SBP-box transcription factor inhibits tomato fruit ripening. Nature Genetics. 2006;38:948–952. doi: 10.1038/ng1841.
  39. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011;17:10. doi: 10.14806/ej.17.1.200.
  40. Mathieu O, Probst AV, Paszkowski J. Distinct regulation of histone H3 methylation at lysines 27 and 9 by CpG methylation in Arabidopsis. The EMBO Journal. 2005;24:2783–2791. doi: 10.1038/sj.emboj.7600743.
  41. Matzke MA, Mosher RA. RNA-directed DNA methylation: an epigenetic pathway of increasing complexity. Nature Reviews Genetics. 2014;15:394–408. doi: 10.1038/nrg3683.
  42. Maunakea AK, Nagarajan RP, Bilenky M, Ballinger TJ, D'Souza C, Fouse SD, Johnson BE, Hong C, Nielsen C, Zhao Y, Turecki G, Delaney A, Varhol R, Thiessen N, Shchors K, Heine VM, Rowitch DH, Xing X, Fiore C, Schillebeeckx M, Jones SJ, Haussler D, Marra MA, Hirst M, Wang T, Costello JF. Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature. 2010;466:253–257. doi: 10.1038/nature09165.
  43. Niederhuth CE, Bewick AJ, Ji L, Alabady MS, Kim KD, Li Q, Rohr NA, Rambani A, Burke JM, Udall JA, Egesi C, Schmutz J, Grimwood J, Jackson SA, Springer NM, Schmitz RJ. Widespread natural variation of DNA methylation within angiosperms. Genome Biology. 2016;17:194. doi: 10.1186/s13059-016-1059-0.
  44. Papa CM, Springer NM, Muszynski MG, Meeley R, Kaeppler SM. Maize chromomethylase Zea methyltransferase2 is required for CpNpG methylation. The Plant Cell. 2001;13:1919–1928. doi: 10.1105/tpc.010064.
  45. Pertea M, Kim D, Pertea GM, Leek JT, Salzberg SL. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and ballgown. Nature Protocols. 2016;11:1650–1667. doi: 10.1038/nprot.2016.095.
  46. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
  47. Regulski M, Lu Z, Kendall J, Donoghue MT, Reinders J, Llaca V, Deschamps S, Smith A, Levy D, McCombie WR, Tingey S, Rafalski A, Hicks J, Ware D, Martienssen RA. The maize methylome influences mRNA splice sites and reveals widespread paramutation-like switches guided by small RNA. Genome Research. 2013;23:1651–1662. doi: 10.1101/gr.153510.112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Reinders J, Wulff BB, Mirouze M, Marí-Ordóñez A, Dapp M, Rozhon W, Bucher E, Theiler G, Paszkowski J. Compromised stability of DNA methylation and transposon immobilization in mosaic Arabidopsis epigenomes. Genes & Development. 2009;23:939–950. doi: 10.1101/gad.524609. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Ronemus MJ, Galbiati M, Ticknor C, Chen J, Dellaporta SL. Demethylation-induced developmental pleiotropy in Arabidopsis. Science. 1996;273:654–657. doi: 10.1126/science.273.5275.654. [DOI] [PubMed] [Google Scholar]
  50. Saze H, Shiraishi A, Miura A, Kakutani T. Control of genic DNA methylation by a jmjC domain-containing protein in Arabidopsis thaliana. Science. 2008;319:462–465. doi: 10.1126/science.1150987. [DOI] [PubMed] [Google Scholar]
  51. Schubert D, Primavesi L, Bishopp A, Roberts G, Doonan J, Jenuwein T, Goodrich J. Silencing by plant Polycomb-group genes requires dispersed trimethylation of histone H3 at lysine 27. The EMBO Journal. 2006;25:4638–4649. doi: 10.1038/sj.emboj.7601311. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Schultz MD, He Y, Whitaker JW, Hariharan M, Mukamel EA, Leung D, Rajagopal N, Nery JR, Urich MA, Chen H, Lin S, Lin Y, Jung I, Schmitt AD, Selvaraj S, Ren B, Sejnowski TJ, Wang W, Ecker JR. Human body epigenome maps reveal noncanonical DNA methylation variation. Nature. 2015;523:212–216. doi: 10.1038/nature14465. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Seymour DK, Koenig D, Hagmann J, Becker C, Weigel D. Evolution of DNA methylation patterns in the Brassicaceae is driven by differences in genome organization. PLOS Genetics. 2014;10:e1004785. doi: 10.1371/journal.pgen.1004785. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Soppe WJ, Jasencakova Z, Houben A, Kakutani T, Meister A, Huang MS, Jacobsen SE, Schubert I, Fransz PF. DNA methylation controls histone H3 lysine 9 methylation and heterochromatin assembly in Arabidopsis. The EMBO Journal. 2002;21:6549–6559. doi: 10.1093/emboj/cdf657. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Stoddard CI, Feng S, Campbell MG, Liu W, Wang H, Zhong X, Bernatavichute Y, Cheng Y, Jacobsen SE, Narlikar GJ. A nucleosome bridging mechanism for activation of a maintenance DNA methyltransferase. Molecular Cell. 2019;73:73–83. doi: 10.1016/j.molcel.2018.10.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Stroud H, Greenberg MV, Feng S, Bernatavichute YV, Jacobsen SE. Comprehensive analysis of silencing mutants reveals complex regulation of the Arabidopsis methylome. Cell. 2013;152:352–364. doi: 10.1016/j.cell.2012.10.054. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Stroud H, Do T, Du J, Zhong X, Feng S, Johnson L, Patel DJ, Jacobsen SE. Non-CG methylation patterns shape the epigenetic landscape in Arabidopsis. Nature Structural & Molecular Biology. 2014;21:64–72. doi: 10.1038/nsmb.2735. [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Takuno S, Ran JH, Gaut BS. Evolutionary patterns of genic DNA methylation vary across land plants. Nature Plants. 2016;2:15222. doi: 10.1038/nplants.2015.222. [DOI] [PubMed] [Google Scholar]
  59. Takuno S, Gaut BS. Body-methylated genes in Arabidopsis thaliana are functionally important and evolve slowly. Molecular Biology and Evolution. 2012;29:219–227. doi: 10.1093/molbev/msr188. [DOI] [PubMed] [Google Scholar]
  60. Takuno S, Gaut BS. Gene body methylation is conserved between plant orthologs and is of evolutionary consequence. PNAS. 2013;110:1797–1802. doi: 10.1073/pnas.1215380110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. Tariq M, Saze H, Probst AV, Lichota J, Habu Y, Paszkowski J. Erasure of CpG methylation in Arabidopsis alters patterns of histone H3 methylation in heterochromatin. PNAS. 2003;100:8823–8827. doi: 10.1073/pnas.1432939100. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Teixeira FK, Colot V. Gene body DNA methylation in plants: a means to an end or an end to a means? The EMBO Journal. 2009;28:997–998. doi: 10.1038/emboj.2009.87. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Tran RK, Henikoff JG, Zilberman D, Ditt RF, Jacobsen SE, Henikoff S. DNA methylation profiling identifies CG methylation clusters in Arabidopsis genes. Current Biology. 2005;15:154–159. doi: 10.1016/j.cub.2005.01.008. [DOI] [PubMed] [Google Scholar]
  64. Urich MA, Nery JR, Lister R, Schmitz RJ, Ecker JR. MethylC-seq library preparation for base-resolution whole-genome bisulfite sequencing. Nature Protocols. 2015;10:475–483. doi: 10.1038/nprot.2014.114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. Vial-Pradel S, Keta S, Nomoto M, Luo L, Takahashi H, Suzuki M, Yokoyama Y, Sasabe M, Kojima S, Tada Y, Machida Y, Machida C. Arabidopsis Zinc-Finger-Like protein ASYMMETRIC LEAVES2 (AS2) and two nucleolar proteins maintain gene body DNA methylation in the leaf polarity gene ETTIN (ARF3). Plant and Cell Physiology. 2018;59:1385–1397. doi: 10.1093/pcp/pcy031. [DOI] [PubMed] [Google Scholar]
  66. Wang H, Beyene G, Zhai J, Feng S, Fahlgren N, Taylor NJ, Bart R, Carrington JC, Jacobsen SE, Ausin I. CG gene body DNA methylation changes and evolution of duplicated genes in cassava. PNAS. 2015;112:13729–13734. doi: 10.1073/pnas.1519067112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  67. Wendte JM, Pikaard CS. The RNAs of RNA-directed DNA methylation. Biochimica Et Biophysica Acta (BBA) - Gene Regulatory Mechanisms. 2017;1860:140–148. doi: 10.1016/j.bbagrm.2016.08.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  68. Wendte JM, Schmitz RJ. Specifications of Targeting Heterochromatin Modifications in Plants. Molecular Plant. 2018;11:381–387. doi: 10.1016/j.molp.2017.10.002. [DOI] [PubMed] [Google Scholar]
  69. Woo HR, Pontes O, Pikaard CS, Richards EJ. VIM1, a methylcytosine-binding protein required for centromeric heterochromatinization. Genes & Development. 2007;21:267–277. doi: 10.1101/gad.1512007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. Woo HR, Dittmer TA, Richards EJ. Three SRA-domain methylcytosine-binding proteins cooperate to maintain global CpG methylation and epigenetic silencing in Arabidopsis. PLOS Genetics. 2008;4:e1000156. doi: 10.1371/journal.pgen.1000156. [DOI] [PMC free article] [PubMed] [Google Scholar]
  71. Yang R, Jarvis DE, Chen H, Beilstein MA, Grimwood J, Jenkins J, Shu S, Prochnik S, Xin M, Ma C, Schmutz J, Wing RA, Mitchell-Olds T, Schumaker KS, Wang X. The reference genome of the halophytic plant Eutrema salsugineum. Frontiers in Plant Science. 2013;4:46. doi: 10.3389/fpls.2013.00046. [DOI] [PMC free article] [PubMed] [Google Scholar]
  72. Zemach A, McDaniel IE, Silva P, Zilberman D. Genome-wide evolutionary analysis of eukaryotic DNA methylation. Science. 2010;328:916–919. doi: 10.1126/science.1186366. [DOI] [PubMed] [Google Scholar]
  73. Zemach A, Kim MY, Hsieh PH, Coleman-Derr D, Eshed-Williams L, Thao K, Harmer SL, Zilberman D. The Arabidopsis nucleosome remodeler DDM1 allows DNA methyltransferases to access H1-containing heterochromatin. Cell. 2013;153:193–205. doi: 10.1016/j.cell.2013.02.033. [DOI] [PMC free article] [PubMed] [Google Scholar]
  74. Zhang X, Yazaki J, Sundaresan A, Cokus S, Chan SW, Chen H, Henderson IR, Shinn P, Pellegrini M, Jacobsen SE, Ecker JR. Genome-wide high-resolution mapping and functional analysis of DNA methylation in Arabidopsis. Cell. 2006;126:1189–1201. doi: 10.1016/j.cell.2006.08.003. [DOI] [PubMed] [Google Scholar]
  75. Zhou M, Law JA. RNA pol IV and V in gene silencing: rebel polymerases evolving away from pol II's rules. Current Opinion in Plant Biology. 2015;27:154–164. doi: 10.1016/j.pbi.2015.07.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  76. Zilberman D, Gehring M, Tran RK, Ballinger T, Henikoff S. Genome-wide analysis of Arabidopsis thaliana DNA methylation uncovers an interdependence between methylation and transcription. Nature Genetics. 2007;39:61–69. doi: 10.1038/ng1929. [DOI] [PubMed] [Google Scholar]
  77. Zilberman D, Coleman-Derr D, Ballinger T, Henikoff S. Histone H2A.Z and DNA methylation are mutually antagonistic chromatin marks. Nature. 2008;456:125–129. doi: 10.1038/nature07324. [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision letter

Editor: David Baulcombe

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "A positive feedback loop that establishes heterochromatin predisposes transcribed genes to stable epimutations" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Detlef Weigel as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

This manuscript aims at understanding the mechanism of CG gene body methylation (gbM) by expressing the Arabidopsis CMT3 gene in Eutrema salsugineum, a species that is lacking a CMT3 ortholog and is devoid of gbM. The authors generated several transgenic lines in Eutrema expressing the Arabidopsis CMT3 gene, and followed the establishment of CHG methylation in two independent lines over six generations. They found de novo CHG methylation on repeat sequences, intergenic sequences and on some genes that share common features with genes targeted by gbM in Arabidopsis thaliana. Interestingly, these genes were not marked by H3K9me2, a modification that is usually associated with CMT3 activity. Finally, it is shown that CMT3 silencing led to a fast decrease of CHG and CHH methylation and to a slow decrease of CG methylation. It is proposed that transient deposition of H3K9me2 on genes may recruit CMT3, which in turn methylates DNA in CHG context, leading to CG methylation by an unknown mechanism.

Essential revisions:

1) The main shortcoming of the manuscript is the failure to point to a mechanism through which CHG methylation mediates CG methylation. The main data to support the assumption that CHG methylation leads to gbM are those in Figure 6, showing that loss of CMT3 expression (after transgene silencing) causes a slower decline of CG methylation compared to CHG and CHH methylation. The authors made use of the fact that CMT3 became silenced in the fifth generation in the AtCMT3-L2 line to test the effect of CMT3 loss on DNA methylation. Since it is unclear if the silencing is triggered by the transgene itself or by other factors, and how quickly silencing occurred, the results in Figure 6 are not completely conclusive. Moreover, the data are based on a single line and one individual per generation.

To address this point we request the analysis of at least one other non-silenced transgenic line, as well as the analysis of transgene-free progeny instead of the apparently silenced line.

2) Plants expressing CMT3 show body CHG methylation, but not body H3K9me2. The authors seem to interpret this observation as H3K9me2 recruiting CMT3 but H3K9me2 being removed by IBM1. An alternative interpretation of the results could be that H3K9me1, rather than H3K9me2, recruits CMT3. CMT3 binds not only to H3K9me2 but also to H3K9me1, even though CMT2 does not bind to H3K9me1 (Stroud et al., 2014, Figure 2D). Therefore, it seems possible that preexisting H3K9me1 guides CMT3, but not CMT2, to introduce CHG methylation in gene bodies.

To address this point we request the comparison of H3K9me1 in transgenic and non-transgenic control plants.

3) In Figure 5A, the authors compared CHG methylation gain and expression change in each gene in the T4 plant with body CHG methylation. Their interpretation of the results is that "genic CHG methylation in AtCMT3-expressing lines is uncoupled from heterochromatin formation and transcriptional silencing, similar to gbM." A possible complication is that the transcriptome reflects both primary effects of the body CHG methylation and indirect effects arising from those primary effects. The indirect effects would not correlate with CHG methylation.

Furthermore, the proportion of CHG methylated genes in downregulated genes (X<-2) seems significantly higher than those in upregulated genes (2<X) and control genes (-2<X<2). The results look consistent with transcriptome analyses in ibm1 mutants in Arabidopsis (Inagaki et al., 2017). According to that literature, genes downregulated in ibm1 mutants have significant levels of CHG methylation, while upregulated genes do not. In addition, GO analyses of upregulated genes suggest their link to immune responses. The interpretation by Inagaki et al. was that CHG methylation induces downregulation for a subset of genes, and that the upregulation of many genes reflects indirect effects, likely due to primary changes in expression of some key factor involved in immune responses.

As the results in Figure 5A seem consistent with the transcriptome of ibm1 mutants (downregulation as primary effects and upregulation as indirect effects), we suggest a GO analysis of upregulated genes and a statistical test for the overrepresentation of CHG gain in downregulated genes. If the GO analysis of upregulated genes shows some tendency, even if not toward immune response, that would suggest that indirect effects are involved.

More generally, the question is if body CHG methylation affects gene expression. That could be clarified by examining other plants without the gain of CHG methylation, such as other transgenic lines or other generations of plants in the same line, as controls (in addition to the non-transgenic wild-type plants). Especially interesting controls might be T6 plants of the line that lost body CHG methylation while keeping body CG methylation, because that might disentangle effects of body CG and CHG methylation.

Other points:

Please address the following additional points raised by the three reviewers as much as possible.

Reviewer #1:

4) The manuscript suggests that an increase of CMT3 expression leads to an increase of CHG methylation (subsection “Expression of AtCMT3 in E. salsugineum results in increased CHG methylation”, third paragraph). It is surprising that the authors did not use AtCMT3-L3 for their analysis, as CMT3 is more highly expressed in this line than in AtCMT3-L1 (Supplementary file 4). If the model is correct, AtCMT3-L3 should exhibit a higher methylation level than lines -L1 or -L2. As the authors have generated the methylome of this line (Figure 1—figure supplement 1), they should analyze gbM in this line. In fact, it would be advisable to also include the data for the other lines, since based on Figure 1—figure supplement 1 the authors produced methylome data for all lines until the second generation.

5) Figure 4: The choice of the generations used in the analysis requires justification; why analyze generations 1, 2, 4, 5, 8, 11 for Col-0 and generations 5, 13 for suvh4/5/6? At the least, they should include the 13th Col-0 generation in the analysis, since the DNA methylation pattern can be significantly affected after a few generations (Figure 1, AtCMT3-L2). It is also advisable to add explanations to this paragraph to allow the reader to follow what has been analyzed.

6) It is unclear how the RNA seq data were analyzed (subsection “RNA sequencing mapping and analyses”).

7) For the ChIP-seq analysis, input seems to have been retrieved (subsection “Chromatin immunoprecipitation and sequencing (ChIP-seq)”) but apparently was not used for the normalization, neither have H3 data been used to normalize (subsection “ChIP-sequencing mapping and analyses”). The analysis has to be described in more detail or repeated.

Reviewer #2:

8) Regarding the presentation of the ChIP-seq results, the authors showed a metaplot and a browser view for H3K9me2. It might also be informative to see a scatter plot comparing H3K9me2 levels between the T4 and non-transgenic plants, to see if the signal increases in a subset of genes or TE genes.

9) Based on the results in Figure 4, the authors argued that the rate of CG methylation gain was significantly lower in the Arabidopsis suvh4/5/6 mutant than in WT, but that the rate of CG methylation loss was not significantly different. The underlying statistics were not clear to me.

Reviewer #3:

10) The manuscript often refers to "targeting" of CMT3 to genes, which to me implies an active process. But the authors reach the conclusion (Discussion, last paragraph) that gbM is likely a passive byproduct of having a functional CMT3 enzyme. It's not that CMT3 is specifically targeted to genes, but that CMT3 acts in non-heterochromatic regions with some low frequency. It is suggested that the authors reconsider their use of the word targeting.

11) One of the results that most strongly supports the authors' model is that CG methylation is retained (and CHG and CHH methylation is lost) in CHGhyper genes after CMT3 is silenced in line L2 (Figure 6A and 6B). This led me to wonder why Eutrema lacks all gbM – presumably the loss of CMT3 occurred relatively recently in its evolutionary history since its closest relatives retain CMT3 and gbM. Shouldn't some CG methylation still be present? Can the authors date the loss of CMT3 in Eutrema and does the total absence of gbM in this species fit with the timing of that loss and what we know about rates of mCG gain and loss in gene bodies over time?

12) Based on analysis of repeat methylation, the authors suggest that AtCMT3 preferentially targets heterochromatin over genes. There are alternative interpretations for these data. DNA methylation was reduced in all contexts in L2T6 in heterochromatin (Figure 6C), although it remained higher than in genes (Figure 6B). But the remaining methylation could be due to other maintenance and de novo pathways also being more active in those regions (MET1, RdDM, CMT2), rather than any residual CMT3 activity preferentially being directed to heterochromatin.

13) The portion of the manuscript about epimutations rates in suvh4/5/6 in Arabidopsis was a distraction from the main message. The effects on CG methylation gain, while statistically significant, do not appear particularly strong. I recommend removing this section from the paper.

14) The right panel of Figure 1—figure supplement 5 does not support the conclusion that higher levels of CMT3 expression are correlated with increased global CHG methylation. The R² value of 0.59 is driven by two points and many samples have high CHG methylation but relatively low CMT3 expression. This doesn't seem like a key conclusion of the paper, and the authors should be more cautious in their interpretation.

eLife. 2019 Jul 29;8:e47891. doi: 10.7554/eLife.47891.037

Author response


Essential revisions:

1) The main shortcoming of the manuscript is the failure to point to a mechanism through which CHG methylation mediates CG methylation. The main data to support the assumption that CHG methylation leads to gbM are those in Figure 6, showing that loss of CMT3 expression (after transgene silencing) causes a slower decline of CG methylation compared to CHG and CHH methylation. The authors made use of the fact that CMT3 became silenced in the fifth generation in the AtCMT3-L2 line to test the effect of CMT3 loss on DNA methylation. Since it is unclear if the silencing is triggered by the transgene itself or by other factors, and how quickly silencing occurred, the results in Figure 6 are not completely conclusive. Moreover, the data are based on a single line and one individual per generation.

To address this point we request the analysis of at least one other non-silenced transgenic line, as well as the analysis of transgene-free progeny instead of the apparently silenced line.

These are excellent suggestions and we have added the new data as the reviewers have requested. We conducted the same analysis we completed for AtCMT3-L2 in AtCMT3-L1, which did not demonstrate silencing of the transgene. In this case, the relative methylation levels over genes in all contexts (CG, CHG, and CHH) remained similar over generational time. This result is included in new Figure 5—figure supplement 1D.

We also took an alternative approach in which we crossed the AtCMT3-L1T5 transgene-expressing line to wild type (non-transgenic) to segregate out the transgene. We analyzed two F2 progeny where the transgene was segregated out and one F2 progeny that still contained the transgene. In the F2 progeny that did not encode the transgene, the ectopic genic CHG and CHH methylation that was present in the transgenic parent was lost but the genic CG methylation was maintained, similar to the line where the transgene was silenced. In contrast, the F2 progeny that still encoded the transgene maintained similar levels of genic methylation in all contexts as the transgenic parent. This result is included in the new Figure 5—figure supplement 1E. We also verified that the levels of newly methylated genic CG sites in these lines, as well as the lines where AtCMT3 was silenced (AtCMT3-L2T5-T6), were higher than background (due to bisulfite non-conversion or epimutations that occur independently of CMT3), by comparing control sets of un-methylated genes that did not gain CHG methylation and by comparing to an additional, non-transgenic accession of E. salsugineum (Yukon). These controls are included in new Figure 5—figure supplement 2.

These new results are described in the text in subsection “Ectopic genic CG methylation is preferentially maintained following loss of AtCMT3 expression”, fourth paragraph.

2) Plants expressing CMT3 show body CHG methylation, but not body H3K9me2. The authors seem to interpret this observation as H3K9me2 recruiting CMT3 but H3K9me2 being removed by IBM1. An alternative interpretation of the results could be that H3K9me1, rather than H3K9me2, recruits CMT3. CMT3 binds not only to H3K9me2 but also to H3K9me1, even though CMT2 does not bind to H3K9me1 (Stroud et al., 2014, Figure 2D). Therefore, it seems possible that preexisting H3K9me1 guides CMT3, but not CMT2, to introduce CHG methylation in gene bodies.

To address this point we request the comparison of H3K9me1 in transgenic and non-transgenic control plants.

This is an intriguing hypothesis. However, we conducted H3K9me1 ChIP-seq in transgenic and non-transgenic lines and found that, similar to H3K9me2, there does not appear to be an enrichment of H3K9me1 in genes that gain CHG methylation before or after the introduction of the transgene (see new Figure 3—figure supplement 2A-C).

These new results are discussed in the text in the last paragraph of the subsection “CHG methylation in gene bodies is not associated with stable H3K9 methylation”.

We do agree with the reviewer that an initial presence of H3K9me1, even if transient, could help explain the phylogenetic correlation between encoding CMT3, but not CMT2, and the presence of gene body methylation across plant species. Once CMT3 methylates CWG cytosines, H3K9 methylation may then be stabilized as H3K9me2, which could be bound by CMT2, promoting CWA methylation. We have included this possibility in our Discussion and hypothetical model. See new model Figure 6 and the Discussion section.

3) In Figure 5A, the authors compared CHG methylation gain and expression change in each gene in the T4 plant with body CHG methylation. Their interpretation of the results is that "genic CHG methylation in AtCMT3-expressing lines is uncoupled from heterochromatin formation and transcriptional silencing, similar to gbM." A possible complication is that the transcriptome reflects both primary effects of the body CHG methylation and indirect effects arising from those primary effects. The indirect effects would not correlate with CHG methylation.

Furthermore, the proportion of CHG methylated genes in downregulated genes (X<-2) seems significantly higher than those in upregulated genes (2<X) and control genes (-2<X<2). The results look consistent with transcriptome analyses in ibm1 mutants in Arabidopsis (Inagaki et al., 2017). According to that literature, genes downregulated in ibm1 mutants have significant levels of CHG methylation, while upregulated genes do not. In addition, GO analyses of upregulated genes suggest their link to immune responses. The interpretation by Inagaki et al. was that CHG methylation induces downregulation for a subset of genes, and that the upregulation of many genes reflects indirect effects, likely due to primary changes in expression of some key factor involved in immune responses.

As the results in Figure 5A seem consistent with the transcriptome of ibm1 mutants (downregulation as primary effects and upregulation as indirect effects), we suggest a GO analysis of upregulated genes and a statistical test for the overrepresentation of CHG gain in downregulated genes. If the GO analysis of upregulated genes shows some tendency, even if not toward immune response, that would suggest that indirect effects are involved.

More generally, the question is if body CHG methylation affects gene expression. That could be clarified by examining other plants without the gain of CHG methylation, such as other transgenic lines or other generations of plants in the same line, as controls (in addition to the non-transgenic wild-type plants). Especially interesting controls might be T6 plants of the line that lost body CHG methylation while keeping body CG methylation, because that might disentangle effects of body CG and CHG methylation.

We conducted RNA sequencing of nine of the AtCMT3-expressing plants we analyzed, including the T6 plant that demonstrated loss of AtCMT3 expression, and in each case, we found most of the CHG-gain genes showed little to no change in expression (Figure 4A, Figure 4—figure supplement 1). As suggested by the reviewer, in each of these lines we conducted a statistical test to determine if the CHG-gain genes are enriched in either down- or up-regulated genes identified genome-wide, using a cutoff of a +/- 2 log2 fold change. We found no enrichment in either down- or up-regulated genes (see new Supplementary file 7).
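The response does not name the statistical test that was used. As a minimal sketch only, assuming a one-sided hypergeometric (Fisher-type) test, the enrichment of CHG-gain genes among differentially expressed genes could be computed as below. The function name and the gene counts in the usage lines are hypothetical and are not values from the study.

```python
from math import comb


def hypergeom_upper_tail(total_genes, chg_gain_genes, de_genes, overlap):
    """Return P(X >= overlap) under a hypergeometric null.

    total_genes    : number of expressed genes considered (the "universe")
    chg_gain_genes : number of those genes that gained CHG methylation
    de_genes       : number of genes passing the fold-change cutoff
    overlap        : number of genes that are both CHG-gain and DE
    """
    max_overlap = min(chg_gain_genes, de_genes)
    denominator = comb(total_genes, de_genes)
    # Sum the probability mass for every overlap at least as large as observed.
    numerator = sum(
        comb(chg_gain_genes, k) * comb(total_genes - chg_gain_genes, de_genes - k)
        for k in range(overlap, max_overlap + 1)
    )
    return numerator / denominator


# Hypothetical counts, for illustration only.
p_down = hypergeom_upper_tail(
    total_genes=20000, chg_gain_genes=3000, de_genes=400, overlap=62
)
print(f"one-sided p-value for enrichment: {p_down:.3g}")
```

A small p-value would indicate that CHG-gain genes are overrepresented in the tested set; the same call can be repeated for the up-regulated set, with a multiple-testing correction across lines if many comparisons are made.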

To assess the possibility of indirect effects of AtCMT3 expression, we conducted a GO enrichment analysis of up- and down-regulated genes in each line. We found that in each line, both up- and down-regulated genes showed significant enrichments in various abiotic stress response GO terms, but did not see an overrepresentation of immune response genes (see new Supplementary file 8). Despite this general trend for abiotic stress response genes, we found that there was no GO term consistently identified across all lineages, which led us to conclude that it is unlikely that these changes in expression are related to the transgene expression either directly or indirectly. These results are discussed in the text in the subsection “CHG methylation in gene bodies is not associated with transcriptional silencing”.

With these additional results, we believe our original conclusion that "genic CHG methylation in AtCMT3-expressing lines is uncoupled from heterochromatin formation and transcriptional silencing, similar to gbM," is still valid. An important distinction of the genic CHG methylation we identified in AtCMT3-expressing E. salsugineum and that found in A. thaliana ibm1 mutants is that we did not detect stable H3K9 methylation associated with genic CHG methylation in the Eutrema lineages. Our prediction is that Eutrema IBM1 is actively removing H3K9 methylation, which prevents DNA methylation from leading to heterochromatin formation and silencing (See model Figure 6). This contrasts with A. thaliana ibm1 mutants, where both heterochromatin signals (DNA and histone methylation) are established over genes, which affects transcription.

Other points:

Please address the following additional points raised by the three reviewers as much as possible.

Reviewer #1:

4) The manuscript suggests that an increase of CMT3 expression leads to an increase of CHG methylation (subsection “Expression of AtCMT3 in E. salsugineum results in increased CHG methylation”, third paragraph). It is surprising that the authors did not use AtCMT3-L3 for their analysis, as CMT3 is more highly expressed in this line than in AtCMT3-L1 (Supplementary file 4). If the model is correct, AtCMT3-L3 should exhibit a higher methylation level than lines -L1 or -L2. As the authors have generated the methylome of this line (Figure 1—figure supplement 1), they should analyze gbM in this line. In fact, it would be advisable to also include the data for the other lines, since based on Figure 1—figure supplement 1 the authors produced methylome data for all lines until the second generation.

Our initial inclination was to include all lines in the main figures. However, including all the data resulted in crowded figures that were difficult to read. We wanted to avoid confusion in our discussions of the dynamics of DNA methylation over generational time, so we chose to focus the main figures on the two lineages for which we had samples over the full course of six generations, while limiting the analyses of additional lineages to the supplementary figures.

Our analyses of the additional lineages included in the supplementary material confirmed the results presented in the main text figures and we agree with the reviewer that this should be stated more emphatically. We have therefore discussed these results more explicitly in the main text to emphasize that the two L3 individuals with the highest CMT3 expression also had the highest levels of CHG methylation and number of genes gaining CHG methylation:

“The results demonstrated a significant correlation between the levels of AtCMT3 expression and genome-wide CHG methylation levels (R² = 0.8828, p = 1.669 × 10⁻⁴, Figure 1—figure supplement 5, see Supplementary file 4 for FPKM values). This was especially notable for two T2 generation plants of AtCMT3-L3, which had the highest AtCMT3 expression levels (233.7 and 372.5 FPKM for AtCMT3-L3T2 and T2b, respectively) and the highest genome-wide percent CHG methylation (27% and 32%, for T2a and T2b, respectively).”

and:

“Indeed, based on RNA-seq assessment of AtCMT3 expression, the levels of AtCMT3 expression and the number of CHG-gain genes were correlated (R² = 0.7951, p = 0.001) (Figure 2—figure supplement 2, Supplementary file 4). Again, two T2 generation individuals of the AtCMT3-L3 lineage, which had the highest AtCMT3 expression (Supplementary file 4), also had the highest number of CHG-gain genes (5,566 and 6,346 CHG-gain genes for AtCMT3-L3T2 and T2b, respectively) (Supplementary file 5).”
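
As an illustration of how an R² value of this kind can be obtained, the following is a minimal sketch that assumes R² is the squared Pearson correlation from a simple linear fit; the response does not state which method was used. The function names and the input values are hypothetical and are not the study's data. The p-value is not computed here; it would come from comparing the t statistic to a t distribution with n − 2 degrees of freedom.

```python
import math

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def t_statistic(r2, n):
    """t statistic for a test of zero correlation (n - 2 degrees of freedom).

    Undefined when r2 == 1, because the denominator becomes zero.
    """
    return math.sqrt(r2 * (n - 2) / (1.0 - r2))

# Hypothetical values for illustration only (not from Supplementary file 4).
expression = [1.0, 2.0, 3.0, 4.0, 5.0]    # stand-in for expression levels
methylation = [2.0, 4.0, 5.0, 4.0, 5.0]   # stand-in for % CHG methylation

r2 = r_squared(expression, methylation)
t = t_statistic(r2, len(expression))
```

With only a handful of samples, a small number of high-expression points can dominate such a fit, which is the concern reviewer #3 raises in point 14.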

5) Figure 4: The choice of the generations used in the analysis requires justification; why analyzing generations 1, 2, 4, 5, 8, 11 for Col-0 and generations 5, 13 for suv4/5/6 ? At least they should include the 13th Col-0 generation in the analysis, since the DNA methylation pattern can be significantly affected after few generations (Figure 1, AtCMT3-L2). It is also advisable to include additional explanations to this paragraph to allow the reader to follow what has been analyzed.

At the request of reviewer #3, we have removed this analysis so as not to distract from the main conclusions of the paper.

6) It is unclear how the RNA seq data were analyzed (subsection “RNA sequencing mapping and analyses”).

We have expanded this section of the Materials and methods to make this clearer:

“Quality-filtering and adapter-trimming were performed using Trimmomatic v0.33 with default parameters (Bolger, Lohse, and Usadel, 2014). […] Genes with zero FPKM values were removed from expression analyses. A cutoff of +/- 2 Log2 fold change was used to identify genes undergoing substantial changes in expression.”
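
The final filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a gene is dropped when either sample has zero FPKM and that the cutoff is inclusive, neither of which is specified in the quoted text. The function name, gene IDs, and FPKM values are hypothetical.

```python
import math

def substantial_changes(fpkm, cutoff=2.0):
    """Return {gene: log2 fold change} for genes passing the cutoff.

    fpkm maps a gene ID to (control_fpkm, transgenic_fpkm).
    Assumption: a gene is dropped if either value is zero, and the
    cutoff is treated as inclusive; the source specifies neither.
    """
    changed = {}
    for gene, (control, transgenic) in fpkm.items():
        if control == 0 or transgenic == 0:
            continue  # zero-FPKM genes are excluded from the analysis
        log2fc = math.log2(transgenic / control)
        if abs(log2fc) >= cutoff:
            changed[gene] = log2fc
    return changed

# Hypothetical gene IDs and FPKM values, for illustration only.
example = {
    "geneA": (10.0, 40.0),  # 4-fold increase, log2FC = 2
    "geneB": (8.0, 12.0),   # 1.5-fold increase, below the cutoff
    "geneC": (0.0, 5.0),    # zero FPKM in one sample, removed
    "geneD": (16.0, 2.0),   # 8-fold decrease, log2FC = -3
}
result = substantial_changes(example)
```

Removing zero-FPKM genes before taking the ratio also avoids division by zero and undefined logarithms.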

7) For the ChIP-seq analysis, input seems to have been retrieved (subsection “Chromatin immunoprecipitation and sequencing (ChIP-seq)”) but apparently was not used for the normalization, neither have H3 data been used to normalize (subsection “ChIP-sequencing mapping and analyses”). The analysis has to be described in more detail or repeated.

We added more detail to the Materials and methods section to clarify how we analyzed the ChIP-seq data:

“Quality-filtering and adapter-trimming were performed using Trimmomatic v0.33 (Bolger et al., 2014) with default parameters. […] Finally, the average bin values were normalized to account for the number of loci.”

Reviewer #2:

8) In regard to the ways of presentation of the ChIP-seq results, they showed metaplot and browser view for H3K9me2. It might also be informative to see scatter plot, comparing H3K9me2 level between the T4 and non-transgenic plants, to see if the signal increases in a subset of genes or TE genes.

Since our ChIP experiments were conducted at time points separated by months, due to the multi-generational nature of the study, we focused our analyses on major trends in enrichment over certain genomic features of interest (heterochromatin and genes that gained CHG methylation in transgenic lines). We appreciate there are certainly many ways to display these data, but ultimately decided that metaplots and representative browser views were the best way to accurately portray the general trends while remaining conservative in our approach.

9) Based on results in Figure 4, authors discussed that gain of CG methylation rate was significantly lower in Arabidopsis suvh4/5/6 mutant than in WT, but loss of CG methylation was not significantly different. The background statistics was not clear to me.

At the request of reviewer #3, we have removed this analysis so as not to distract from the main conclusions of the paper.

Reviewer #3:

10) The manuscript often refers to "targeting" of CMT3 to genes, which to me implies an active process. But the authors reach the conclusion (Discussion, last paragraph) that gbm is likely a passive byproduct of having a functional CMT3 enzyme. It's not that CMT3 is specifically targeted to genes, but that CMT3 acts in non-heterochromatic regions with some low frequency. It is suggested that the authors reconsider their use of the word targeting.

We agree that this terminology can be confusing, and we have removed “targeting” from our references to CMT3 localization to genes.

11) One of the results that most strongly supports the authors' model is that CG methylation is retained (and CHG and CHH methylation is lost) in CHGhyper genes after CMT3 is silenced in line L2 (Figure 6A and 6B). This led me to wonder why Eutrema lacks all gbm – presumably the loss of CMT3 occurred relatively recently in its evolutionary history since its closest relatives retain CMT3 and gbm. Shouldn't some CG methylation still be present? Can the authors date the loss of CMT3 in Eutrema and does the total absence of gbm in this species fit with the timing of that loss and what we know about rates of mCG gain and loss in gene bodies over time?

This is an interesting question; however, we would require additional phylogenetic sampling to estimate when in evolutionary history the ancestors of modern Eutrema salsugineum lost CMT3.

12) Based on analysis of repeat methylation, the authors suggest that AtCMT3 preferentially targets heterochromatin over genes. There are alternative interpretations for these data. DNA methylation was reduced in all contexts in L2T6 in heterochromatin (Figure 6C), although it remained higher than in genes (Figure 6B). But the remaining methylation could be due to other maintenance and de novo pathways also being more active in those regions (MET1, RdDM, CMT2), rather than any residual CMT3 activity preferentially being directed to heterochromatin.

This is a good point, and we have conducted an additional experiment that allowed us to test it more thoroughly. We crossed the AtCMT3-L1T5 line to wild type to obtain segregating progeny that no longer carry the transgene, so that we could better evaluate maintenance of transgene-induced ectopic methylation after transgene removal. We found that, similar to the L2T6 lineage, transgene-induced methylation in the CHG and CHH contexts was maintained at higher levels in heterochromatin than in genes in progeny with the transgene crossed out. This supports the reviewer’s hypothesis that other pathways can preferentially maintain methylation in these regions. However, we also note that relative CHG methylation in particular was reduced by a further ~50% when the transgene was completely removed, compared to the L2T6 lineage. This supports the possibility that the residual CMT3 activity remaining in the L2T6 line was preferentially directed to heterochromatic regions.

We have included these results in Figure 5—figure supplement 1F and discussed them in the last paragraph of the subsection “AtCMT3 preferentially methylates heterochromatin relative to genes”.

13) The portion of the manuscript about epimutations rates in suvh4/5/6 in Arabidopsis was a distraction from the main message. The effects on CG methylation gain, while statistically significant, do not appear particularly strong. I recommend removing this section from the paper.

We have removed this section from the paper to avoid distraction and updated the text and figures accordingly. We have also opted to change the title since we are now focusing the paper mainly on the role of CMT3 in the initiation of gbM. Our new title is: “Epimutations are associated with CHROMOMETHYLASE 3-induced de novo DNA methylation”.

14) The right panel of Figure 1—figure supplement 5 does not support the conclusion that higher levels of CMT3 expression are correlated with increased global CHG methylation. The R² value of 0.59 is driven by two points and many samples have high CHG methylation but relatively low CMT3 expression. This doesn't seem like a key conclusion of the paper, and the authors should be more cautious in their interpretation.

We agree that the correlation between AtCMT3 expression and genome-wide CHG methylation was not as strong when measured by qRT-PCR as when measured by RNA-seq (R² of 0.59 vs. 0.88, respectively). However, taken together, we think the results do support the likelihood that variation in AtCMT3 expression is a contributing factor to overall CHG methylation levels, which has been noted previously in A. thaliana mutant backgrounds that alter CMT3 expression (e.g. Cell. 2014 Jul 3; 158(1): 98–109). We have altered the text to present this conclusion more cautiously:

“We also examined the relationship between genome-wide CHG methylation and AtCMT3 expression using qRT-PCR, including plants from additional lineages, and found a weaker although significant relationship (R² = 0.5932, p = 0.025, Figure 1—figure supplement 1 and Figure 1—figure supplement 5). Taken together, AtCMT3 expression is likely one factor contributing to genome-wide CHG levels in these lines.”

Associated Data


    Data Citations

    1. Wendte JM, Zhang Y, Ji L, Shi X, Hazarika RR, Shahryary Y, Johannes F, Schmitz RJ. 2019. Epimutations are associated with CHROMOMETHYLASE 3-induced de novo DNA methylation. NCBI Gene Expression Omnibus. GSE128687
    2. Bewick AJ, Ji L, Niederhuth CE, Willing EM, Hofmeister BT, Shi X, Wang L, Lu Z, Rohr NA, Hartwig B, Kiefer C, Deal RB, Schmutz J, Grimwood J, Stroud H, Jacobsen SE, Schneeberger K, Zhang X, Schmitz RJ. 2016. On the origin and evolutionary consequences of gene body DNA methylation. NCBI Gene Expression Omnibus. GSE75071

    Supplementary Materials

    Supplementary file 1. Sequencing statistics for all next generation sequencing data analyzed in this study.
    elife-47891-supp1.xlsx (16KB, xlsx)
    DOI: 10.7554/eLife.47891.020
    Supplementary file 2. Differentially methylated regions identified by methylpy.
    elife-47891-supp2.xlsx (8.4MB, xlsx)
    DOI: 10.7554/eLife.47891.021
    Supplementary file 3. Called hyper- and hypo-DMRs in each lineage.
    elife-47891-supp3.xlsx (5.7MB, xlsx)
    DOI: 10.7554/eLife.47891.022
    Supplementary file 4. FPKM values for all genes determined by RNA-seq.
    elife-47891-supp4.xlsx (2.7MB, xlsx)
    DOI: 10.7554/eLife.47891.023
    Supplementary file 5. Lists of genes that gained a minimum of 5% CHG methylation in each line.
    elife-47891-supp5.xlsx (589.9KB, xlsx)
    DOI: 10.7554/eLife.47891.024
    Supplementary file 6. Genes with greater than (+/-) two log2 fold change in expression identified in each line.
    elife-47891-supp6.xlsx (59.9KB, xlsx)
    DOI: 10.7554/eLife.47891.025
    Supplementary file 7. P-values for Fisher's Exact tests of enrichment of CHG-gain genes in up- or down-regulated genes.
    elife-47891-supp7.xlsx (9.5KB, xlsx)
    DOI: 10.7554/eLife.47891.026
    Supplementary file 8. Gene Ontology analysis for biologic processes enriched in up- or down-regulated genes.
    elife-47891-supp8.xlsx (27.6KB, xlsx)
    DOI: 10.7554/eLife.47891.027
    Supplementary file 9. List of CHG-gain genes with closest A. thaliana ortholog and A. thaliana gbM status.
    elife-47891-supp9.xlsx (108.9KB, xlsx)
    DOI: 10.7554/eLife.47891.028
    Supplementary file 10. List of E. salsugineum IBM1-like genes and expression status.
    DOI: 10.7554/eLife.47891.029
    Transparent reporting form
    DOI: 10.7554/eLife.47891.030

    Data Availability Statement

    All sequencing data generated have been deposited into NCBI Gene Expression Omnibus under accession number: GSE128687.


    The following dataset was generated:

    Wendte JM, Zhang Y, Ji L, Shi X, Hazarika RR, Shahryary Y, Johannes F, Schmitz RJ. 2019. Epimutations are associated with CHROMOMETHYLASE 3-induced de novo DNA methylation. NCBI Gene Expression Omnibus. GSE128687

    The following previously published dataset was used:

    Bewick AJ, Ji L, Niederhuth CE, Willing EM, Hofmeister BT, Shi X, Wang L, Lu Z, Rohr NA, Hartwig B, Kiefer C, Deal RB, Schmutz J, Grimwood J, Stroud H, Jacobsen SE, Schneeberger K, Zhang X, Schmitz RJ. 2016. On the origin and evolutionary consequences of gene body DNA methylation. NCBI Gene Expression Omnibus. GSE75071


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
