Significance
RNA-directed DNA methylation (RdDM) provides a system for targeting DNA methylation to asymmetric CHH (H = A, C, or T) sites. This RdDM activity is often considered a mechanism for transcriptional silencing of transposons. However, many of the RdDM targets in the maize genome are located near genes or regulatory elements. We find that the regions of elevated CHH methylation, termed mCHH islands, are the boundaries between highly methylated (CG, CHG), silenced chromatin and more active chromatin. Analysis of RdDM mutants suggests that the function of the boundary is to promote and reinforce silencing of the transposable elements located near genes rather than to protect the euchromatic state of the genes.
Keywords: mCHH island, RdDM, chromatin boundary, maize, DNA methylation
Abstract
The maize genome is relatively large (∼2.3 Gb) and has a complex organization of interspersed genes and transposable elements, which necessitates frequent boundaries between different types of chromatin. The examination of maize genes and conserved noncoding sequences revealed that many of these are flanked by regions of elevated asymmetric CHH (where H is A, C, or T) methylation (termed mCHH islands). These mCHH islands are quite short (∼100 bp), are enriched near active genes, and often occur at the edge of the transposon that is located nearest to genes. The analysis of DNA methylation in other sequence contexts and several chromatin modifications revealed that mCHH islands mark the transition from heterochromatin-associated modifications to euchromatin-associated modifications. The presence of an mCHH island is fairly consistent in several distinct tissues that were surveyed but shows some variation among different haplotypes. The presence of insertions/deletions in promoters often influences the presence and position of an mCHH island. The mCHH islands are dependent upon RNA-directed DNA methylation activities and are lost in mop1 and mop3 mutants, but the nearby genes rarely exhibit altered expression levels. Instead, loss of an mCHH island is often accompanied by additional loss of DNA methylation in CG and CHG contexts associated with heterochromatin in nearby transposons. This suggests that mCHH islands and RNA-directed DNA methylation near maize genes may act to preserve the silencing of transposons from activity of nearby genes.
The cytosine bases in a genome can be modified to 5-methylcytosine by adding a methyl group at the 5′ position. This process, called DNA methylation, is conserved from algae to animals and plants (1, 2). DNA methylation can be separated into different types based on the local sequence context. In plants DNA methylation is found at the symmetric CG or CHG (where H = A, C, or T) sites or at nonsymmetric CHH sites. CG and CHG methylation are maintained at high fidelity following DNA replication due to activity of maintenance methyltransferases such as MET1 or chromomethylase (CMT) 3 (3, 4), whereas CHH methylation (mCHH) requires targeting by either domains rearranged methylase 2 (DRM2) or CMT2 (3–6). The DRM2 targeting occurs via RNA-directed DNA methylation (RdDM) and requires the activity of polymerase IV (PolIV) and polymerase V (PolV) complexes (3, 4). There is evidence that recruitment of PolIV and PolV may require the presence of dimethylation of lysine 9 of histone H3 (H3K9me2) or DNA methylation at the targeted genomic regions (7, 8). The specific mechanisms that recruit CMT2 are not well characterized but may require specific histone modifications (5, 6).
Much of our knowledge of DNA methylation in plants is derived from studies of the model plant Arabidopsis thaliana, which has a relatively small genome and relatively few examples of genes with nearby transposons (36.3%; ref. 9). The maize genome is much more complex, with the majority (85.5%) of genes positioned within 1 kb of transposons. In both species, transposons tend to have quite high levels of CG and CHG methylation whereas genes have much lower levels (10). mCHH is often thought to provide an important component for silencing transposons, yet the maize genome has relatively low levels of mCHH despite the high transposon content (11). This is partially attributed to the lack of a CMT2 ortholog in maize (5), which may explain the reduced levels of mCHH in the middle of larger transposons. Although mCHH is low in maize, there are still genomic regions with elevated mCHH (12). Genomic profiles of mCHH in maize revealed that this modification is often found near genes (termed mCHH islands) and is dependent upon RdDM activity (12–14). This elevation of mCHH in regions surrounding genes is much less prevalent in Arabidopsis (10). A recent study showed that high mCHH can also be induced near genes that are up-regulated in plants subjected to phosphate starvation (15).
In this study we further probed the basis and function of these mCHH islands. We found that mCHH islands are short regions of elevated mCHH that flank nearly half of the genes in maize and many conserved noncoding sequences (CNSs). These mCHH islands mark a transition for CG and CHG DNA methylation, several histone modifications, and chromatin accessibility. The mCHH islands are relatively stable across different tissues but show some variation among haplotypes that is often associated with sequence insertions/deletions (InDels). The loss of mCHH islands does not strongly affect gene expression, but instead leads to an additional loss of CG and CHG methylation in some transposons flanking maize genes.
Results and Discussion
mCHH Islands Mark the Boundary Between Different Types of Chromatin in the Maize Genome.
A metaprofile of context-specific DNA methylation surrounding maize genes reveals a gradual decline of CG and CHG methylation from flanking regions toward the genes (Fig. 1A). In contrast, there is an elevated level of mCHH in the regions flanking genes, and these regions have previously been termed CHH islands by Gent et al. (12); in this paper these will be referred to as mCHH islands to specify the regional accumulation of 5-methylcytosine in the CHH sequence context. The metaprofile of mCHH around genes has a fairly broad peak that spans ∼200–800 bp upstream of the transcription start site (TSS). A similar region is observed downstream of the transcription termination site (TTS). We were interested in understanding whether this elevated mCHH in the metaprofile is due to elevated mCHH for all genes or whether this phenomenon was driven by a subset of genes. In addition, we wanted to define the actual size and location of mCHH islands relative to the TSS and TTS.
Fig. 1.
Characterization of mCHH islands that are near genes. (A) Metaprofiles of context-specific DNA methylation for regions surrounding the gene TSS or gene TTS. CG (black) and CHG (blue) methylation are plotted using a different scale (Left) than mCHH (red) (Right) due to differences in abundance. (B) Genes with (green bar) or without (red bar) mCHH islands (100-bp tile with >25% mCHH anywhere within the 2-kb region) were used to create heat maps of mCHH in the 2-kb flanking regions at the 5′ (Left) or 3′ (Right) ends. (C and D) Context-specific DNA methylation profiles were generated for genes that contain an mCHH island. These plots are centered on mCHH islands (mCHHi) at the 5′ end (C) or 3′ end (D) rather than the TSS or TTS. (E and F) Similar profiles were generated to visualize the changes in chromatin at genes with mCHH islands. Different scales are used to visualize the different marks and the labels on the left or right of each plot indicate which scale is used for each mark using color coding of labels and lines. Read counts were used for the two histone marks and ratio of normalized read counts from libraries digested at 1 min and 16 min was used for chromatin accessibility (13).
Whole-genome bisulfite sequencing (WGBS) data from the third leaf of B73 seedlings (16) were used to determine the coverage and level of DNA methylation in each sequence context for nonoverlapping 100-bp tiles across the entire maize genome. There are 29,922 maize genes with coverage for at least 50% of the tiles in the 2-kb region immediately upstream of maize genes and 25,973 genes with at least 50% coverage for the 2-kb region downstream of maize genes. An mCHH island was defined by the presence of a 100-bp tile with at least 25% mCHH within 2 kb. Genome-wide, only 1.2% of all 100-bp tiles have at least 25% mCHH but many genes contain an mCHH island within 2 kb of the 5′ (51.3% of genes) or 3′ (41.8% of genes) ends. Precise quantification of the number of genes with mCHH islands is hampered by the fact that the mCHH islands may exist in tiles with low coverage; thus, these numbers are likely underestimates. Indeed, the visualization of mCHH levels in flanking regions of all genes (Fig. 1B) also provides evidence that some genes classified as not having a strong mCHH island still contain a region of moderate mCHH within the flanking region. For each gene, there is a relatively small region (100–200 bp) with elevated mCHH levels, and these mCHH islands are most common in the first ∼600 bp of the flanking regions but can occur anywhere throughout the 2-kb flanking regions (Fig. 1B). These results suggest that mCHH islands observed in the metaprofile are actually the result of a subset of maize genes that have elevated mCHH in sharply defined regions flanking the gene.
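The island definition above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function name `call_mchh_island`, the use of `None` for tiles without coverage, and the assumption that per-tile mCHH fractions have already been computed are all hypothetical choices made for the example. Only the thresholds (100-bp tiles, 2-kb flank, at least 25% mCHH, coverage for at least 50% of tiles) come from the text.

```python
from typing import Optional, Sequence

TILE_BP = 100                 # nonoverlapping tile size from the text
FLANK_BP = 2000               # 2-kb flanking region from the text
MIN_MCHH = 0.25               # island threshold: at least 25% mCHH
MIN_COVERED_FRACTION = 0.5    # region is assessed only if >=50% of tiles have coverage


def call_mchh_island(tile_mchh: Sequence[Optional[float]]) -> Optional[bool]:
    """Classify one 2-kb flanking region (hypothetical helper).

    tile_mchh holds one mCHH fraction per 100-bp tile (20 tiles for 2 kb);
    None marks a tile without usable coverage. Returns None when the region
    fails the coverage filter, otherwise True/False for island presence.
    """
    expected = FLANK_BP // TILE_BP
    if len(tile_mchh) != expected:
        raise ValueError(f"expected {expected} tiles, got {len(tile_mchh)}")
    covered = [v for v in tile_mchh if v is not None]
    if len(covered) < MIN_COVERED_FRACTION * expected:
        return None  # too little coverage to classify this region
    return any(v >= MIN_MCHH for v in covered)


# Illustrative input: low background mCHH with one elevated tile.
region = [0.02] * 20
region[4] = 0.31
has_island = call_mchh_island(region)
```

Because an island hidden in an uncovered tile cannot be detected, a region that passes the coverage filter but returns `False` may still contain an island, which is the reason the reported percentages are described as likely underestimates.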
A second set of metaprofiles, only using genes with mCHH islands, was made to evaluate the context-specific profiles of DNA methylation and chromatin state relative to the mCHH islands (Fig. 1 C–F). These plots are centered on the 100-bp tiles identified as mCHH islands rather than on the TSS or TTS. The typical mCHH island has elevated levels of mCHH relative to the flanking regions (Fig. 1 C and D, red lines). The mCHH islands also clearly mark the transition from high levels of CG and CHG methylation that flanks genes to reduced CG and CHG methylation at the beginning and end of genes. The change in CG and CHG methylation is much sharper in plots centered on mCHH islands (Fig. 1 C and D) compared with plots centered on the TSS or TTS (Fig. 1A), suggesting that the mCHH island and not the TSS or TTS is the site of this change in methylation. Previous research suggested that mCHH islands themselves were not particularly enriched in the heterochromatin-associated H3K9me2 modification. Instead, these regions tended to have more accessible chromatin (13). This led us to evaluate whether mCHH islands might mark the transition zone between distinct types of chromatin by assessing the profile of chromatin on either side of mCHH islands (Fig. 1 E and F). At the 5′ end of genes containing mCHH islands, H3K9me2 exhibits a strong decrease at mCHH islands, whereas chromatin accessibility (13) is substantially increased for several hundred bases from mCHH islands toward the TSS. H3K4me3, a histone modification often associated with expressed genes, also shows a clear enrichment beginning in the region 3′ of mCHH islands. The enrichment for accessible chromatin in the region flanking the mCHH island was observed both in highly expressed genes and silenced genes containing mCHH islands but is not as strong as the enrichment for these marks immediately upstream of the TSS (Fig. S1).
Fig. S1.
Chromatin accessibility surrounding mCHH islands and TSSs. For genes containing an mCHH island >500 bp upstream of the TSS we identified four 200-bp regions. These include the regions upstream and downstream of both the mCHH island and the TSS. The chromatin accessibility was assessed in these regions for highly expressed genes or genes not expressed in seedling. Chromatin accessibility is higher 3′ of the mCHH island in both highly expressed and silent genes. The upper portion in A shows the four 200-bp regions (regions I to IV) that are assessed for chromatin accessibility, and the table shows the actual value of chromatin accessibility. The table is also shown as a bar chart in B for easy comparison. The ratio of normalized read counts from libraries digested at 1 min and 16 min was used for chromatin accessibility (13). A high ratio indicates high chromatin accessibility.
The mCHH islands flanking maize genes may provide boundaries between two distinct types of chromatin. We hypothesized that similar boundaries may also be required at regulatory regions to allow access to these regions for transcription factors. Turco et al. (17) identified a number of CNSs in the B73 genome and 11,680 of these are located >5 kb from the nearest gene. The profile of DNA methylation relative to these CNSs >5 kb away from genes reveals the presence of mCHH islands flanking CNSs and reduced DNA methylation at CNSs (Fig. S2A). There are mCHH islands flanking 42.9% of the CNSs that are >5 kb from the nearest gene and the context-specific patterns of methylation and chromatin state at these mCHH islands flanking CNSs are very similar to those observed at mCHH islands flanking maize genes (Fig. S2 B and C).
Fig. S2.
mCHH islands occur near CNSs. (A) DNA methylation profiles centered on 568 CNSs (17) that are >5 kb from the nearest gene. (B) Profile of context-specific DNA methylation relative to mCHH islands (dashed vertical line). Negative coordinates (left) are moving away from the CNSs, whereas positive coordinates (right) encompass the CNSs. (C) A similar plot for chromatin modifications H3K9me2 (green), H3K4me3 (purple), or chromatin accessibility (gray).
mCHH Islands Are Present at Expressed Genes Located Near Terminal Inverted Repeat Elements.
The mCHH islands found near genes and CNSs account for 49% of the regions with elevated (>25%) mCHH in the maize genome (Fig. 2A). Nearly half of the maize genes tested (with read coverage in at least half of the 2-kb region surrounding a gene) contain an mCHH island, and we were interested in understanding the factors that cause some genes to have mCHH islands whereas others do not. Previous research provides evidence that mCHH islands are enriched at more highly expressed genes and frequently occur at transposons located near genes (12). Of the different classes of transposons, the mCHH islands identified in this study are enriched at terminal inverted repeat (TIR) DNA transposons located near genes (Fig. S3A, 38% vs. 6% for all tiles, P < 0.01). This enrichment is most apparent for transposons that are located closest to genes or CNSs and is only present at the edge of the transposon located closest to the gene and is not limited to DNA transposons (Fig. 2B and Fig. S3B).
Fig. 2.
mCHH islands are often located at transposon edges close to highly expressed genes. (A) All of the 100-bp tiles with >25% mCHH levels in B73 seedling were annotated based on their location relative to genes or CNSs. (B) The level of mCHH was determined for the distal and proximal edge (100 bp) of TEs greater than 1 kb in length. These TEs were also split according to whether they were located on the same or opposite strand as the gene. (C) The percentage of genes with a 5′ end mCHH island is shown for genes that are not expressed (NE) or genes in each of the four expression quartiles (Q1–Q4) using RNA-seq data from the same tissue as the WGBS data, B73 seedling leaf. The small letter above each bar indicates whether there is a significant difference between the proportions of genes with an mCHH island; the same letter indicates no difference and different letters indicate a significant difference at the P < 0.01 level (prop.test in R).
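The proximal/distal edge comparison in B can be illustrated with a short sketch. It assumes half-open coordinates on a single chromosome and represents the gene by a single reference position (for example its TSS); the function name `te_edges` and these coordinate conventions are assumptions for illustration and are not taken from the study, and the same/opposite strand split is not modeled here.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end), an assumed convention


def te_edges(te_start: int, te_end: int, gene_pos: int,
             edge_bp: int = 100, min_len: int = 1000) -> Optional[Dict[str, Interval]]:
    """Return the 100-bp proximal and distal edges of a TE (hypothetical helper).

    TEs that are not greater than min_len (1 kb in the caption) are skipped
    and None is returned.
    """
    if te_end - te_start <= min_len:
        return None
    left = (te_start, te_start + edge_bp)
    right = (te_end - edge_bp, te_end)
    # The edge nearer to the gene is the proximal one.
    if abs(gene_pos - te_start) <= abs(gene_pos - te_end):
        return {"proximal": left, "distal": right}
    return {"proximal": right, "distal": left}


# Illustrative coordinates only: a 2-kb TE with a gene 500 bp to its right.
edges = te_edges(1000, 3000, gene_pos=3500)
```

The mCHH level of each edge would then be looked up from the tiles overlapping the returned intervals.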
Fig. S3.
Factors influencing presence of mCHH island. (A) Sequence annotation of mCHH islands that are at 5′ end of genes. (B) Percentage overlapping high mCHH tiles for each transposon family that are located at different regions relative to filtered genes (FGS) or conserved noncoding sequences (CNS). Transposons were divided into four categories: nonspreading LTR transposon (LTR_nonspreading), spreading LTR transposon (LTR_spreading), DNA transposon (TIR), or other. Each type of transposon was then further classified into six groups based on their location relative to genes or CNS. The percentage of transposons in each group that are overlapping high mCHH tiles (>25%) were determined and plotted. (C) The proportion of genes with a 3′ end mCHH island is shown for genes that are not expressed (NE) or genes in each of the four expression quartiles (Q1–Q4) in the same tissue used for WGBS, third leaf of seedling. The small letter above each bar indicates significant levels at P < 0.05. (D) Genes with body methylation (gBM) tend to have both 5′ and 3′ end mCHH islands. CHH <10, % mCHH <10%; CHH > 25, % mCHH >25%. (E) Percentage of genes with syntenic groups for genes with different levels of highest mCHH in the 2-kb promoter regions. (F and G) The genes in quartile 4 (most highly expressed) were split into genes with or without an mCHH island at 5′ end. The type of transposon located closest to the TSS was determined for each gene (“No TE” indicates no transposon within 2 kb of TSS). The percentage of genes with each type of element is shown in F. In G the 2-kb regions 5′ of the TSS were assessed for the presence of at least one 100-bp tile with >60% CG or CHG methylation (High CG/CHG); genes with CG/CHG <60% in all tiles were put in the group Low CG/CHG. **P < 0.01. (H) The genes that are not expressed in the third leaf tissue were classified as either expressed in other tissues or not detected using the B73 expression atlas (18). The percentage of each of these groups with mCHH islands is shown.
The presence of an mCHH island is also associated with the expression level of the nearby gene. Genes that are expressed in the same tissue used for WGBS are much more likely to contain mCHH islands than silenced genes (P < 0.01, two-sample test for equality of proportions) and there is a slight but steady increase in the frequency of genes with mCHH islands for more highly expressed genes (Fig. 2C and Fig. S3C). Genes with gene body CG methylation or genes that are syntenic between maize and sorghum are also more likely to have 5′ mCHH islands (Fig. S3 D and E, P < 0.01). Although these analyses provide evidence that expression might be associated with mCHH islands, there are a number of expressed genes lacking mCHH islands (34%) and many silent genes have mCHH islands (38%). We further investigated these two subsets of genes to better understand the factors that might contribute to the presence of mCHH islands (Fig. 2C).
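A two-sample test for equality of proportions of the kind cited above can be sketched with the standard library alone. The version below is a Pearson chi-squared test on a 2×2 table with a Yates continuity correction, which to our understanding matches the default behavior of `prop.test` in R for two groups; that correspondence is an assumption rather than something stated in the text. The function name and the gene counts in the usage line are hypothetical (1,000 genes per group, chosen only to mirror the 66% and 38% frequencies mentioned above), because the underlying counts are not given here.

```python
import math
from typing import Tuple


def two_proportion_test(x1: int, n1: int, x2: int, n2: int,
                        correct: bool = True) -> Tuple[float, float]:
    """Chi-squared test (1 df) that two proportions x1/n1 and x2/n2 are equal.

    Returns (statistic, p_value). Hypothetical helper for illustration.
    """
    if n1 <= 0 or n2 <= 0 or not (0 <= x1 <= n1 and 0 <= x2 <= n2):
        raise ValueError("counts must satisfy 0 <= x <= n and n > 0")
    pooled = (x1 + x2) / (n1 + n2)
    if pooled in (0.0, 1.0):
        # No variation in the outcome; the two proportions are identical.
        return 0.0, 1.0
    delta = abs(x1 / n1 - x2 / n2)
    # Yates correction, capped so it never exceeds the observed deviation.
    yates = min(0.5, delta / (1 / n1 + 1 / n2)) if correct else 0.0
    observed = [x1, n1 - x1, x2, n2 - x2]
    expected = [n1 * pooled, n1 * (1 - pooled), n2 * pooled, n2 * (1 - pooled)]
    stat = sum((abs(o - e) - yates) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: 660 of 1,000 expressed genes vs. 380 of 1,000 silent genes.
stat, p_value = two_proportion_test(660, 1000, 380, 1000)
```

With real data the counts of genes with and without an mCHH island in each expression class would replace the hypothetical values.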
Highly expressed genes (the fourth expression quartile) in seedling tissue are enriched for having mCHH islands but some (34%) of these genes lack mCHH islands within 2 kb of the gene promoter. We hypothesized that these genes may lack the sequences or chromatin required for mCHH island formation. This could be due to a lack (or poor annotation) of TIR elements in the flanking regions or could be due to a lack of regions containing CG or CHG methylation located near the gene. The most highly expressed genes (fourth expression quartile) were divided into those with and without mCHH islands and then assessed for presence of transposons or highly methylated (>60% at CG/CHG) tiles in the 2-kb flanking region. The genes without mCHH islands are less likely to contain TIR elements or tiles with high CG/CHG methylation compared with genes with mCHH islands (Fig. S3 F and G, P < 0.01). However, many of the genes without mCHH islands do contain either TIR elements or elevated CG/CHG methylation, and it is not clear why these genes lack mCHH islands. Overall, these observations suggest that gene expression and presence of CG/CHG methylated DNA transposons near genes are important factors associated with the presence of an mCHH island, but these factors do not entirely explain the phenomenon.
Although genes that are not expressed in seedling leaf tissue are less likely to have mCHH islands, 38% of them do (Fig. 2C). The genes that are not expressed in seedling leaf tissue were further divided into two groups (expressed in other tissues or never expressed) based on their expression in 51 tissues or developmental stages of B73 (18). Genes that are expressed in other tissues are more likely to have mCHH islands than genes that are not detected in any of the tissues surveyed (Fig. S3H, P < 0.01). This suggests that mCHH islands are often present at expressed genes even in tissues without expression of the gene. This is further supported by the analysis of mCHH islands in four maize tissues in which the mCHH islands are fairly stable (Fig. S4A). Very few (<5%) mCHH islands have major differences among tissues (defined as having >25% mCHH in one tissue and <10% mCHH in another) and there was no evidence that the genes located near these rare mCHH islands that vary among tissues exhibit tissue-specific expression patterns that were related to the elevated mCHH levels (Fig. S4B).
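The criterion for a major difference among tissues (>25% mCHH in one tissue and <10% in another) is simple enough to state as code. The sketch below is illustrative only; the function name and the tissue labels are hypothetical, and mCHH levels are assumed to be expressed as fractions for the tile that defines the island.

```python
from typing import Mapping


def is_variable_island(mchh_by_sample: Mapping[str, float],
                       high: float = 0.25, low: float = 0.10) -> bool:
    """True if mCHH exceeds `high` in one sample and is below `low` in another."""
    if len(mchh_by_sample) < 2:
        raise ValueError("need at least two samples to compare")
    values = list(mchh_by_sample.values())
    return max(values) > high and min(values) < low


# Hypothetical mCHH fractions for one island in four tissues.
stable = {"tissue_1": 0.31, "tissue_2": 0.28, "tissue_3": 0.35, "tissue_4": 0.27}
variable = {"tissue_1": 0.31, "tissue_2": 0.04, "tissue_3": 0.29, "tissue_4": 0.30}
flags = (is_variable_island(stable), is_variable_island(variable))
```

An island with intermediate values (for example 30% in one tissue and 15% in another) is not counted as variable under this rule, which keeps the category restricted to clear presence/absence differences.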
Fig. S4.
mCHH island variation among tissues or genotypes. (A) Stability of mCHH islands across distinct tissues/organs of maize. A set of 5,770 genes have complete methylation coverage for the 1-kb 5′ of the TSS in all four tissues and have an mCHH island in at least one tissue. Hierarchical clustering was used to assess the patterns of mCHH in the four tissues at these nonredundant mCHH islands. (B) The percentage of mCHH islands that are only present in one of the two contrasted tissues or genotypes and for which the nearby gene is not differentially expressed (black), the gene is more highly expressed when the mCHH island is present (green), or the gene is more highly expressed when the mCHH island is absent (red) was determined using RNA-seq. The differential expression values were based on pairwise contrasts of expression levels in four tissues of B73 (anther, ear, shoot apical meristem enriched tissue, and seedling leaf) whereas the genotype comparisons are based on pairwise contrasts of expression levels in seedling tissue of four genotypes (Mo17, CML322, Oh43, or Tx303) compared with B73. (C) Stability of mCHH islands among diverse maize genotypes. A set of 873 genes that contain an mCHH island in at least one genotype and have coverage in all five genotypes were identified. Hierarchical clustering of mCHH at the mCHH islands of these genes revealed examples of variable mCHH islands. (D) Percentage of genes that are highly expressed in B73 for genes that have an mCHH island only in B73 (B73) or in one of the four inbred lines Mo17, CML322, Oh43, or Tx303 (Other). The values are averages for the four pairwise contrasts between B73 and each of the other genotypes and the error bars indicate SD among the four contrasts.
Differences in mCHH Islands Among Genotypes Are Often Related to InDels.
The stability of mCHH islands among diverse maize genotypes was assessed using WGBS from five maize inbreds (19). This analysis used 873 genes that have a 5′ mCHH island in at least one genotype and have WGBS coverage for all tiles within 1 kb of the TSS in all five genotypes. In many cases the five genotypes have conserved mCHH islands (Fig. S4C). However, ∼36% (312/873) of these mCHH islands have low (<10%) mCHH in at least one genotype. Many of the genes with genotype-specific mCHH islands were not differentially expressed (Fig. S4B). However, when the genes are differentially expressed, we observed that the haplotype with higher mCHH is more likely (66–73%) to exhibit elevated expression levels (Fig. S4D).
To further study the genetic factors leading to mCHH island variation we performed WGBS for PH207, a genotype with a de novo whole-genome assembly. This provided the opportunity to compare the full promoters in both genotypes. The analysis used a subset of 1,760 genes with a one-to-one match in B73 and PH207 that also have WGBS coverage for the first 1 kb of the promoter in both genotypes. We identified 27 genes in this set that have an mCHH island in B73 and have an InDel >100 bp in PH207 that occurs between the location of the mCHH island and the TSS in B73. These include 16 examples with a deletion in the PH207 promoter that removes part or all of the sequence that forms the mCHH island in B73 (Fig. 3). Many (11/16) of the genes with a PH207 deletion lose the mCHH island from the 1-kb proximal promoter in PH207 (example in Fig. 3B). The other five loci containing a PH207 deletion form an mCHH island at a new site in PH207 (example in Fig. 3C). There are 11 loci with an insertion in the PH207 promoter located between the position containing the mCHH island and the TSS in B73 (Fig. 3). For 7 of these 11 insertions in PH207 a new mCHH island is formed at the insertion sequence itself (example in Fig. 3D). The majority of these InDels, including 14/16 PH207 deletions and 7/11 PH207 insertions, are annotated as transposons or have sequence similarity to transposons. These analyses suggest that the presence and location of mCHH islands can be influenced by transposon-derived InDels located in the promoter.
Fig. 3.
Promoter InDels influence mCHH islands. (A) Genes with coverage for the first 1 kb of the promoter in both B73 and PH207 and an mCHH island in B73 were used to search for InDels >100 bp located between the mCHH island and the TSS. Sixteen examples of PH207 deletions and 11 examples of PH207 insertions were identified. The effect on the PH207 mCHH islands was classified for each InDel and is shown in the table. Examples of the most common classes are shown in B–D. (B) At gene GRMZM5G871592_T01 the region containing the mCHH island (−400 to −500) in B73 is deleted in PH207 and no mCHH island is observed in PH207. (C) At gene GRMZM2G136178_T01 PH207 has a deletion that removes much of the B73 mCHH island but an mCHH island forms at a new location in PH207. (D) At gene AC194914.3_FGT002 PH207 has an insertion and an mCHH island forms at this insertion whereas the site of the B73 mCHH island has little mCHH in PH207. In C and D, the thick black lines represent B73 sequences annotated as DNA transposon and the subfamily (PIF or hAT) is indicated for each.
Loss of mCHH Islands Results in Additional Loss of Transposon CG and CHG Methylation.
mCHH islands tend to form at TIR elements near expressed maize genes and mark a clear transition zone relative to CG/CHG methylation and several chromatin features that were assessed. The establishment/maintenance of mCHH islands requires mediator of paramutation 1 (MOP1) (GRMZM2G042443), a maize ortholog of the RDR2 gene in Arabidopsis (20, 21), MOP2 (GRMZM2G054225), and MOP3 (GRMZM2G007681), suggesting that mCHH islands are formed by RdDM activity (13). Sequence-capture bisulfite sequencing (22) data were used to assess the effects of three RdDM mutants mop1 (21), mop2 (23), and mop3 (24) on mCHH within and surrounding mCHH islands at 347 loci that were located in gene promoters and that were also included in a capture design that targets a specific set of maize regions (Fig. 4A; ref. 25). All three mutations greatly reduce mCHH levels at mCHH islands. These materials provide a resource to further probe the function of mCHH islands. One hypothesis is that mCHH islands are important for gene expression levels and provide a border preventing the spread of heterochromatin toward the gene. This hypothesis would predict that CG and CHG methylation might be increased in regions 3′ of the mCHH island and that genes containing mCHH islands would be more likely to be differentially expressed (at lower levels) in mop1 or mop3 relative to wild type. We do not see any evidence for increased CG or CHG methylation in regions 3′ of the mCHH island (Fig. 4 B and C). Transcriptome profiling in mop1 and mop3 found that genes with mCHH islands are slightly enriched for differential expression but the majority of these differentially expressed genes were up-regulated in the mutant (Table S1), which is not consistent with a function for mCHH islands in preventing silencing of the genes.
Fig. 4.
Loss of mCHH islands in RdDM mutants and its potential consequences. (A–C) DNA methylation profiles in B73 and three RdDM mutants based on data from sequence capture bisulfite sequencing that includes 347 mCHH islands. The mCHH island is indicated between the two vertical dashed lines. The size of each mCHH island is normalized to 100 bp. (D and E) The context-specific levels of DNA methylation at the mCHH island and surrounding regions are shown for wild-type and two mutants using the Integrative Genomics Viewer (IGV) (26). In D the mutants show loss of methylation in all contexts within the mCHH island but CG and CHG methylation outside the mCHH island is largely maintained. In E there is evidence for loss of CHG methylation for several hundred base pairs outside of the mCHH island. The gray areas represent regions that do not have data coverage due to lack of reads or methylation sites (either CG or CHG or CHH). The mCHH island is indicated by the dashed green boxes. (F) Expression of transposons in mop1 and mop3 mutants relative to wild type. Uniquely mapping reads were used to classify each TE as having few or no reads, expression at similar levels or differential expression with up- or down-regulation in the mutants. (G) A boxplot is used to show the distribution of distances for each type of transposon from the nearest high mCHH tile for the mop1 earshoot data. P value is based on Wilcoxon test.
Table S1.
Summary of differentially expressed genes (DEG) in mop1 and mop3 mutants
| Mutant_tissue | Gene type | Not DEG | DEG | % DEG | Up-regulated in mutant | % Up-regulated in mutant |
| mop1_earshoot | CHH >25 | 9,588 | 428 | 4.3 | 278 | 65 |
| mop1_earshoot | CHH 10–25 | 2,758 | 110 | 3.8 | 71 | 64.5 |
| mop1_earshoot | CHH <10 | 2,858 | 100 | 3.4 | 54 | 54 |
| mop1_earshoot | No data | 4,536 | 127 | 2.7 | 81 | 63.8 |
| mop1_seedling leaf | CHH >25 | 9,426 | 156 | 1.6 | 105 | 67.3 |
| mop1_seedling leaf | CHH 10–25 | 2,724 | 43 | 1.6 | 30 | 69.8 |
| mop1_seedling leaf | CHH <10 | 2,868 | 34 | 1.2 | 20 | 58.8 |
| mop1_seedling leaf | No data | 4,297 | 48 | 1.1 | 28 | 58.3 |
| mop3_seedling leaf | CHH >25 | 9,448 | 103 | 1.1 | 69 | 67 |
| mop3_seedling leaf | CHH 10–25 | 2,728 | 23 | 0.8 | 20 | 87 |
| mop3_seedling leaf | CHH <10 | 2,889 | 21 | 0.7 | 11 | 52.4 |
| mop3_seedling leaf | No data | 4,287 | 34 | 0.8 | 24 | 70.6 |
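The percentage columns in Table S1 follow directly from the counts: % DEG is DEG divided by all genes in the class (Not DEG + DEG), and % up-regulated is the up-regulated count divided by DEG. The short sketch below recomputes both for the mop1_earshoot rows as a consistency check; the variable and function names are illustrative, and rounding to one decimal place is assumed from the way the table is printed.

```python
from typing import Tuple

# (gene type, Not DEG, DEG, up-regulated in mutant) for mop1_earshoot, from Table S1.
TABLE_S1_EARSHOOT = [
    ("CHH >25", 9588, 428, 278),
    ("CHH 10-25", 2758, 110, 71),
    ("CHH <10", 2858, 100, 54),
    ("No data", 4536, 127, 81),
]


def percentages(not_deg: int, deg: int, up: int) -> Tuple[float, float]:
    """Return (% DEG, % of DEG that are up-regulated), rounded to one decimal."""
    pct_deg = round(100 * deg / (not_deg + deg), 1)
    pct_up = round(100 * up / deg, 1)
    return pct_deg, pct_up


summary = {gene_type: percentages(n, d, u) for gene_type, n, d, u in TABLE_S1_EARSHOOT}
# summary["CHH >25"] is (4.3, 65.0), matching the first row of the table.
```

Read this way, the table shows both trends described in the text: the fraction of differentially expressed genes is slightly higher for genes with mCHH islands (4.3% vs. 3.4% in mop1 earshoot), and more than half of the differentially expressed genes in every class are up-regulated in the mutant.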
An alternative hypothesis is that mCHH islands act as a border to prevent the spread of euchromatin into transposons near active genes. This hypothesis would predict that the loss of mCHH islands might be accompanied by additional loss of CG and CHG methylation within the transposon and increased expression for some of the transposons near active genes. The sequence capture bisulfite sequencing experiments profiling mop1, mop2, and mop3 mutants reveal that mCHH methylation is greatly reduced at mCHH islands (Fig. 4A). CG and CHG methylation is also reduced at the region defined as the mCHH island with the strongest reduction in mop3 and minimal loss in mop2 (Fig. 4 B and C). The profiles also revealed a reduction in CHG (and to a lesser extent CG) methylation in the region 5′ of the mCHH island. WGBS of mop1 supports these findings (Fig. S5 A–C) with evidence for loss of CG and CHG methylation in the regions near the mCHH islands near both ends of genes and CNSs. The regions outside of the mCHH islands that exhibit loss of CG and CHG methylation have very low levels of CHH methylation and do not seem to be active targets of RdDM in wild-type plants. We examined siRNA distributions around mCHH islands to test whether RdDM activity might actually cover a larger area than revealed just by mCHH. Consistent with mCHH, however, siRNAs were highly enriched specifically within mCHH islands and dramatically depleted in mop1 (Fig. S5 D and E). It is possible that small RNAs and RdDM activity for these regions 5′ of the mCHH islands is present at an earlier developmental stage and the loss of CG/CHG methylation in these regions reflects loss of RdDM activity at an earlier stage.
Fig. S5.
Loss of mCHH island in RdDM mutants and its potential consequences. (A) DNA methylation profiles in B73 (solid lines) and mop1 (dashed lines) centered on mCHH island from 5′ end of genes. (B and C) Plots similar to those in A but centered on mCHH island from 3′ end of genes (B) or from CNSs (C). (D and E) Small RNA levels at mCHH island and its surrounding regions in mop1 and wild type. All small RNAs that can be mapped were used. For small RNAs that are mapped to multiple locations, read counts were normalized to the number of chromosomal hits.
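One common way to normalize multiply mapped small RNA reads to the number of hits is to give each alignment a weight of 1/n, where n is the number of locations to which the read maps. The sketch below follows that interpretation, which is an assumption about the normalization described in the legend; the function name, read identifiers, and region labels are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def weighted_region_counts(alignments: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Sum small RNA counts per region, weighting each alignment by 1/hits.

    alignments: (read_id, region) pairs, one pair per mapped location.
    """
    alignments = list(alignments)
    hits: Dict[str, int] = defaultdict(int)
    for read_id, _ in alignments:
        hits[read_id] += 1  # number of mapped locations for this read
    counts: Dict[str, float] = defaultdict(float)
    for read_id, region in alignments:
        counts[region] += 1.0 / hits[read_id]
    return dict(counts)


# Hypothetical alignments: r1 maps uniquely, r2 maps to two locations.
example = [
    ("r1", "mCHH_island"),
    ("r2", "mCHH_island"),
    ("r2", "flank_5prime"),
]
counts = weighted_region_counts(example)
```

Under this weighting every read contributes a total of one count regardless of how many times it maps, so repetitive transposon-derived siRNAs do not inflate the signal at any single locus.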
Visual examination of several loci confirms these trends and suggests variability in the locus-specific patterns in the mutants (Fig. 4 D and E). Several subtypes of mCHH islands were identified to better characterize the effects of the mop mutations on CG and CHG loss 5′ of the mCHH islands. This analysis is restricted to 147 regions that have sequence capture probes and that have data in both the mCHH island itself and in the 400 bp 5′ of the island. The analysis of CG and CHG methylation levels at this region revealed that 48 of the loci only have CG/CHG methylation at the mCHH island itself (Fig. S6A). The remaining 99 loci contain elevated CG and CHG methylation upstream of the mCHH islands. Clustering of the difference in CG and CHG methylation in mop3 relative to wild type reveals that about one-third of these loci exhibit loss of CG and CHG methylation only at the mCHH island itself (Fig. S6 B and C). Another one-third of the loci exhibit strong loss of CG and CHG methylation for several hundred base pairs 5′ of the mCHH island, whereas the final set shows partial loss of CG and CHG methylation 5′ of the mCHH island (Fig. S6 B and C). Similar trends were observed for these loci in the mop1 mutant (Fig. S6C) although the severity of the CG and CHG loss was reduced.
Fig. S6.
Loss of CG and CHG methylation at the mCHH island itself and upstream regions in mop1 and mop3 mutants. (A) Clustering of CHG and CG levels at the mCHH islands and 400-bp upstream regions. DNA methylation levels are based on wild-type B73. (B) Clustering of CHG and CG loss in mop3 compared with B73 wild type. The difference for CHG/CG between mop3 mutant and B73 was calculated and used to perform hierarchical clustering. Only the genes that have high CG/CHG beyond the mCHH islands are used. These genes were divided into three groups based on whether they have CG/CHG loss outside the mCHH island itself. (C) CHG and CG levels in B73, mop1, and mop3 across the four different groups from A and B. The upper three panels show CHG levels in B73, mop3, and mop1, respectively, from left to right, and the bottom three panels show CG levels in the same lines. In each panel, methylation levels at the mCHH island itself (between the two vertical dashed lines) and 400-bp upstream and downstream regions are shown. The line colors show the four groups that were labeled using colors as in A and B.
Earlier studies of the Mutator transposon in maize found evidence for progressive loss of DNA methylation and activation for these elements in the mop1 mutant (27) and analysis of shoot apical meristem found evidence for large-scale transposon and gene expression changes in mop1 (28). Therefore, we investigated whether loss of mCHH islands in mop1 or mop3 mutants is associated with activation of nearby transposons. There are several complications with attempts to study transposon expression. The repetitive nature of many transposons severely limits the ability to study specific insertions. In addition, the variation in specific transposon insertions in different haplotypes complicates analysis of transposon expression based on the reference genome. Uniquely mapping RNA sequencing (RNA-seq) reads were used to search for specific transposons that exhibit altered expression in mop1 (back-crossed into B73) and mop3 (in a non-B73 genetic background) mutants relative to wild-type siblings. The vast majority of transposons are not expressed (or not detected by uniquely mapping reads) or have low/similar levels of expression in mutant and wild-type genotypes in either leaves or immature ears (Fig. 4F). Depending on the mutant and tissue analyzed there are 36–208 individual transposons with altered expression (Fig. 4F). The majority (78–89%) of these exhibit increased expression in the mutant relative to wild type (Fig. 4F). There were fewer transposons with altered expression in mop3 mutants than in mop1 mutants. This is likely due to the fact that mop3 mutants are in a distinct genetic background and analysis of transposons in this background will be limited to those that are common in both genotypes. It is likely that there are additional transposons with altered regulation in mop3 that cannot be detected by alignments to the B73 reference. 
In some cases the up-regulated transposons are located near regions that were targeted for the sequence capture bisulfite sequencing and we could observe the coincident loss of the mCHH island and elevated expression of the transposon (TE) (Fig. S7 A and B). Genome-wide, the transposons that are up-regulated in mop1 or mop3 are significantly (P < 0.01, Wilcoxon test) closer to RdDM sites (>25% mCHH) than transposons that are silent (or equivalently expressed) in mutant and wild type, and a greater proportion of them are within 100 bp of RdDM sites (Fig. 4G and Fig. S7 C and D). This suggests that RdDM sites protect a subset of transposons from activation and these may be transposons that are located near active genes. Given the progressive loss of methylation at some Mutator elements over multiple generations in mop1 (27) we might expect that the erosion of CG/CHG DNA methylation and activation of transposons near genes might affect even more loci in subsequent generations.
Fig. S7.
Transposon expression in mop1 and mop3 mutants. (A) IGV view to show the methylation pattern around an up-regulated transposon in B73, mop1, and mop3. The gray areas represent regions that lack read coverage or regions that have read coverage but no CG/CHG/CHH methylation site. (B) A bar chart shows the expression level of the transposon from A in mop1 and mop3 mutants and their corresponding wild type. TE expression levels were determined using uniquely mapped reads and expression levels were normalized using DESeq. (C and D) Boxplot to show the distance from transposon to nearest high mCHH (>25%) tile for four groups of TEs. “No/few reads,” zero or few uniquely mapped reads in both mutant and wild type; “Non DETE,” not differentially expressed between mutant and wild type; “DETE_mop1 (mop3) high,” TEs that are differentially expressed between mop1 (mop3) and wild type and that are highly expressed in mop1 (mop3); “DETE_WT high,” TEs that are differentially expressed between mop1 (mop3) and wild type and that are highly expressed in wild type. P value is based on Wilcoxon test.
Conclusion
Plant genomes are often composites of genes and transposons with substantial variation in the abundance of transposons across different species. Although most Arabidopsis transposons have elevated mCHH (5), the CHH methylation in the maize genome is only found at some transposon regions. Recent studies have suggested that the recruitment of PolIV and RdDM activities requires DNA methylation and/or elevated H3K9me2 (7, 8). In addition, RdDM seems to target intergenic regions and plant genes located primarily in euchromatin (29). These requirements suggest that PolIV and RdDM will primarily act at the borders between open chromatin and regions with elevated CG/CHG methylation. These borders typically occur at both edges of a transposon in the Arabidopsis genome (5). However, in the maize genome most transposons are present in large blocks of other transposons and therefore only the transposons that are at the edges near expressed genes might recruit RdDM activities. Indeed, much of the mCHH in the maize genome occurs near genes (13). This suggests that RdDM may not be a crucial requirement for silencing all transposons (5, 6, 30). Instead, this activity may be critical for maintaining the silencing of the transposons that are located near genes (31). Indeed, the analysis of changes in DNA methylation in the mop1 and mop3 mutants that perturb components of the maize RdDM system revealed that the loss of mCHH at the transposon edges near genes can often result in additional loss of CG/CHG methylation in the transposon. Only a subset of transposons exhibit transcriptional activation in mop1 and mop3 and these are often located near mCHH islands. This suggests that the mCHH islands and near-gene RdDM activity may be critical for creating a boundary that prevents the spread of open, active chromatin into adjacent transposons.
Materials and Methods
Sequencing Datasets.
A full description of the biological samples, extraction of nucleic acids, library construction, and sequencing is available in SI Materials and Methods, and Table S2 lists all samples and accession numbers of raw sequencing data. WGBS and RNA-seq data for B73 shoot apex, immature ear and anther tissue, H3K4me3 ChIP-seq for B73 seedling leaf tissue, RNA-seq data for mop1, mop3, and wild-type siblings earshoot/seedling leaf, and WGBS data for PH207 seedling leaf were generated for experiments described in this paper. In addition the analyses in this study used previously published WGBS data for B73 seedling leaf and RNA-seq data for B73, Mo17, Oh43, Tx303, and CML322 seedling leaf from Eichten et al. (16), H3K9me2 ChIP-seq data for B73 seedling from West et al. (10), WGBS data for Mo17, Oh43, Tx303, and CML322 seedling leaf from Li et al. (19), WGBS data for mop1 earshoot and targeted bisulfite sequencing data for mop1 earshoot, B73, mop2, and mop3 seedling leaf from Li et al. (22), and chromatin accessibility data for B73 earshoot from Gent et al. (13).
Table S2.
Datasets used in this study
| Genotype | Tissue | Library type | Replicates | Reads_total | Reads_trimmed | Reads_aligned_total | Reads_aligned_unique | NCBI accession no. | Source |
| B73 | Third seedling leaf | H3K4me3 ChIP-seq | R1 | 18,793,091 | 17,227,631 | 16,354,922 | 4,818,497 | SRX1073672 | This study |
| B73 | Anther | RNA-seq | R1 | 33,024,817 | 32,544,227 | 30,224,114 | 21,932,316 | SRX1073645 | This study |
| B73 | Anther | RNA-seq | R2 | 27,799,402 | 27,369,321 | 25,483,611 | 18,347,803 | SRX1073646 | This study |
| B73 | Anther | RNA-seq | R3 | 34,761,756 | 34,236,483 | 31,887,219 | 23,032,277 | SRX1073647 | This study |
| B73 | Earshoot | RNA-seq | R1 | 35,977,897 | 35,539,156 | 33,067,071 | 22,905,307 | SRX1073648 | This study |
| B73 | Earshoot | RNA-seq | R2 | 35,425,126 | 34,903,598 | 32,457,202 | 22,992,761 | SRX1073649 | This study |
| B73 | Earshoot | RNA-seq | R3 | 30,969,135 | 30,471,382 | 28,334,735 | 20,031,591 | SRX1073650 | This study |
| B73 | Shoot apex | RNA-seq | R1 | 33,834,148 | 32,995,593 | 30,516,239 | 21,643,595 | SRX1073651 | This study |
| B73 | Shoot apex | RNA-seq | R2 | 35,386,202 | 34,890,836 | 32,473,405 | 22,232,625 | SRX1073652 | This study |
| B73 | Shoot apex | RNA-seq | R3 | 37,906,509 | 37,216,965 | 34,588,700 | 23,973,626 | SRX1073653 | This study |
| mop1_mut | Ear | RNA-seq | R1 | 21,412,625 | — | 19,840,783 | 18,732,016 | SRX1099831 | This study |
| mop1_mut | Ear | RNA-seq | R2 | 11,414,130 | — | 10,559,746 | 9,964,528 | SRX1099832 | This study |
| mop1_mut | Ear | RNA-seq | R3 | 24,293,659 | — | 22,531,028 | 21,390,969 | SRX1099833 | This study |
| mop1_mut | Third seedling leaf | RNA-seq | R1 | 25,105,056 | 24,761,079 | 22,426,693 | 17,016,556 | SRX1304654 | This study |
| mop1_mut | Third seedling leaf | RNA-seq | R2 | 23,993,746 | 23,638,274 | 21,371,252 | 16,255,953 | SRX1304655 | This study |
| mop1_mut | Third seedling leaf | RNA-seq | R3 | 22,258,732 | 21,800,095 | 19,674,287 | 14,978,821 | SRX1304657 | This study |
| mop1_WT | Third seedling leaf | RNA-seq | R1 | 23,708,017 | 23,303,527 | 21,180,432 | 15,731,514 | SRX1304659 | This study |
| mop1_WT | Third seedling leaf | RNA-seq | R2 | 26,023,097 | 25,593,252 | 23,212,380 | 17,667,881 | SRX1304660 | This study |
| mop1_WT | Third seedling leaf | RNA-seq | R3 | 24,145,350 | 23,789,847 | 21,659,670 | 16,238,208 | SRX1304661 | This study |
| mop3_mut | Third seedling leaf | RNA-seq | R1 | 25,364,193 | 24,982,245 | 19,396,144 | 14,100,660 | SRX1304662 | This study |
| mop3_mut | Third seedling leaf | RNA-seq | R2 | 24,511,848 | 24,108,167 | 18,705,533 | 13,617,608 | SRX1304663 | This study |
| mop3_mut | Third seedling leaf | RNA-seq | R3 | 25,275,443 | 24,878,852 | 19,350,320 | 14,070,457 | SRX1304665 | This study |
| mop3_WT | Third seedling leaf | RNA-seq | R1 | 26,095,075 | 25,647,333 | 19,890,702 | 14,587,778 | SRX1304666 | This study |
| mop3_WT | Third seedling leaf | RNA-seq | R2 | 28,229,361 | 27,816,448 | 21,669,858 | 15,847,602 | SRX1304667 | This study |
| mop3_WT | Third seedling leaf | RNA-seq | R3 | 30,221,785 | 29,844,973 | 23,125,845 | 16,941,454 | SRX1304668 | This study |
| B73 | Anther | WGBS | R1 | 163,576,990 | — | 149,299,483 | 110,061,251 | SRX1073655 | This study |
| B73 | Earshoot | WGBS | R1 | 145,888,659 | — | 133,360,168 | 98,421,820 | SRX1073668 | This study |
| B73 | Shoot apex | WGBS | R1 | 149,972,487 | — | 137,336,092 | 99,528,043 | SRX1073669 | This study |
| PH207 | Third seedling leaf | WGBS | R1 | 381,565,504 | 370,858,304 | 335,710,384 | 252,122,695 | SRX1073654 | This study |
| B73 | Seedling | H3K9me2 ChIP-seq | R1 | 420,115,648 | — | — | — | SRP043372 | 10 |
| B73 | Ear | Nucleosome occupancy | R1_1min digestion | 36,863,110 | — | — | — | SRR1584249 | 13 |
| B73 | Ear | Nucleosome occupancy | R1_16min digestion | 118,464,539 | — | — | — | SRR1584255 | 13 |
| mop1_WT | Ear | RNA-seq | R1 | 20,821,254 | — | 19,334,148 | 18,281,206 | SRX708753 | 13 |
| mop1_WT | Ear | RNA-seq | R2 | 34,043,132 | — | 31,529,171 | 29,787,223 | SRX708779 | 13 |
| mop1_WT | Ear | RNA-seq | R3 | 12,558,325 | — | 11,688,244 | 11,061,908 | SRX708783 | 13 |
| B73 | Third seedling leaf | WGBS | R1 | 167,762,882 | 165,993,413 | 153,157,928 | 77,248,366 | SRR850328 | 16 |
| B73 | Third seedling leaf | RNA-seq | R1 | 13,399,149 | 13,166,087 | 11,856,209 | 9,334,529 | SRX180963 | 16 |
| B73 | Third seedling leaf | RNA-seq | R2 | 14,152,930 | 13,903,708 | 12,531,859 | 10,144,111 | SRX180964 | 16 |
| B73 | Third seedling leaf | RNA-seq | R3 | 19,270,220 | 18,916,505 | 15,868,339 | 12,320,081 | SRX181056 | 16 |
| Mo17 | Third seedling leaf | RNA-seq | R1 | 15,801,487 | 15,282,117 | 12,218,379 | 8,871,055 | SRX181198 | 16 |
| Mo17 | Third seedling leaf | RNA-seq | R2 | 17,745,280 | 17,143,861 | 14,469,043 | 10,682,440 | SRX181199 | 16 |
| Mo17 | Third seedling leaf | RNA-seq | R3 | 14,336,816 | 13,857,679 | 11,705,741 | 8,660,939 | SRX181200 | 16 |
| Oh43 | Third seedling leaf | RNA-seq | R1 | 11,766,590 | 11,396,196 | 9,623,331 | 6,913,949 | SRX181211 | 16 |
| Oh43 | Third seedling leaf | RNA-seq | R2 | 14,922,276 | 14,458,409 | 12,173,809 | 8,959,679 | SRX181212 | 16 |
| Oh43 | Third seedling leaf | RNA-seq | R3 | 18,200,539 | 17,793,763 | 15,337,849 | 11,360,772 | SRX181213 | 16 |
| CML322 | Third seedling leaf | RNA-seq | R1 | 12,775,245 | 12,556,754 | 10,028,106 | 7,804,254 | SRX181176 | 16 |
| CML322 | Third seedling leaf | RNA-seq | R2 | 19,471,126 | 19,317,406 | 16,978,351 | 7,572,975 | SRX246015 | 16 |
| CML322 | Third seedling leaf | RNA-seq | R3 | 23,053,240 | 22,781,831 | 19,577,751 | 14,065,751 | SRX246016 | 16 |
| Tx303 | Third seedling leaf | RNA-seq | R1 | 17,474,393 | 17,015,974 | 14,656,574 | 10,644,377 | SRX181215 | 16 |
| Tx303 | Third seedling leaf | RNA-seq | R2 | 26,313,196 | 25,963,716 | 22,356,835 | 17,326,455 | SRX246073 | 16 |
| Tx303 | Third seedling leaf | RNA-seq | R3 | 25,230,109 | 24,911,311 | 21,467,085 | 16,642,550 | SRX246074 | 16 |
| Mo17 | Third seedling leaf | WGBS | R1 | 143,155,853 | 141,635,754 | 112,238,471 | 43,838,004 | SRR850332 | 19 |
| Oh43 | Third seedling leaf | WGBS | R1 | 160,563,110 | 154,446,717 | 125,846,301 | 52,354,128 | SRX731433 | 19 |
| CML322 | Third seedling leaf | WGBS | R1 | 186,299,504 | 178,448,613 | 144,619,632 | 56,571,692 | SRX731432 | 19 |
| Tx303 | Third seedling leaf | WGBS | R1 | 169,478,899 | 162,731,207 | 131,664,899 | 53,347,056 | SRX731434 | 19 |
| mop1 | Immature earshoot | WGBS | R1 | 19,096,896 | 18,320,118 | 16,952,965 | 12,878,556 | SRX731422 | 22 |
| B73 | Third seedling leaf | SeqCap | R1 | 3,257,454 | — | 2,837,572 | 2,553,613 | SRX729949 | 22 |
| mop1 | Immature earshoot | SeqCap | R1 | 5,314,980 | — | 4,644,155 | 4,012,298 | SRX731453 | 22 |
| mop2 | Third seedling leaf | SeqCap | R1 | 3,912,476 | — | 1,511,963 | 1,311,112 | SRX731454 | 22 |
| mop3 | Third seedling leaf | SeqCap | R1 | 4,247,580 | — | 2,527,260 | 2,140,470 | SRX731455 | 22 |
Data Analysis.
All datasets were aligned to the AGPv2 B73 reference sequence (11) or the PH207 reference sequence. Annotations of genes and transposons were obtained from ftp://ftp.gramene.org/pub/gramene/maizesequence.org/release-5b/. DNA methylation, ChIP-seq, and chromatin accessibility data were calculated for 100-bp nonoverlapping sliding tiles in the maize genome and these tiles were annotated based on location relative to genes or transposons using BEDTools (32). Details for the analysis of gene expression, DNA methylation, and ChIP-seq data are available in SI Materials and Methods.
SI Materials and Methods
WGBS.
WGBS data from several tissues and genotypes were used in this study. Several datasets have been previously described including B73 seedling leaf (16) and Mo17, Oh43, CML322, and Tx303 seedling leaf tissue (19). Four additional WGBS datasets were generated for the experiments in this study. DNA was isolated from the third leaf of 14-d-old PH207 seedlings, shoot apex tissue from 14 d B73 seedlings, and immature ear (5 mm) and anther tissue from B73 plants. For shoot apex tissue the root and the first few outside stem layers were removed with a scalpel, and about a 1-cm segment from the base of the seedling was collected. The uppermost immature earshoot, which is also the largest one, was collected. Earshoots from five plants were pooled together for one replicate. Anther was collected from the middle one-third portion of the main spike. Each replicate is from a pool of three tassels. They were collected when the glume was pale yellow and the length of the anther was about half the length of the glume. For each tissue, we collected three biological replicates. DNA was prepared using CTAB from one replicate and used for bisulfite sequencing.
The sodium bisulfite converted sequencing libraries were prepared as before (16, 22). For each library, 100-bp paired reads were obtained and the accession numbers for each sample are provided in Table S2. Read quality was assessed by FASTQC and adapters were trimmed by Trim_galore. Read tails with quality less than 20 were also removed. For the tissue samples that were collected in the B73 inbred line, reads were mapped to the B73 reference genome using BSMAP with five total mismatches (33). Duplicate reads were removed. Methylation levels at a single cytosine were called using methratio.py in BSMAP for properly paired and uniquely mapped reads. For the regions that are covered by both ends of a read pair, only one read is used to call methylation. For the PH207 seedling sample, reads were mapped to the PH207 assembly by Bismark (34) allowing one mismatch in the seed sequence (-N 1 -L 25). PCR duplicates were removed using Picard-tools. Only uniquely mapped and properly paired reads were retained for methylation calling, which was performed using the methylation extractor tool in Bismark. Methylation levels were then summarized for each of the three sequence contexts (CG, CHG, and CHH) across 100-bp nonoverlapping sliding windows using the formula #C/(#C+#T) (35).
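The windowed summary described above can be illustrated with a minimal sketch. It assumes that #C and #T are pooled over all cytosines of one context within a 100-bp window before the ratio is taken; the text gives only the formula, so this pooling is an assumption. The input tuple layout and the function name are hypothetical and do not correspond to the methratio.py or Bismark output formats.

```python
from collections import defaultdict

TILE_SIZE = 100  # bp; window size stated in the text


def tile_methylation(sites, tile_size=TILE_SIZE):
    """Summarize per-cytosine read counts into per-window methylation levels.

    `sites` is an iterable of (chrom, pos, context, n_c, n_t) tuples with a
    1-based position, n_c reads reporting C (methylated) and n_t reads
    reporting T (unmethylated). This layout is hypothetical.
    """
    counts = defaultdict(lambda: [0, 0])
    for chrom, pos, context, n_c, n_t in sites:
        tile_start = (pos - 1) // tile_size * tile_size  # 0-based window start
        key = (chrom, tile_start, context)
        counts[key][0] += n_c
        counts[key][1] += n_t

    levels = {}
    for key, (n_c, n_t) in counts.items():
        if n_c + n_t > 0:  # windows without coverage stay undefined
            levels[key] = n_c / (n_c + n_t)  # #C / (#C + #T)
    return levels


example_sites = [
    ("chr1", 5, "CHH", 3, 7),
    ("chr1", 60, "CHH", 1, 9),
    ("chr1", 101, "CHH", 0, 10),
    ("chr1", 100, "CG", 9, 1),
]
example_levels = tile_methylation(example_sites)
```

In this toy input the two CHH sites at positions 5 and 60 fall in the same window and are pooled, whereas the site at position 101 starts the next window.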
RNA-seq.
RNA-seq data from several tissues and genotypes were used for this study. RNA-seq data for B73, Oh43, Mo17, CML322, and Tx303 seedling leaf were from Eichten et al. (16). Additional RNA-seq data were generated for shoot apex tissue from 14 d B73 seedlings, immature ear (5 mm) and anther tissue from B73, and the third leaf of 14-d-old seedling from mop1 and mop3 mutants and wild-type siblings. The isolation of B73 tissues is described above. Segregating families (n = 30 per replicate) for mop1 and mop3 were grown in a greenhouse and sampled at 9:30 AM. Genotyping was performed to identify homozygous mutant and wild-type plants and at least four plants were pooled for each genotype and each of the three replicates. RNA was isolated using TRIzol and RNA quality was checked using Agilent Bioanalyzer. Samples with an RNA integrity number greater than 8 were submitted to the University of Minnesota Genomics Center for library creation using the Illumina TruSeq protocol. Libraries with an insert size between 200–300 bp and a quantity greater than 2 nM were used for sequencing on HiSeq2500 under the high-throughput 50-nt paired-end mode. Accession numbers are available in Table S2.
RNA-seq data from 4- to 6-cm immature ear shoots of mop1-1 homozygotes in a B73 background (and wild-type sibling) were also generated. The isolation of the tissues and sequencing methods are described in detail in Gent et al. (13). Briefly, total RNA was extracted using a mirVana miRNA Isolation Kit (AM1560; Life Technologies) with Plant RNA Isolation Aid (AM9690; Life Technologies). Five micrograms of total RNA was treated with DNase using a DNA-Free RNA Kit (R1013; Zymo Research) according to the small RNA elimination procedure. Sequencing libraries were prepared using a dUTP stranded procedure according to the Illumina guidelines (TruSeq Stranded Total RNA Sample Preparation Guide, Part 15031048, Rev. C).
RNA-seq read quality was checked using FASTQC and adapters were trimmed using cutadapt (36). Reads were then mapped to the B73 reference genome allowing one mismatch using TopHat (37). Uniquely aligned and properly paired reads (-f 0x0002 -q 50) were filtered using SAMtools (38). The BAM alignment file was then used to summarize the total number of reads covering a gene using HTSeq (39) with the union model. The raw read counts per gene were then imported into R. To identify differentially expressed genes, pairwise comparisons were performed using DESeq (40). Only genes that have greater than 20 raw read counts in at least three out of the six libraries in a comparison were used to identify differentially expressed genes. Significant differential expression between the two samples required a false discovery rate (FDR) corrected P value <0.05.
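The count filter and FDR threshold described above can be sketched as follows. This is an illustration only and not the DESeq analysis itself: the function names are hypothetical, and the sketch assumes a Benjamini-Hochberg adjustment, which the text does not name explicitly.

```python
def passes_count_filter(counts, min_count=20, min_libraries=3):
    """True if a gene has more than `min_count` raw reads in at least
    `min_libraries` of the libraries in a comparison."""
    return sum(1 for c in counts if c > min_count) >= min_libraries


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Assumption: the FDR correction is Benjamini-Hochberg.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Six libraries (three mutant, three wild type) for two hypothetical genes.
kept = passes_count_filter([25, 30, 21, 5, 0, 12])
dropped = passes_count_filter([20, 20, 20, 20, 20, 20])  # 20 is not > 20
padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [p < 0.05 for p in padj]
```

Applying the filter before the adjustment reduces the number of tests that enter the FDR correction, which is the usual motivation for this kind of expression threshold.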
RNA-seq data for mop1 earshoot (this study) and its wild-type siblings (13) are single-end 100-nt reads sequenced with Illumina HiSeq2500. RNA-seq data for seedling leaf of mop1 and mop3 and their wild-type siblings are paired-end 50-nt reads sequenced with Illumina HiSeq2500. To identify differentially expressed transposons between mop1/mop3 mutants and their corresponding wild type, the number of uniquely aligned reads was summarized over each transposon based on the maize transposable element consortium (MTEC) annotation. As in the differential analysis for genes, the raw read count per transposon was imported into R and filtered to keep transposons with at least 10 reads in at least three (out of six) libraries. Differentially expressed transposons were identified using DESeq with a requirement of FDR corrected P value <0.05. This led to the identification of 208 (out of 9,005, 2.3%) transposons that are differentially expressed between mop1 mutant and wild type (earshoot). To assess whether transposons that are close to a high mCHH tile (>25%) are more likely to be expressed than transposons that are far away from a high mCHH tile, each transposon was associated with the nearest high mCHH tile using BEDTools (32), and the average distance between transposon and mCHH tile was calculated for each of four groups of transposons that are classified based on whether they have any uniquely mapped reads and whether they are differentially expressed between mop1/mop3 and their corresponding wild type. The results are shown in Fig. 4 and Fig. S7.
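The association of each transposon with its nearest high mCHH tile can be sketched in plain Python. This is not the BEDTools procedure used in the study; the function name is hypothetical, coordinates are assumed to be 0-based and half-open on a single chromosome, and overlapping or abutting features are given a distance of 0, which may differ from the distance convention reported by BEDTools.

```python
from bisect import bisect_left


def nearest_tile_distance(te_start, te_end, tile_starts, tile_size=100):
    """Distance (bp) from the transposon [te_start, te_end) to the nearest
    high mCHH tile; 0 if they overlap or abut, None if there are no tiles.

    `tile_starts` must be a sorted list of 0-based tile start coordinates
    of equal-sized, nonoverlapping tiles on the same chromosome.
    """
    if not tile_starts:
        return None
    best = None
    i = bisect_left(tile_starts, te_start)
    # Only the tile just left of te_start and the first tile at or right
    # of te_start can be the nearest one.
    for j in (i - 1, i):
        if 0 <= j < len(tile_starts):
            t_start = tile_starts[j]
            t_end = t_start + tile_size
            if t_end <= te_start:
                d = te_start - t_end
            elif t_start >= te_end:
                d = t_start - te_end
            else:
                d = 0
            best = d if best is None else min(best, d)
    return best


high_mchh_tiles = [100, 500, 1200]
d_between = nearest_tile_distance(650, 900, high_mchh_tiles)
d_overlap = nearest_tile_distance(550, 700, high_mchh_tiles)
d_left_edge = nearest_tile_distance(0, 50, high_mchh_tiles)
```

Distances computed this way for each transposon group could then be compared between groups, for example with a rank-based test such as the Wilcoxon test reported in Fig. S7.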
ChIP-Seq and Chromatin Accessibility.
The H3K9me2 ChIP-seq data were previously described by West et al. (10). The chromatin accessibility data were previously reported by Gent et al. (13). These sequences were aligned to the B73 AGPv2 genome using Bowtie2 and the number of reads for each 100-bp nonoverlapping tile was determined for libraries from a 1-min or 16-min MNase digestion. The ratio of read counts between 1 min and 16 min was used to represent chromatin accessibility, with a greater ratio indicating more open chromatin. H3K4me3 ChIP-seq was performed using the third seedling leaf from 14-d B73 seedlings. The chromatin isolations, immunoprecipitation, and library construction were performed as described in West et al. (10) using an antibody specific for H3K4me3 (07-473; Millipore). The resulting sequences were aligned to the B73 reference genome and the read counts for each 100-bp tile were determined.
Identification and Annotation of mCHH Islands.
All 100-bp tiles with >25% mCHH were identified. They were first annotated to be within a gene or within 2 kb upstream or downstream of a gene (“Filtered Gene Set, FGS”; n = 39,656) from release 5b.60. Those that are far away from a gene (>2 kb) were then compared with the CNS positions (17) and annotated to be near a CNS if they are within 2 kb of the CNS. The other tiles that cannot be annotated within/near either a gene or a CNS were annotated to be in intergenic regions. To plot methylation levels centering on the mCHH islands, the 20 tiles upstream or downstream of the mCHH islands were identified. Methylation was first averaged for each tile over all of the genes or CNSs that have an mCHH island, and was then plotted against the distance between the tile and the mCHH islands. The position of mCHH islands was set to position zero. The location of the nearest repetitive element was determined by using BEDTools (32) to compare the location of each 100-bp tile with the MTEC repeat annotation file for 5b.60 (ZmB73_5a_MTEC_repeats.gff.gz). The transposons in this file were classified into four major groups [spreading LTR, nonspreading LTR, DNA transposons, and other (e.g., LINE)] based on the annotations in this file (described in ref. 11) or the functional classification of DNA methylation spreading from Eichten et al. (41). These transposons were also annotated as being within 2 kb of the 5′ or 3′ end of genes, within 2 kb of a CNS, or >2 kb from genes and CNSs. Transposons that are within 2 kb of a gene were further split into those that are the closest transposon to the gene and all others.
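The hierarchical annotation of high mCHH tiles (gene first, then CNS, then intergenic) can be sketched as below. The function name and the simple interval lists are hypothetical, coordinates are assumed to be 0-based and half-open on one chromosome, and the sketch merges "within a gene" and "within 2 kb of a gene" into a single class for brevity.

```python
def annotate_tile(tile_start, tile_end, genes, cns, flank=2000):
    """Annotate one tile as 'gene', 'CNS', or 'intergenic'.

    `genes` and `cns` are lists of (start, end) intervals on the same
    chromosome as the tile. Genes take priority over CNSs, following the
    order of annotation described in the text.
    """

    def within(features):
        # True if the tile overlaps any feature extended by `flank` bp.
        return any(
            tile_start < end + flank and tile_end > start - flank
            for start, end in features
        )

    if within(genes):
        return "gene"
    if within(cns):
        return "CNS"
    return "intergenic"


genes = [(5000, 8000)]
cns = [(20000, 20100)]
label_near_gene = annotate_tile(3100, 3200, genes, cns)
label_far = annotate_tile(2900, 3000, genes, cns)
label_near_cns = annotate_tile(18500, 18600, genes, cns)
```

A linear scan over all features is adequate for a toy example; for genome-scale data an interval index or a sorted sweep, as implemented in BEDTools, would be the practical choice.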
To determine the methylation levels at the proximal and distal end of TEs, we first identified transposons that have a size greater than 1,000 bp and that are located within 2 kb 5′ or 3′ end of a gene, not including those that are within a gene. The methylation levels at both ends of these elements were determined and the tiles were classified as one of the four types: proximal end of a transposon that is on same strand as a gene (proximal_same), proximal end of a transposon that is on opposite strand of a gene (proximal_opposite), distal end of a transposon that is on same strand as a gene (distal_same), and distal end of a transposon that is on opposite strand of a gene (distal_opposite). The ends of each transposon were defined as the terminal 100 bp. Methylation levels were then averaged over these four types of intervals for each of the four transposon families: spreading LTR, nonspreading LTR, DNA transposons, and other (e.g., LINE).
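The four classes of transposon ends defined above can be sketched as follows. The function name is hypothetical, coordinates are assumed to be 0-based and half-open, and the sketch assumes that the transposon does not overlap the gene, in line with the exclusion of transposons located within genes.

```python
def classify_te_ends(te_start, te_end, te_strand,
                     gene_start, gene_end, gene_strand, end_size=100):
    """Return the terminal `end_size` bp intervals of a transposon, labeled
    proximal/distal relative to the gene and same/opposite by strand."""
    if te_end <= gene_start:        # transposon lies left of the gene
        proximal = (te_end - end_size, te_end)
        distal = (te_start, te_start + end_size)
    elif te_start >= gene_end:      # transposon lies right of the gene
        proximal = (te_start, te_start + end_size)
        distal = (te_end - end_size, te_end)
    else:
        raise ValueError("transposon overlaps the gene")
    orientation = "same" if te_strand == gene_strand else "opposite"
    return {
        "proximal_" + orientation: proximal,
        "distal_" + orientation: distal,
    }


left_te = classify_te_ends(1000, 3000, "+", 3500, 6000, "+")
right_te = classify_te_ends(7000, 9000, "-", 3500, 6000, "+")
```

Methylation levels for the returned intervals could then be averaged per class and per transposon family.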
Comparative Analysis of mCHH Islands in B73 and PH207.
A set of 19,644 genes that have a one-to-one match between B73 and PH207 were identified. Out of these 19,644 genes, 1,672 were annotated on different strands in the two genomes and were discarded. The remaining genes were used for further analysis. The 2-kb promoter sequences from both genomes were extracted using BEDTools for each gene pair. Genes without promoter sequences or with >10 Ns in the 2-kb promoters were removed, leaving a total of 10,798 gene pairs. The promoter sequences for each gene pair were aligned using ClustalX (42), and genes with InDels greater than 100 bp between the two sequences were identified. The 100-bp criterion was selected because this is the window size used for mCHH island discovery. We then filtered these gene pairs to include those that (i) have methylation data in both B73 and PH207 in the first 10 tiles immediately upstream of the gene transcriptional start site, (ii) have an mCHH island in B73, and (iii) have >100-bp InDels between the B73 mCHH island and the annotated gene start site. After this filtering, 27 gene pairs remained and were used to analyze the association between sequence polymorphism and variation of mCHH islands.
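Two of the filtering steps above, the removal of promoters with >10 Ns and the detection of InDels greater than 100 bp in a pairwise alignment, can be sketched as below. The function names are hypothetical, and the sketch assumes that the aligned sequences are available as equal-length strings with "-" as the gap character; it does not reproduce ClustalX itself.

```python
import re


def promoter_is_usable(seq, max_n=10):
    """True if the promoter sequence exists and has at most `max_n` Ns."""
    return bool(seq) and seq.upper().count("N") <= max_n


def large_indels(aligned_a, aligned_b, min_len=100):
    """Return (start, end, sequence) for every gap run longer than
    `min_len` columns, where `sequence` names the gapped sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    found = []
    for name, seq in (("a", aligned_a), ("b", aligned_b)):
        for match in re.finditer(r"-+", seq):
            if match.end() - match.start() > min_len:
                found.append((match.start(), match.end(), name))
    return sorted(found)


aln_a = "A" * 50 + "-" * 150 + "C" * 50   # 150-bp gap in sequence a
aln_b = "A" * 250
indels = large_indels(aln_a, aln_b)
short_gap = large_indels("A" * 50 + "-" * 100 + "C" * 50, "A" * 200)
```

A gap run of exactly 100 columns is not reported, because the criterion in the text is greater than 100 bp.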
Identification of Genotype or Tissue-Specific mCHH Islands.
We first identified genes that have data in all of the 10 promoter tiles that are immediately upstream of gene TSS in all of the genotypes or B73 tissues. Only the genes that have an mCHH island in at least one of the genotypes or tissues were kept. Then we performed pairwise comparison between genotypes or tissues. Genotype or tissue-specific mCHH islands are defined as those that have >25% CHH levels in one genotype/tissue and <10% CHH in the other genotype/tissue.
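The pairwise definition of a genotype- or tissue-specific mCHH island can be written as a small predicate. The function name and return labels are hypothetical, and the sketch assumes that the two CHH levels being compared come from the same 100-bp promoter tile and are expressed as fractions between 0 and 1.

```python
def island_specificity(chh_a, chh_b, high=0.25, low=0.10):
    """Classify one tile in a pairwise comparison of samples a and b.

    Returns 'a_specific' or 'b_specific' when CHH is >25% in one sample
    and <10% in the other, otherwise None.
    """
    if chh_a > high and chh_b < low:
        return "a_specific"
    if chh_b > high and chh_a < low:
        return "b_specific"
    return None


call_a = island_specificity(0.40, 0.03)
call_b = island_specificity(0.05, 0.31)
call_shared = island_specificity(0.40, 0.30)
call_intermediate = island_specificity(0.40, 0.15)
```

The gap between the two thresholds means that tiles with intermediate CHH levels (10–25%) in the second sample are not called specific, which guards against calling differences that reflect sampling noise.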
Identification and Analysis of mCHH Islands Using SeqCap Epi Data.
The SeqCap data used in this study are from Li et al. (22). Approximately 4 Mb of total genomic space was profiled in wild-type B73 and mop1, mop2, and mop3 mutants. Immature earshoot was used for mop1 and seedling leaf was used for the other three genotypes. mCHH at individual cytosines was calculated using Bismark. Genomic regions with moderate mCHH (mean CHH levels >5%) were identified with DNAcopy (43). The segments were merged if they were within 50 bp of each other. The merged segments were filtered by overlapping with mCHH islands based on WGBS data (to ensure high mCHH levels) and with regions covered by SeqCap probes (to ensure enough coverage). These segments that overlap with mCHH islands from the WGBS data were defined as mCHH islands with base pair resolution (rather than using 100-bp tiles as in the WGBS data). This led to the identification of 347 5′ mCHH islands and 121 3′ mCHH islands, ranging in size from 80 to 300 bp, with a median size of 146 and 158 bp, respectively. Each individual cytosine with methylation data was then associated with the nearest mCHH islands, and those that were within 400 bp of an mCHH island were kept for further analysis to study the changes in methylation in the mutant samples. The distance between each cytosine and the mCHH islands was calculated as follows. Cytosines that are upstream of an mCHH island were assigned a negative distance between the position of the cytosine and the start of the mCHH islands. Cytosines that are within an mCHH island were assigned a normalized distance on the scale of 100 bp by dividing the physical distance between the cytosine and the start of the mCHH island by the size of the mCHH island. Cytosines that are downstream of an mCHH island were assigned a positive value by calculating the distance between the cytosine and the end of the mCHH island. 
Upstream and downstream of mCHH islands were relative to the direction of the associated gene: Upstream means far away from gene, and downstream means toward gene. The 900-bp regions were then divided into 45 bins with 20 bp for each bin. Methylation levels within each bin were averaged across all mCHH islands, and plotted against the position of the bin.
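The coordinate transformation and binning described above can be sketched as follows. The function names are hypothetical. The sketch assumes that coordinates have already been oriented so that the associated gene lies at higher coordinates than the island, that islands are 0-based half-open intervals, and that downstream cytosines are offset by 100 so that the three segments (400 bp upstream, island scaled to 100, 400 bp downstream) form one continuous 900-bp axis; the offset is an assumption made for plotting and is not stated in the text.

```python
def relative_position(pos, island_start, island_end, flank=400):
    """Map a cytosine position to the 900-bp axis, or None if outside it."""
    if pos < island_start:
        rel = pos - island_start                      # negative, upstream
    elif pos < island_end:
        size = island_end - island_start
        rel = 100.0 * (pos - island_start) / size     # scaled to 0..100
    else:
        rel = 100.0 + (pos - island_end)              # assumed offset
    return rel if -flank <= rel < 100 + flank else None


def bin_index(rel, flank=400, bin_size=20):
    """Index (0..44) of the 20-bp bin that contains `rel`."""
    return int((rel + flank) // bin_size)


def binned_means(records, n_bins=45):
    """Average methylation per bin; `records` holds (rel, level) pairs."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for rel, level in records:
        b = bin_index(rel)
        sums[b] += level
        counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# One hypothetical 150-bp island at [1000, 1150).
rel_up = relative_position(900, 1000, 1150)
rel_in = relative_position(1075, 1000, 1150)
rel_down = relative_position(1160, 1000, 1150)
profile = binned_means([(-100, 0.2), (-95, 0.4), (50.0, 0.8)])
```

Scaling positions inside the island to a fixed width lets islands of different sizes (80 to 300 bp) be averaged on a common axis.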
Acknowledgments
We thank University of Minnesota Genomics Center for library preparation and sequencing. Data analysis used tools and resources provided by the Texas Advanced Computing Center at the University of Texas at Austin. This work was supported by National Science Foundation Grants DBI-1237931 (to N.M.S., M.W.V., I.M., and D.L.) and 0607123 (to R.K.D.). C.D.H. was supported by National Science Foundation National Plant Genome Initiative Postdoctoral Fellowship in Biology Grant 1202724.
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: Sequence data have been deposited in the National Center for Biotechnology Information Sequence Read Archive and all accession numbers are provided in Table S2.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1514680112/-/DCSupplemental.
References
1. Feng S, et al. Conservation and divergence of methylation patterning in plants and animals. Proc Natl Acad Sci USA. 2010;107(19):8689–8694. doi: 10.1073/pnas.1002720107.
2. Zemach A, McDaniel IE, Silva P, Zilberman D. Genome-wide evolutionary analysis of eukaryotic DNA methylation. Science. 2010;328(5980):916–919. doi: 10.1126/science.1186366.
3. Law JA, Jacobsen SE. Establishing, maintaining and modifying DNA methylation patterns in plants and animals. Nat Rev Genet. 2010;11(3):204–220. doi: 10.1038/nrg2719.
4. Matzke MA, Mosher RA. RNA-directed DNA methylation: An epigenetic pathway of increasing complexity. Nat Rev Genet. 2014;15(6):394–408. doi: 10.1038/nrg3683.
5. Zemach A, et al. The Arabidopsis nucleosome remodeler DDM1 allows DNA methyltransferases to access H1-containing heterochromatin. Cell. 2013;153(1):193–205. doi: 10.1016/j.cell.2013.02.033.
6. Stroud H, et al. Non-CG methylation patterns shape the epigenetic landscape in Arabidopsis. Nat Struct Mol Biol. 2014;21(1):64–72. doi: 10.1038/nsmb.2735.
7. Law JA, et al. Polymerase IV occupancy at RNA-directed DNA methylation sites requires SHH1. Nature. 2013;498(7454):385–389. doi: 10.1038/nature12178.
8. Johnson LM, et al. SRA- and SET-domain-containing proteins link RNA polymerase V occupancy to DNA methylation. Nature. 2014;507(7490):124–128. doi: 10.1038/nature12931.
9. Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000;408(6814):796–815. doi: 10.1038/35048692.
10. West PT, et al. Genomic distribution of H3K9me2 and DNA methylation in a maize genome. PLoS One. 2014;9(8):e105267. doi: 10.1371/journal.pone.0105267.
11. Schnable PS, et al. The B73 maize genome: Complexity, diversity, and dynamics. Science. 2009;326(5956):1112–1115. doi: 10.1126/science.1178534.
12. Gent JI, et al. CHH islands: De novo DNA methylation in near-gene chromatin regulation in maize. Genome Res. 2013;23(4):628–637. doi: 10.1101/gr.146985.112.
13. Gent JI, et al. Accessible DNA and relative depletion of H3K9me2 at maize loci undergoing RNA-directed DNA methylation. Plant Cell. 2014;26(12):4903–4917. doi: 10.1105/tpc.114.130427.
14. Regulski M, et al. The maize methylome influences mRNA splice sites and reveals widespread paramutation-like switches guided by small RNA. Genome Res. 2013;23(10):1651–1662. doi: 10.1101/gr.153510.112.
15. Secco D, et al. Stress induced gene expression drives transient DNA methylation changes at adjacent repetitive elements. eLife. 2015;4. doi: 10.7554/eLife.09343.
16. Eichten SR, et al. Epigenetic and genetic influences on DNA methylation variation in maize populations. Plant Cell. 2013;25(8):2783–2797. doi: 10.1105/tpc.113.114793.
17. Turco G, Schnable JC, Pedersen B, Freeling M. Automated conserved non-coding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses. Front Plant Sci. 2013;4:170. doi: 10.3389/fpls.2013.00170.
18. Sekhon RS, et al. Maize gene atlas developed by RNA sequencing and comparative evaluation of transcriptomes based on RNA sequencing and microarrays. PLoS One. 2013;8(4):e61005. doi: 10.1371/journal.pone.0061005.
19. Li Q, et al. Examining the causes and consequences of context-specific differential DNA methylation in maize. Plant Physiol. 2015;168(4):1262–1274. doi: 10.1104/pp.15.00052.
20. Dorweiler JE, et al. mediator of paramutation1 is required for establishment and maintenance of paramutation at multiple maize loci. Plant Cell. 2000;12(11):2101–2118. doi: 10.1105/tpc.12.11.2101.
21. Alleman M, et al. An RNA-dependent RNA polymerase is required for paramutation in maize. Nature. 2006;442(7100):295–298. doi: 10.1038/nature04884.
22. Li Q, et al. Genetic perturbation of the maize methylome. Plant Cell. 2014;26(12):4602–4616. doi: 10.1105/tpc.114.133140.
23. Sidorenko L, et al. A dominant mutation in mediator of paramutation2, one of three second-largest subunits of a plant-specific RNA polymerase, disrupts multiple siRNA silencing processes. PLoS Genet. 2009;5(11):e1000725. doi: 10.1371/journal.pgen.1000725.
24. Sloan AE, Sidorenko L, McGinnis KM. Diverse gene-silencing mechanisms with distinct requirements for RNA polymerase subunits in Zea mays. Genetics. 2014;198(3):1031–1042. doi: 10.1534/genetics.114.168518.
25. Li Q, et al. Post-conversion targeted capture of modified cytosines in mammalian and plant genomes. Nucleic Acids Res. 2015;43(12):e81. doi: 10.1093/nar/gkv244.
26. Robinson JT, et al. Integrative genomics viewer. Nat Biotechnol. 2011;29(1):24–26. doi: 10.1038/nbt.1754.
27. Woodhouse MR, Freeling M, Lisch D. The mop1 (mediator of paramutation1) mutant progressively reactivates one of the two genes encoded by the MuDR transposon in maize. Genetics. 2006;172(1):579–592. doi: 10.1534/genetics.105.051383.
28. Jia Y, et al. Loss of RNA-dependent RNA polymerase 2 (RDR2) function causes widespread and unexpected changes in the expression of transposons, genes, and 24-nt small RNAs. PLoS Genet. 2009;5(11):e1000737. doi: 10.1371/journal.pgen.1000737.
29. Zhong X, et al. DDR complex facilitates global association of RNA polymerase V to promoters and evolutionarily young transposons. Nat Struct Mol Biol. 2012;19(9):870–875. doi: 10.1038/nsmb.2354.
30. Li S, et al. Detection of Pol IV/RDR2-dependent transcripts at the genomic scale in Arabidopsis reveals features and regulation of siRNA biogenesis. Genome Res. 2015;25(2):235–245. doi: 10.1101/gr.182238.114.
31. Zheng Q, et al. RNA polymerase V targets transcriptional silencing components to promoters of protein-coding genes. Plant J. 2013;73(2):179–189. doi: 10.1111/tpj.12034.
32. Quinlan AR, Hall IM. BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–842. doi: 10.1093/bioinformatics/btq033.
33. Xi Y, Li W. BSMAP: Whole genome bisulfite sequence MAPping program. BMC Bioinformatics. 2009;10:232. doi: 10.1186/1471-2105-10-232.
34. Krueger F, Andrews SR. Bismark: A flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics. 2011;27(11):1571–1572. doi: 10.1093/bioinformatics/btr167.
35. Schultz MD, Schmitz RJ, Ecker JR. ‘Leveling’ the playing field for analyses of single-base resolution DNA methylomes. Trends Genet. 2012;28(12):583–585. doi: 10.1016/j.tig.2012.10.012.
36. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011;17(1):10–12.
37. Kim D, et al. TopHat2: Accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14(4):R36. doi: 10.1186/gb-2013-14-4-r36.
38. Li H, et al.; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–2079. doi: 10.1093/bioinformatics/btp352.
39. Anders S, Pyl PT, Huber W. HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31(2):166–169. doi: 10.1093/bioinformatics/btu638.
40. Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11(10):R106. doi: 10.1186/gb-2010-11-10-r106.
41. Eichten SR, et al. Spreading of heterochromatin is limited to specific families of maize retrotransposons. PLoS Genet. 2012;8(12):e1003127. doi: 10.1371/journal.pgen.1003127.
42. Larkin MA, et al. Clustal W and clustal X version 2.0. Bioinformatics. 2007;23(21):2947–2948. doi: 10.1093/bioinformatics/btm404.
43. Venkatraman ES, Olshen AB. A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics. 2007;23(6):657–663. doi: 10.1093/bioinformatics/btl646.











