Skip to main content
PLOS Genetics logoLink to PLOS Genetics
. 2020 Jun 25;16(6):e1008864. doi: 10.1371/journal.pgen.1008864

­­­­­­Widespread conservation and lineage-specific diversification of genome-wide DNA methylation patterns across arthropods

Samuel H Lewis 1,2,3, Laura Ross 4, Stevie A Bain 4, Eleni Pahita 2,3, Stephen A Smith 5, Richard Cordaux 6, Eric A Miska 1,7, Boris Lenhard 2,3,8, Francis M Jiggins 1,*,#, Peter Sarkies 2,3,*,#
Editor: Wolf Reik9
PMCID: PMC7343188  PMID: 32584820

Abstract

Cytosine methylation is an ancient epigenetic modification yet its function and extent within genomes is highly variable across eukaryotes. In mammals, methylation controls transposable elements and regulates the promoters of genes. In insects, DNA methylation is generally restricted to a small subset of transcribed genes, with both intergenic regions and transposable elements (TEs) depleted of methylation. The evolutionary origin and the function of these methylation patterns are poorly understood. Here we characterise the evolution of DNA methylation across the arthropod phylum. While the common ancestor of the arthropods had low levels of TE methylation and did not methylate promoters, both of these functions have evolved independently in centipedes and mealybugs. In contrast, methylation of the exons of a subset of transcribed genes is ancestral and widely conserved across the phylum, but has been lost in specific lineages. A similar set of genes is methylated in all species that retained exon-enriched methylation. We show that these genes have characteristic patterns of expression correlating to broad transcription initiation sites and well-positioned nucleosomes, providing new insights into potential mechanisms driving methylation patterns over hundreds of millions of years.

Author summary

Animals develop from a single cell to form a complex organism with many specialised cells. Almost all of the fantastic variety of cells must have the same sequence of DNA, and yet they have distinct identities that are preserved even when they divide. This remarkable process is achieved by turning different sets of genes on or off in different types of cell using molecular mechanisms known as “epigenetic gene regulation”. Surprisingly, though all animals need epigenetic gene regulation, there is a huge diversity in the mechanisms that they use. Characterising and explaining this diversity is crucial in understanding the functions of epigenetic pathways, many of which have key roles in human disease. We studied how an epigenetic regulation known as DNA methylation has evolved within arthropods. Arthropods are an extraordinarily diverse group of animals ranging from horseshoe crabs to fruit flies. We discovered that the levels of DNA methylation and where it is found within the genome changes rapidly throughout arthropod evolution. Nevertheless, there are some features of DNA methylation that seem to be the same across most arthropods- in particular we found that there is a tendency for a similar set of genes to acquire methylation of DNA in most arthropods, and that this is conserved over 350 million years. We discovered that these genes have distinct features that might explain how methylation gets targeted. Our work provides important new insights into the evolution of DNA methylation and gives new hints to its essential functions.

Introduction

In most organisms DNA bases are adorned with a variety of chemical modifications. Amongst the most common of these is methylation at the 5 position of cytosine (C5me), which is present from bacteria to humans [13]. In eukaryotes, a key property of cytosine DNA methylation is its ability to act epigenetically—that is, once introduced, methylation at specific cytosines can remain in place through cell division [4],[5]. This relies on the activity of “maintenance” methyltransferases, DNMT1 in animals [6], which recognise CG dinucleotides (CpG sites) where one strand is methylated and one strand unmethylated and catalyse the introduction of methylation on the unmethylated strand [4]. Meanwhile “de novo” methyltransferases act on unmethylated DNA. In animals this role is performed by DNMT3 enzymes, which introduce 5meC predominantly within CpG sites [4]. Mechanisms also exist to remove methylation from DNA, including the TET family of enzymes, which convert 5meC to a hydroxymethylated intermediate which can be removed by base excision repair or diluted out through cell division [5]. As the maintenance and de novo methylation of CG sequences occurs through the activity of homologous enzymes in plants and animals [6], CpG methylation was likely present among the earliest eukaryotic organisms.

In mammals, a key function of CpG methylation is to defend the genome against transposable elements (TEs) by preventing their transcription and transposition [7], and loss of DNA methylation leads to reactivation of TEs [8]. CpG methylation targeted to the promoters of genes can also suppress transcription, typically resulting in stable silencing [7]. Another notable genome-wide pattern is the enrichment of CpG methylation within the exons of transcribed genes [9]. In contrast to TE and promoter methylation, this is not associated with transcriptional silencing.

Whilst CpG methylation at both TEs and gene bodies is present in both plants and animals [3], across eukaryotic species both DNA methylation levels and the targets of methylation have evolved rapidly [10,11]. Most strikingly, CpG methylation has been independently lost altogether several times, coinciding with the loss of DNMT1 and DNMT3 DNA methyltransferase enzymes [6,10,11]. Across eukaryotes, loss of CpG methylation tends to be accompanied by loss of the DNA alkylation repair enzyme ALKB2, which repairs damage caused by DNMTs introducing 3-methylcytosine into DNA. This suggests that some species correct DNA alkylation using ALKB2, and others avoid it altogether by losing the DNA methylation pathway [12]. Even within species that retain DNA methyltransferases, the genomic distribution of CpG methylation differs widely [1016]. Such variability is surprising given the essential role of CpG methylation in genome regulation in mammals and plants, and there are few clues as to what factors drive the changes. Tracing the evolution of CpG methylation is currently challenging because detailed descriptions of DNA methylation patterns are patchy and focussed on model systems, leaving large parts of the phylogenetic tree underexplored.

Here we study CpG methylation patterns across arthropods. Arthropods have been suggested to represent a very different system of CpG methylation from mammals [17]. Whilst the well-characterised model organism Drosophila melanogaster lacks DNA methylation altogether, DNA methyltransferases 1 and 3 were found in the honey bee Apis mellifera [18]. Genome-wide CpG methylation mapping demonstrated that methylation was globally extremely low, and restricted to the bodies of a subset of transcribed genes [10,19]. Subsequently, similarly restricted patterns of DNA methylation were found in other insects [1922]. Such patterns support the proposal that gene body methylation is conserved across eukaryotes while TE methylation has been lost altogether in arthropods [10,17]. However some insects show considerably higher levels of genome-wide methylation [13], and variation in arthropod methylation levels also exists outside of insects [15,2326]. There is also evidence of TE methylation in the desert locust Schistocerca gregaria [19]. A thorough reconstruction of the evolution of methylation across the phylum is still lacking.

We set out to explore the evolution of arthropod methylation patterns by characterising genome-wide CpG methylation across the phylum. We show that TE methylation was ancestral to arthropods, although at a relatively low level. Methylation of protein-coding genes was also ancestral to arthropods, with similar subsets of genes being targeted for methylation across arthropods. Despite these conserved features, we find many examples of diversification in methylation patterns across arthropods, in particular loss of gene methylation in crustaceans and gain of both promoter methylation and genome-wide TE methylation in the myriapod Strigamia maritima and the insect Planococcus citri. We find that methylation at genes, enriched within exons, is the most widely conserved feature of arthropod methylomes and we use comparative analysis to identify a link between exon methylation and nucleosome positioning. Overall, our findings demonstrate that while key features of global methylation patterns have been conserved across millions of years of arthropod evolution, the targets of DNA methylation can rapidly diverge within individual lineages.

Results

Genome-wide levels of CpG methylation vary widely across the arthropods

We carried out high-coverage whole-genome bisulphite sequencing (WGBS) on 14 species of arthropod and quantified the levels of DNA methylation with base-pair resolution. To examine genome-wide methylation levels we combined this data with published results from 15 additional species [13,24,27,28]. Estimates of genome-wide CpG methylation were then used to reconstruct ancestral methylation levels across the arthropod phylogeny. All 18 species of holometabolous insects had low levels of CpG methylation, and this was likely the ancestral state of this clade (Fig 1A and 1B). While CpG methylation rates in other arthropod clades tended to be higher, they varied considerably (Fig 1A and 1B). The ancestral arthropod likely had moderate methylation levels (9.1±4.8%; Fig 1A) but higher methylation levels evolved in S. maritima. Similarly, the ancestor of insects had methylation levels lower than some taxa such as B. germanica (4.0±3.2% versus 10.9%) indicating that methylation level fluctuated throughout arthropod evolution.

Fig 1. Genome-wide CpG methylation across the arthropod phylogeny.

Fig 1

(A) A phylogeny of 29 31 arthropod species that have publicly available or newly computed genome-wide methylation estimates, with branches coloured to show an ancestral state reconstruction of the percentage of CpG sites that are methylated in the genome. (B) The percentage of CpG sites that are methylated genome-wide. (C) The number of DNMT and ALKB2 homologues in the genomes of each species.

To investigate the evolution of the DNA methylation machinery across arthropods, we searched the genomes of these species for homologues of the genes encoding the methyltransferases DNMT1-3. We confirmed the genes all encoded a full cytosine methyltransferase domain, and where we did not find annotated homologues we directly search the genomic DNA for unannotated genes. In each species we found a single copy of DNMT2, which methylates tRNAs [29] (Fig 1C). DNMT1 was present in all species apart from the five Diptera (Fig 1C). The loss of this gene was associated with the loss of CpG methylation (Fig 1C), with methylation rates in D. melanogaster not significantly different from the unmethylated DNA spike-in included in each reaction. DNMT3 was absent from the genomes of 14 species, with inspection of the tree suggesting at least eight independent losses (Fig 1C). Several of these species possessed moderate levels of CpG methylation (Fig 1B), indicating that DNMT1 alone can be sufficient for introducing genome-wide DNA methylation, consistent with earlier studies in arthropods and nematodes [12,13,20].

Across the eukaryotes ALKB2, which repairs DNA alkylation damage introduced by DNMTs, tends to be lost from the same taxa as DNMT1 and 3 [12]. Arthropods exhibited many exceptions to this general rule—there have been at least five losses of ALKB2 but only one of these is associated with the loss of DNMT1 and 3 (Fig 1C). However, we found that species with ALKB2 possessed higher methylation levels (S1 Fig.; phylogenetic mixed model: p = 0.0194), suggesting ALKB2 is dispensable in species with low levels of DNA methylation.

Rapid loss and gain of TE methylation across arthropods

In mammals, plants and nematodes, transposable elements (TEs) are preeminent targets of DNA methylation, but previous studies have shown that TE methylation is rare in holometabolous insects [10,11,19,21,22]. However, DNA methylation has been found at TEs in some arthropods [15,23,25,30]. To explore the distribution of TE methylation across arthropods we annotated transposable elements using RepeatMasker analysis of the entire genome, and removed annotations that did not contain Pfam domains derived from transposable elements. We focused on 14 species that represent the diversity of arthropods, and have assembled and annotated genomes (see Fig 2B).

Fig 2. Methylation of transposable elements.

Fig 2

For 14 diverse arthropod species with annotated genomes, we explored methylation characteristics of genomic features. (A) Density plot of the mean % CpG methylation per gene and per TE. (B) Ancestral state reconstruction of the mean % methylation of CpGs within TEs. (C) Mean % methylation of CpGs within TEs with 95% bootstrap confidence intervals. Red points are CpGs >1kB from annotated regions of the genome. (D,E) Metagene plot of methylation within TEs (pink) and in flanking sequence for S. maritima and P. citri.

Compared to unannotated regions of the genome, TEs were strongly enriched for methylation in S. maritima and P. citri, and weakly enriched in several other species (Fig 2B and 2C). This pattern is reflected in the distribution of methylation across TEs—this is skewed towards 0% for most species, but in S. maritima and P. citri the large majority of TEs are methylated (Fig 2A,B; S2 Fig). In these two species there was a sharp drop in methylation rates at the boundary of the TE (Fig 2D). In agreement with earlier studies [19,22], the methylation rate of TEs was low in holometabolous insects. However, outside of this group there was moderate methylation of TEs in chelicerates (L. polyphemus and P. tepidariorum), the crustacean P. hawaiensis and hemimetabolous insects (B. germanica and A. pisum) (Fig 1A and 1C). To further quantify the extent of TE methylation, we clustered TEs into highly- and lowly-methylated groups in each species separately, and calculated the proportion of TEs that were assigned to the highly-methylated group (Table 1). Ancestral state reconstruction suggested that a low level of TE methylation was present in the ancestral arthropod, but was lost in the ancestor of holometabolous insects (Fig 2A). Contrastingly, the large majority of TEs were targeted by methylation in S. maritima and P. citri, suggesting independent acquisition of TE methylation in these species.

Table 1. Proportion of Genes and TEs that are highly methylated.

TEsa Genes
Species Number Proportion methylatedb Number Proportion methylatedb
Acyrthosiphon pisum 293 0.017 13147  0.171 
Apis mellifera 7 0.143 10066  0.272 
Armadillidium vulgare 655 0.020 4703  0.019 
Blattella germanica 276 0.145 9272  0.387 
Bombus terrestris 78 0.128 8550  0.069 
Heliconius melpomene 34 0.088 11583  0.077 
Ixodes scapularis 212 0.033 5775  0.219 
Limulus polyphemus 342 0.117 7227  0.265 
Nicrophorus vespilloides 9 0.111 12305  0.032 
Parasteatoda tepidariorum 622 0.032 9742  0.243 
Parhyale hawaiensis 89 0.079 3302  0.028 
Planococcus citri 361 0.751 34044  0.099 
Strigamia maritima 719 0.758 12898  0.326 

a TEs with annotated TE-associated domains (see Methods);

b the proportion falling into the highly methylated group after clustering each feature type within each species

To investigate changes to the DNA methylation machinery that accompanied the gain of TE methylation, we examined the DNMT genes found in S. maritima and P. citri. We did not find DNMT3 homologues in the genome of P. citri. However, both species contain DNMT1 genes that have lost the CXXC domain, despite this domain being conserved in 8 species that lacked TE methylation (S3 Fig and S4 Fig; incomplete genome assembly prevented some species from being examined). This domain has been suggested to contribute to the specificity of DNMT1 for hemimethylated DNA [31], suggesting that DNMT1 may act on unmethylated DNA in these species. It is possible that independent loss of this domain in DNMT1 homologues in the two species might be associated with acquisition of TE methylation. Alternatively, in the fungus Cryptococcus neoformans it has been suggested that high levels of methylation of features including TEs reflect long-term maintenance of DNA methylation coupled with stochastic changes to methylation and selection on beneficial methylated variants [32].

Methylation at exons is conserved across most arthropods

We next investigated methylation at genes across arthropods. In all but one of the species we tested, mean methylation levels across exons were significantly higher than unannotated regions of the genome (Fig 3B). The exception was P. hawaiensis, where exons are significantly less methylated than unannotated regions of the genome (Fig 3B). There is a significant difference between methylation at exons and introns in P. hawaiensis (p = 0.001, paired t test). In the species with exon methylation, the distribution of methylation suggested that a subset of genes is targeted for methylation (Fig 3C). When clustered into highly and lowly methylated genes, the proportion of methylated genes varied similarly to mean methylation across genes (Table 1).

Fig 3. Gene body methylation.

Fig 3

(A) Ancestral state reconstruction of the mean % methylation of CpGs within exons. (B) Mean % methylation of CpGs within exons with 95% bootstrap confidence intervals. Red points are CpGs >1kB from annotated regions of the genome. (C) Metagene plot of methylation across introns (white), exons (pink), UTRs (blue) and 1kB of flanking sequence (white).

To investigate the distribution of methylation within genes, we compared the methylation levels at exons and introns in each species. Methylation was higher at exons in the majority of species, suggesting that the gene body methylation in arthropods is due to targeting of methylation to exons. However, there was little difference between exons and introns for the two crustaceans, P. hawaiensis and A. vulgare (Fig 3C; S3 Fig). Methylation of genes has been described in the crayfish [26] and suggested in Daphnia [24] albeit at very low levels. Given that P. hawaiensis exons are depleted for methylation relative to the genome-wide background while A. vulgare exons are only slightly greater than the background, this may reflect an ancient loss of gene body methylation in the Peracaridian ancestor of these species. Among species with exon methylation, there were differences in how methylation levels changed across the gene (Fig 3C). For example, methylation was largely confined to the first three exons of P. citri and N. vespilloides, while methylation in B. germanica is largely found from exon four onwards (Fig 3C). However, there were no clear phylogenetic trends within these patterns suggesting patterns of methylation across genes likely change frequently during evolution. Together these data suggest that exon-enriched methylation was an ancestral property of arthropod methylomes which is largely conserved across the phylum.

Independent acquisition of promoter methylation in arthropod lineages

In mammals, methylation of regions immediately upstream of genes, often at CpG islands, is associated with gene silencing. However, there is no evidence of promoter methylation in insects [19,20,22]. To examine promoter methylation associated with gene silencing across arthropods, we extracted 1kb upstream of genes for all species. In most species there was little difference in upstream methylation between high and low expression genes; however, low expression genes in P. citri and S. maritima had significantly higher upstream methylation than high expression genes (Fig 4A). In S. maritima only genes with very high upstream methylation showed clearly reduced gene expression (p = 1e-15, Kruskal Wallis test), whilst in P. citri there was a positive correlation between upstream methylation and gene expression across a wider range of upstream methylation levels (Fig 4B). The different relationship between upstream methylation and gene expression between S. maritima and P. citri and the lack of a similar relationship in other arthropod species suggests that promoter methylation associated with gene silencing may have evolved independently in these two species.

Fig 4. Promoter methylation.

Fig 4

(A) Methylation across upstream regions for highly expressed genes (top 20%) and lowly expressed genes (bottom 20%). P. hawaiensis is omitted due to lack of gene expression data. Expression of genes across bins of decreasing upstream methylation in S. maritima (B) and P. citri (C).

Methylated genes are conserved and have moderate to high expression

Our results suggest that the most highly conserved feature of arthropod methylomes is enrichment of methylation at the exons of a subset of genes. Similar conclusions as to the dominance of methylation within genes in arthropods have been reached through a systematic analysis of insects [13] and in analyses of several individual arthropod species outside insects [2426,30]. Across species, we asked whether there was any tendency for orthologous genes to be methylated in different species. We ranked orthologous genes by relative methylation levels across species and observed that there was a clear tendency for orthologs to have similar levels of methylation in different species (Fig 5A). The observation that the same genes are methylated in different species raised the question of what determines which genes acquire methylation. We used comparative analysis to investigate this across the phylum.

Fig 5. The expression and conservation of methylated genes.

Fig 5

(A) Methylation of orthologous genes in different species. Only genes with orthologs in all species are shown, and in species with multiple paralogs the mean % CpG methylation is shown. Genes are ranked by their mean methylation. (B) Histogram of gene expression estimated from RNAseq data for methylated and unmethylated genes in L. polyphemous (FPKM: fragments per kilobase million). (C) The relationship between gene expression and CpG methylation for genes that are conserved across all species and species-specific genes. To combine data across species, the methylation rate was normalised by taking the Z-score of methylation and expression of each gene within each species. Each point is a gene from a single species, and the colour represents the density of overlaid points. (D) Fraction of housekeeping genes as defined in references [37] for the top 20% and bottom 20% of methylated genes.

Methylation has been shown to be enriched at alternatively spliced genes in some insects [19,22]. To test for a link between methylation and splicing across arthropods, we compared the level of methylation between genes with one exon (which cannot undergo splicing) and genes with two or more exons (which may undergo splicing). We found no clear difference in any species (S5 Fig), suggesting that splicing does not explain the propensity of genes to acquire methylation across arthropods.

Previously, methylation of genes in individual insect species has been correlated to higher levels of expression [20,22]. We find a statistically significant tendency for genes with high methylation to have higher expression across most species. However, many highly expressed genes are not methylated. Instead a more prominent trend is for methylated genes to have more focussed levels of gene expression such that genes with very low expression levels are rarely methylated (Fig 5B and 5C; S6 Fig). Curiously, this pattern is reversed in P. citri, where the exons of methylated genes tend to have low expression (S6 Fig).

Previously it has been noted that genes with high levels of methylation in arthropods are more likely to perform conserved “housekeeping” functions [26,33,34]. We clustered genes into orthologous groups across species and examined genes that were conserved across all species compared to species-specific genes. Across all species carrying gene body methylation, conserved genes with moderate to high expression were more likely to be methylated (Fig 5C; S6 Fig). Orthologues of genes with high methylation were strongly enriched for genes annotated as housekeeping genes in D. melanogaster on the basis of consistent expression across developmental timepoints and tissues [35] (50% of methylated genes had housekeeping functions as opposed to 15% of unmethylated genes; Fisher’s Exact Test: p = 2 x 10−16; Fig 5D). Nevertheless many conserved and highly expressed genes, including those annotated as housekeeping genes, were not highly methylated, suggesting that neither conservation nor expression is sufficient to explain gene body methylation.

Nucleosome positioning influences DNA methylation levels across arthropods

In order to investigate molecular mechanisms that might be responsible for influencing DNA methylation we examined how the correlation in methylation between pairs of CpGs varied with increasing separation. In many species with exon-enriched methylation the correlation coefficient between methylation levels of individual CpGs oscillated periodically (Fig 6A and 6B). Fourier analysis showed that the period of oscillation was ~160 nucleotides, roughly corresponding to the average nucleosome repeat length (Fig 6A and 6B; S7 Fig). We quantified this nucleosome-length periodicity within exons across all species. While the majority of species with exonic methylation displayed a potential nucleosome-length periodicity signal, its magnitude varied greatly–for example H. melpomene has gene methylation but less apparent periodicity (Fig 6B). Interestingly a clear signal of periodicity was also seen for TE methylation in S. maritima and P. citri, both of which have high levels of TE methylation (S7 Fig).

Fig 6. Nucleosome occupancy and DNA methylation.

Fig 6

The Pearson correlation coefficient in DNA methylation levels between pairs of CpG at different distances apart in (A) S. maritima and (B) H. melpomene. (C) Nucleosome occupancy in D. melanogaster orthologues of genes that are either highly methylated (grey) or unmethylated (red) in arthropods. Shaded area is a 95% bootstrap confidence interval. Across all species in the dataset, mean methylation levels were estimated for each group of orthologous genes using a general linear mixed model. The top and bottom 20% were classified as methylated and unmethylated respectively. Only genes with orthologs in all species are shown. (D) Interquartile range of the TSS window for the D. melanogaster orthologues of highly methylated orthogroups (top 20%) and lowly methylated orthogroups (bottom 20%). (E) The coefficient of variation in expression of genes with high (top 20%) and low (bottom 20%) methylation across different tissues estimated using RNAseq data. P. hawaiensis is omitted because no tissue-specific data is available for this species.

We wondered whether the periodicity in correlation between methylated DNA might reflect an influence of nucleosome positioning on DNA methylation, as has been shown in plants [36] and inferred from analysis of mammalian DNA methylation profiles[37]. Such a connection has not yet been investigated in arthropods. We did not have genome-wide nucleosome positioning data for the majority of species so decided to investigate high-quality nucleosome positioning from Drosophila [38], examining orthologues of genes either enriched or depleted for DNA methylation across arthropods. The promoters of methylated genes possessed high nucleosome occupancy overall and strongly positioned nucleosomes just upstream (-1) and downstream (+1) of the transcription start site (TSS) (Fig 6C). The promoters of unmethylated genes showed lower nucleosome occupancy overall and demonstrated weaker positioning of the -1 and +1 nucleosome. Previous analyses of promoter types across eukaryotes have indicated that promoters with strong positioning of nucleosomes lead to initiation of transcription across a broad region (broad TSS) whilst promoters with weaker nucleosome positioning tend to have a much narrower TSS focussed around a dominant initiation site[39]. Using cap analysis of gene expression (CAGE) data from D. melanogaster we found that the TSS of D. melanogaster orthologs of methylated genes was broader than the TSS of orthologs unmethylated genes (Fig 6C).

Further evidence for a connection between nucleosome occupancy and a periodic signal in the correlation between methylation sites comes from a comparison of exons and introns. Exons are known to have much higher nucleosome occupancy than introns and accordingly the periodic signal of methylation correlation is markedly weaker in introns than in exons (S8 Fig). Together this supports a potential role for nucleosome occupancy in shaping CpG methylation patterns in arthropods.

The patterns of nucleosome occupancy and transcription initiation corresponded to previous analyses across organisms demonstrating that housekeeping genes tend to have well positioned nucleosomes just downstream of promoters and broad TSS whereas tissue-specific genes tend to have less well-defined nucleosome positions at promoters and narrow TSS [4043]. We therefore tested whether methylated genes were more likely to have tissue-specific or global gene expression using RNAseq data from different tissue types. In every species with gene body methylation, we found that methylated genes tended to have less variable expression across different tissues (Fig 6D). Altogether this suggests that across arthropods conserved genes with strongly positioned nucleosomes, broad TSS and housekeeping functions are targeted for methylation whilst tissue-specific genes with opposite patterns of nucleosome occupancy and TSS width tend to be depleted of methylation.

Given that DNA methylation, nucleosome occupancy and constitutive expression are all correlated with each other, we wondered whether nucleosome occupancy influences DNA methylation directly. To examine this, we compared nucleosome occupancy at the Drosophila orthologues of methylated genes that were annotated as housekeeping or not (S9 Fig). Importantly, the pattern of nucleosome positioning was similar for both methylated genes with housekeeping functions and methylated genes without housekeeping functions. Furthermore, housekeeping genes with low levels of methylation had low levels of nucleosome occupancy at promoters (S9 Fig). This implies that nucleosome positioning influences DNA methylation separately from its connection to the housekeeping functions. Therefore, patterns of nucleosome occupancy may explain why genes with housekeeping functions acquire methylation.

Discussion

Molecular pathways involved in epigenetic gene regulation evolve surprisingly rapidly and DNA methylation is no exception. Our work adds to the complex picture of how DNA methylation patterns change across evolutionary time and offers new insight into potential factors influencing the distribution of DNA methylation within genomes.

Plasticity of DNA methylation landscapes

Prior to this study, DNA methylation had been characterised across insects [13] but only isolated species from more basal arthropod clades had been studied [15,2326,30]. By examining a phylogenetically broad range of arthropod methylomes we reconstructed the trajectory of DNA methylation patterns across the phylum. Our data show that ancestral arthropods likely had moderate genome-wide methylation including methylation of a small number of transposable elements. Methylation of genes was also prominent and was enriched in exons over introns; however, the magnitude of the difference between exonic and intronic methylation was not as striking as in insects such as A. mellifera reflecting the presence of a higher background genomic methylation. Crucially our data also show that changes in methylation patterns can evolve rapidly within individual lineages. Most strikingly, we find strong enrichment of TE methylation evolved independently in the centipede S. maritima and the mealybug P. citri, which very likely occurred independently. This enrichment does not correlate to any obvious change in genome structure such as increased TE proportion or genome size, however it is interesting that a recent paper reported acquisition of a relatively recent TE family in S. maritima that acquires high levels of methylation [15], which may underpin gain of TE methylation in that species.

It is intriguing that the two species with high TE methylation had independently acquired methylation of promoters of silent genes, whilst the exons of these genes are devoid of methylation. Gene regulation by promoter methylation is also found in mammals and was likely acquired independently in the sponge Amphimedon queenslandica [16]. In all these cases TE methylation is also prominent so it is possible that the two are linked, perhaps relating to a requirement to control TE-derived promoter regions; however testing this hypothesis would require experimental manipulation of methylation in P. citri or S. maritima which is currently not possible.

It is curious that repeated acquisition of similar types of DNA methylation occurs across phylogenies. This may indicate that targeting of DNA methylation to new regions can be achieved with very few genetic changes. In vertebrates, a possible example is the KRAB-Zinc finger proteins which can recruit DNA methylation to TEs through sequence-specific binding [44]. Further work to identify potential “pioneer” factors that recruit DNMTs to specific regions and underlie the divergence of methylation patterns between species will be of great interest.

Potential factors influencing methylation of genes

Our study confirms earlier studies indicating that the most widely conserved feature of arthropod methylomes is methylation of genes, biased towards exon methylation [17]. Additionally, we confirm insights from insects and crustaceans that broadly expressed, housekeeping genes are more likely to be targeted for methylation than tissue-specific genes [26,34]. This is similar to observations in other animal groups including basal metazoans [45]. Moreover, several analyses in plants provide compelling evidence that genes with consistent expression across tissues and lifestages and evolve at a slower rate tend to have higher levels of methylation [4649]. These facts suggest that methylation of transcribed gene bodies has functional importance in plants; however exactly what the function is unclear and still debated. For example, gene body methylation has been lost completely in the land plant Eutrema salsugineum [50], but studies have come to differing conclusions about whether this has a subtle effect on expression of these genes compared to orthologues in species retaining gene methylation [28,51]. Taken together with studies in animals, it does seem clear that methylation of gene bodies has an ancient evolutionary origin [48,52]. Nevertheless exactly what the function of this modification is remains to be elucidated. Similar to the case of E. salsugineum in plants, our data shows that within arthropods it is dispensable even in species where DNA methylation is present in other regions of the genome, such as the two crustaceans that we examined.

Whilst we cannot decipher the function of exon-enriched DNA methylation, our analyses potentially offer new insights into the molecular mechanisms whereby DNA methylation might be deposited. We identify a remarkable methylation pattern across many arthropods such that methylation levels vary periodically with the nucleosome-repeat length. This striking genome-wide pattern that we observe in some species, in particular S. maritima, has not been observed to our knowledge in any animal species previously. However, there are specific regions within the human genome that display apparently nucleosome length periodicity in the correlation between adjacent sites [37]; furthermore the influence of nucleosomes on methylation by DNMT3B was observed in human and yeast cells[53,54]. Moreover, DNA methylation levels show a 10bp periodicity in Arabidopsis, corresponding to methylation targeting nucleotides on the same face of the nucleosome [36]. Together these observations reflect a positive correlation between nucleosome occupancy and DNA methylation in Arabidopsis and mammals [36]. Exons are known to have better positioned nucleosomes than introns [55,56] which might explain why exons are enriched in methylation across species. We also find that promoters of genes with high levels of methylation tend to carry a clear nucleosome positioning pattern, typical of housekeeping genes, where nucleosome occupancy is high upstream and just downstream of the TSS with a nucleosome-free region between the two [42,43]. Both nucleosome positioning and DNA methylation could be linked to transcription. Since tissue-specific genes are highly expressed in only a few cell types, this might explain why they do not appear methylated in whole animal bisulphite sequencing. This would also explain why across all species genes with very low expression are depleted of methylation (Fig 4D). Alternatively, nucleosomes themselves could dictate where DNA methylation takes place. Supporting this point there is little periodicity in DNA methylation in introns compared to exons (S8 Fig), suggesting that transcription itself is insufficient to account for this effect.

Importantly, the fact that we see these patterns based on nucleosome positioning in Drosophila, where DNA is not methylated, suggests that nucleosome positioning may cause differences in DNA methylation. Consistent with this idea, we show that nucleosome occupancy is high at the promoters of methylated genes that do not have housekeeping functions in Drosophila. Furthermore, nucleosome occupancy is low at the promoters of unmethylated housekeeping genes. Thus, we suggest that nucleosome positioning, rather than housekeeping function, may be a primary determinant of variation in DNA methylation across arthropod genomes. The observed enrichment of methylation at housekeeping genes may therefore be a consequence of the fact that housekeeping genes tend to have distinct patterns of nucleosome occupancy, and not directly related to a function for DNA methylation in regulating housekeeping genes. Such a view is supported by the fact that the effect of loss of gene body methylation on gene expression is subtle [28] and that DNMT1 knockout in the milkweed bug has little consequence for gene expression but still seems essential for embryo viability, implying that there may be distinct functions of DNA methylation unrelated to methylation of a subset of genes [34]. To test these ideas directly will require detailed examination of nucleosome positioning data across arthropods alongside knockouts of DNA methyltransferases. Nevertheless, our analyses may prompt a search for how nucleosome occupancy might determine methylation patterns across eukaryotes.

Methods

DNMT identification

To identify species that have retained or lost the DNA methylation pathway, we searched for homologues of DNMT. For each species, we used DIAMOND [57] to perform BLASTp searches against all annotated proteins, with A. mellifera DNMT1 (NM001171051), DNMT2 (XM006562945) and DNMT3 (NM001190421) as query sequences. We used InterProScan to screen out hits that lacked the C-5 cytosine-specific DNA methylase domain, and NCBI BLASTP to screen out bacterial contaminants (i.e. hits that were more similar to bacterial DNMTs than eukaryotic DNMTs). To classify DNMTs into subclades (DNMT1, 2 & 3) we aligned all homologues with MAFFT, screened out badly-aligned regions with Gblocks [58], and inferred a neighbour-joining phylogenetic tree under the Jukes-Cantor model using Geneious v10.1.3 (https://www.geneious.com).

Genome annotation

To annotate exons in each genome we used existing annotations, excluding genes that were split across multiple contigs. To annotate regions which may contain promoters or enhancers, we took 1,000 bases upstream of each gene, excluding genes where this exceeded the contig start or end point. We annotated introns based on the position of exons, excluding genes that were split across multiple contigs (using intron_finder.py script available at https://github.com/SamuelHLewis/BStoolkit/). To annotate TEs, we used RepeatModeller v1.0.8 to generate a model of TEs for each genome separately, and then RepeatMasker v4.0.6 to annotate TEs based on the model for that genome. Within each TE, we used interproscan [59] to search for the following TE-associated domains: PF03184, PF02914, PF13358, PF03732, PF00665 & PF00077.

To annotate rRNA, we either used existing annotations or RNAmmer v1.2 [60]. To annotate tRNA, we either used existing annotations or tRNAscan-SE v1.3.1 [61]. To avoid ambiguous results caused by overlapping features, we screened out any TE annotations that overlapped any rRNA, tRNA or exon, and any upstream regions which overlapped any TE, rRNA, tRNA or exon.

Whole genome bisulphite sequencing

To measure DNA methylation on a genome-wide scale, we carried out whole-genome bisulphite sequencing. We used the DNeasy Blood and Tissue kit (QIAGEN) according to the manufacturer’s protocol to extract DNA from adult somatic tissues of the following species: L. polyphemus, P. tepidariorum, S. maritima, A. vulgare, B. germanica, A. pisum, B. terrestris, N. vespilloides, H. melpomene and D. melanogaster. For I. scapularis, we used the same method to extract DNA from the IDE2, IDE8 and ISE18 cell culture. To estimate bisulphite conversion efficiency, we added a spike-in of unmethylated DNA (P-1025-1, EpiGentek) equal to 0.01% of the sample DNA mass to each sample. We then prepared whole-genome bisulphite sequencing libraries from each DNA sample using the Pico Methyl-Seq Library Prep Kit (Zymo Research), according to the manufacturer’s protocol (S1 Table for detailed sample metadata and sequence accession codes). We sequenced these libraries on an Illumina HiSeq 2500 instrument to generate 100bp paired-end reads. We used pre-existing whole-genome bisulphite sequencing datasets for P. hawaiensis (SRR3618947, [23]) and A. mellifera (SRR1790690, [62]).

To generate bisulphite sequencing data for P. citri, we extracted DNA from adult females using the DNeasy Blood and Tissue kit (QIAGEN) according to the manufacturer’s protocol. To estimate bisulphite conversion efficiency, we included a spike-in of non-methylated Escherichia coli lambda DNA (isolated from a heat-inducible lysogenic E. coli W3110 strain, provided by Beijing Genomics Institute (BGI), GenBank/EMBL accession numbers J02459, M17233, M24325, V00636, X00906). Sequencing of bisulphite libraries was carried out by BGI on an Illumina HiSeq 4000 instrument to generate 150bp paired-end reads.

Bisulphite sequencing data analysis

Before mapping reads to the genome, we trimmed sequencing adapters from each read, and then trimmed 10 bases from the 5’ and 3’ end of each read (using the script https://github.com/SamuelHLewis/BStoolkit/blob/master/BStrim.sh). We aligned bisulphite sequencing reads to each genome using Bismark v0.19.0 [63] in—non_directional mode with default settings. We used MethylExtract v1.9.1 [64] to estimate the level of methylation at each CpG site, calculated as the number of reads in which the cytosine is methylated divided by the total number of reads covering the cytosine, excluding sites covered by fewer than 10 reads on each strand. Due to the large number of contigs in their genome assemblies exceeding the memory limit for MethylExtract, we split the genomes of I. scapularis, L. polyphemus and P. hawaiensis into individual contigs, ran MethylExtract on each contig separately, and concatenated the resulting output files into one file for each genome.

To estimate the genome-wide background level of CpG methylation, we calculated the mean methylation for all CpGs outside annotated features (exon, intron, upstream region, TE, rRNA & tRNA). To gain an accurate estimate of the methylation level of each feature, we calculated the mean methylation level of all CpGs within that feature, excluding any feature with fewer than 3 sufficiently-covered CpGs (only CpGs covered by >10 reads are analysed). We estimated 95% confidence intervals for the mean methylation of genes and TEs within each species using 1000 nonparametric bootstrap replicates (i.e. genes or TEs were resampled with replacement 1000 times to generate an empirical distribution of the mean).

Phylogenetics and ancestral state reconstruction

To infer the ancestral levels of genome-wide methylation across 29 species of arthropods with newly-produced or publicly-available methylation data (Fig 1), we obtained a time-scaled species tree from TimeTree (www.timetree.org, accessed 01.03.2020). We then used a maximum-likelihood approach to infer the genome-wide methylation level at all internal nodes of this tree based on the levels at the tips, using the fastAnc function within phytools [65].

To infer the ancestral levels of gene-body and TE methylation for the 14 focal species, we constructed a Bayesian time-scaled species tree for 14 focal species (Figs 2 & 3). We first identified 236 proteins present as 1:1:1 orthologues across our species set, concatenated the protein sequences together, and aligned them using MAFFT v7.271 [66] with default settings. We then screened out poorly-aligned regions using Gblocks [58] with least stringent settings. Using this alignment, we constructed a phylogenetic tree using BEAST v1.8.4 [67] to infer branch lengths. We specified a strict molecular clock, gamma-distributed rate variation, no invariant sites, and a birth-death speciation process. We fixed the topology and set prior distributions on key internal node dates (Arthropoda  =  568 ± 29, Insecta–Crustacea  =  555 ± 33, Insecta  =  386 ± 27, Hymenoptera–Coleoptera–Lepidoptera–Diptera  =  345 ± 27, Coleoptera–Lepidoptera–Diptera  =  327 ± 26), deriving these values from an existing phylogenetic analysis of arthropods [68]. We ran the analysis for 10 million generations, and used TreeAnnotator [67] to generate a maximum clade credibility tree. We then used a maximum-likelihood approach to infer the gene-body and TE methylation levels (separately) at all internal nodes of this tree, using the fastAnc function within phytools [65].

To test whether genome-wide methylation levels differ between species with and without ALKB2, we fitted a phylogenetic mixed model using MCMCglmm [69]. To account for phylogenetic non-independence caused by sampling species with different levels of relatedness, we used the branch lengths of the time-scaled (ultrametric) species tree (see above) to calculate a genetic distance matrix, and included this in the model as a random factor. We ran the analysis for 6 million iterations, with a burn-in of 1 million iterations and thinning of 500 generations.

RNA-Seq data analysis

To investigate the link between DNA methylation and transcription, we used RNA-Seq data generated previously for arthropod somatic tissue (NCBI PRJNA386859, [70] and the I. scapularis IDE-8 cell line (SRR1756347, Arthropod Cell Line RNA Seq initiative, Broad Institute, broadinstitute.org). To measure the expression of each feature, we trimmed adaptors and low-quality ends using Trim Galore with default settings, and mapped RNA-Seq reads to the genome of each species using TopHat2 v2.1.1 [71] with default settings for strand-specific libraries (—library-type fr-firststrand mode). We counted the number of reads overlapping each feature using BEDTools coverage v2.25.0 in strand-specific mode, and divided the number of reads by the feature length to generate expression level estimates in fragments per per kilobase million (FPKM).

To test whether variation in tissue-specific expression differs between highly- and lowly-methylated genes, we calculated the coefficient of variation for expression of each gene in each species with RNA-Seq data (i.e. excluding B. germanica, I. scapularis & P. hawaiensis). For S. maritima we used RNA-Seq data for fat body and nerve chord; for P. citri & A. pisum we used RNA-Seq data for female soma and germline; and for all other species we used RNA-Seq data for female and male soma and germline.

Periodic correlation in methylation levels

To obtain an estimate of how the correlation between the methylation levels of sites varied with distance between the sites, we collected all pairs of sites separated by d nucleotides where d could vary between 3 and 500 nucleotides within the same exon. For each separate d we then computed the correlation coefficient across all the pairs. To quantify the periodic component of the signal we subtracted any gradual change in correlation across the entire window by calculating the residuals of a linear model. This signal was subjected to Fourier analysis using the fast Fourier transform algorithm implemented in R. A linear model was used to subtract the baseline across the 500bp and the residuals were used as a time series for input into the algorithm, with 50000 0 values ended on to the end of the series to increase the resolution of the algorithm. The total intensity of the components between 140 and 200 base pairs was calculated to give the nucleosome periodicity for each species.

Nucleosome positioning analysis

The genomic coordinates of the D. melanogaster members of orthogroups conserved across all species were extracted and the top 20% (high methylation) and bottom 20% methylation (low methylation) levels selected. Nucleosome positioning data from the D. melanogaster S2 cell line was downloaded from Modencode [38]. The average signal was computed across 200bp windows spanning 2kb either side of the annotated transcription start site for each gene. The mean signal was computed within the high methylation and low methylation sets separately and a loess fit performed. To obtain confidence intervals, the mean signal was computed on 100 random samples containing 90% of the data and a loess fit calculated on the lowest and highest values obtained for each 200bp window.

CAGE data analysis

Total body RNA was extracted from L3 Drosophila melanogaster (w1118) larvae using the Qiagen RNeasy kit. CAGE library preparation was performed using the nAnT-iCAGE protocol [72]. Two biological replicates were prepared from 5 ug of total RNA each. The libraries were sequenced in single-end 50 bp-pair mode. CAGE tags (47 bp) were mapped to the reference D. melanogaster genome (assembly Release 6) using Bowtie2 [73] with default parameters. Uniquely mapped reads were imported into R (http://www.R-project.org/) as bam files using the standard workflow within the CAGEr package [74]. The 5’ ends of reads are CAGE-supported transcription start sites (CTSSs) and the number of tags for each CTSS reflects expression levels. Raw tags were normalised using a referent power-law distribution and expressed as normalized tags per million (TPMs). Biological replicates were highly correlated (r2 = 0.99) and were therefore merged prior to downstream analyses using standard Bioconductor packages (http://www.bioconductor.org/) and custom scripts.

CTSSs were clustered together into tag clusters, a single functional transcriptional unit, using distance-based clustering, with the maximum distance allowed between adjacent CTSSs being 20 bp. For each tag cluster, the interquantile width was calculated as the distance between CTSSs at the 10th and 90th quartile of the cumulative distribution of expression across the cluster. The interquartile range of each gene within the top 20% and bottom 20% of methylation levels was extracted and compared.

Supporting information

S1 Fig. ALKB2 DNA repair is associated with high levels of DNA methylation across arthropods.

Boxplot showing genome-wide methylation levels in 31 arthropod species with and without ALKB2.

(PDF)

S2 Fig. Metagene plot of methylation within TEs and in flanking sequence for all species.

TEs are shown in pink, flanking sequence in white.

(PDF)

S3 Fig. Domain structure of DNMT1s from arthropods, with the human DNMT1 for comparison.

(AI)

S4 Fig. Alignment of the CXXC domain from DNMT1 highlighting the disruption of this domain in P. citri.

(AI)

S5 Fig. Methylation of single exon and multi-exon genes for all species in which we see gene body methylation.

(PDF)

S6 Fig. Expression patterns of methylated and unmethylated genes for all species (cf Fig 5B).

(PDF)

S7 Fig. Fourier transform analysis of periodicity in DNA methylation correlation.

A) Diagram of method to convert from the pattern into the magnitude of the nucleosome periodicity B, C) Periodicity in coding sequences and transposable elements respectively.

(PDF)

S8 Fig. Intron periodicity is markedly less apparent than exon periodicity.

S. maritima exons 1 to 4 (A) and introns 1 to 4 (B) are shown for comparison.

(PDF)

S9 Fig. Methylation is a better predictor of nucleosomal pattern than housekeeping gene status.

Both housekeeping genes and non-housekeeping genes that are methylated genes show enhanced nucleosome occupancy at the +1 nucleosome.

(PDF)

S1 Table. Details of the tissue type, sex, caste, BioSample Accession and SRA Accession of each sample that was newly-sequenced in this study.

(XLSX)

Acknowledgments

We thank L. Bell-Sakyi, the Tick Cell Biobank, A. McGregor, R. Jenner, M. Akam, A. McLean, D. Collins, R. Kilner, A. Pinharanda and C. Jiggins for providing arthropod samples, and G. Filion for providing a curated list of D. melanogaster housekeeping genes. PS is a recipient of an EMBO Young Investigator award. Sequencing of bisulphite and CAGE libraries was carried out by the LMS Genomics Facility.

Data Availability

Sequence data that was newly-generated for this project have been deposited in the NCBI Short Read Archive under the BioProject accession code PRJNA589724. The source code, input data and newly-identified DNMT & ALKB2 gene sequences are available from the Cambridge Data Repository (https://doi.org/10.17863/CAM.45964). Bisulfite sequencing data for Planococcus citri has been deposited PRJNA610765.

Funding Statement

This research was funded by the Medical Research Council (grant to PS) and the Leverhulme Trust (grant to FMJ and PS).BL was supported by Medical Research Council UK (award MC_UP_1102/1) and Wellcome Trust Investigator Award 106954/Z/15/Z. LR was supported by a NERC fellowship (NE/K009516/1) and the Royal Society (RG160842). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

References

  • 1.Holliday R, Kakutani T, Martienssen R, Richards E. The inheritance of epigenetic defects. Science. 1987;238: 163–70. 10.1126/science.3310230 [DOI] [PubMed] [Google Scholar]
  • 2.Holliday R. Epigenetics: A historical overview. Epigenetics. 2006. pp. 76–80. 10.4161/epi.1.2.2762 [DOI] [PubMed] [Google Scholar]
  • 3.Law JA, Jacobsen SE. Establishing, maintaining and modifying DNA methylation patterns in plants and animals. Nature Reviews Genetics. 2010. pp. 204–220. 10.1038/nrg2719 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Jeltsch A. Molecular enzymology of mammalian DNA methyltransferases. Current Topics in Microbiology and Immunology. 2006. pp. 203–225. 10.1007/3-540-31390-7_7 [DOI] [PubMed] [Google Scholar]
  • 5.Nashun B, Hill PWS, Hajkova P. Reprogramming of cell fate: epigenetic memory and the erasure of memories past. EMBO J. 2015;34: 1296–308. 10.15252/embj.201490649 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Ponger L, Li WH. Evolutionary diversification of DNA methyltransferases in eukaryotic genomes. Mol Biol Evol. 2005;22: 1119–1128. 10.1093/molbev/msi098 [DOI] [PubMed] [Google Scholar]
  • 7.Bird A. DNA methylation patterns and epigenetic memory. Genes Dev. 2002;16: 6–21. 10.1101/gad.947102 [DOI] [PubMed] [Google Scholar]
  • 8.Walsh CP, Chaillet JR, Bestor TH. Transcription of IAP endogenous retroviruses is constrained by cytosine methylation. Nat Genet. 1998;20: 116–117. 10.1038/2413 [DOI] [PubMed] [Google Scholar]
  • 9.Suzuki MM, Bird A. DNA methylation landscapes: provocative insights from epigenomics. Nat Rev Genet. 2008;9: 465–476. 10.1038/nrg2341 [DOI] [PubMed] [Google Scholar]
  • 10.Zemach A, McDaniel IE, Silva P, Zilberman D. Genome-wide evolutionary analysis of eukaryotic DNA methylation. Science (80-). 2010;328: 916–919. 10.1126/science.1186366 [DOI] [PubMed] [Google Scholar]
  • 11.Feng S, Cokus SJ, Zhang X, Chen P-Y, Bostick M, Goll MG, et al. Conservation and divergence of methylation patterning in plants and animals. Proc Natl Acad Sci. 2010;107: 8689–8694. 10.1073/pnas.1002720107 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Rošić S, Amouroux R, Requena CE, Gomes A, Emperle M, Beltran T, et al. Evolutionary analysis indicates that DNA alkylation damage is a byproduct of cytosine DNA methyltransferase activity. Nat Genet. 2018;50: 452–459. 10.1038/s41588-018-0061-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Bewick AJ, Vogel KJ, Moore AJ, Schmitz RJ. Evolution of DNA methylation across insects. Mol Biol Evol. 2017;34: 654–665. 10.1093/molbev/msw264 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Bewick AJ, Hofmeister BT, Powers RA, Mondo SJ, Grigoriev I V., James TY, et al. Diversity of cytosine methylation across the fungal tree of life. Nat Ecol Evol. 2019;3: 479–490. 10.1038/s41559-019-0810-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.de Mendoza A, Pflueger J, Lister R. Capture of a functionally active methyl-CpG binding domain by an arthropod retrotransposon family. Genome Res. 2019;29: 1277–1286. 10.1101/gr.243774.118 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.de Mendoza A, Hatleberg WL, Pang K, Leininger S, Bogdanovic O, Pflueger J, et al. Convergent evolution of a vertebrate-like methylome in a marine sponge. Nat Ecol Evol. 2019;3: 1464–1473. 10.1038/s41559-019-0983-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Keller TE, Han P, Yi S V. Evolutionary transition of promoter and gene body DNA methylation across invertebrate-vertebrate boundary. Mol Biol Evol. 2016;33: 1019–1028. 10.1093/molbev/msv345 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Wang Y, Jorda M, Jones PL, Maleszka R, Ling X, Robertson HM, et al. Functional CpG methylation system in a social insect. Science (80-). 2006;314: 645–647. 10.1126/science.1135213 [DOI] [PubMed] [Google Scholar]
  • 19.Lyko F, Foret S, Kucharski R, Wolf S, Falckenhayn C, Maleszka R. The Honey Bee Epigenomes: Differential Methylation of Brain DNA in Queens and Workers. Keller L, editor. PLoS Biol. 2010;8: e1000506 10.1371/journal.pbio.1000506 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Xiang H, Zhu J, Chen Q, Dai F, Li X, Li M, et al. Single base–resolution methylome of the silkworm reveals a sparse epigenomic map. Nat Biotechnol. 2010;28: 516–520. 10.1038/nbt.1626 [DOI] [PubMed] [Google Scholar]
  • 21.Wang X, Wheeler D, Avery A, Rago A, Choi JH, Colbourne JK, et al. Function and Evolution of DNA Methylation in Nasonia vitripennis. PLoS Genet. 2013;9 10.1371/journal.pgen.1003872 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Bonasio R, Li Q, Lian J, Mutti NS, Jin L, Zhao H, et al. Genome-wide and caste-specific DNA methylomes of the ants camponotus floridanus and harpegnathos saltator. Curr Biol. 2012;22: 1755–1764. 10.1016/j.cub.2012.07.042 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Kao D, Lai AG, Stamataki E, Rosic S, Konstantinides N, Jarvis E, et al. The genome of the crustacean parhyale hawaiensis, a model for animal development, regeneration, immunity and lignocellulose digestion. Elife. 2016;5 10.7554/eLife.20062.001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Kvist J, Gonçalves Athanàsio C, Shams Solari O, Brown JB, Colbourne JK, Pfrender ME, et al. Pattern of DNA Methylation in Daphnia: Evolutionary Perspective. Genome Biol Evol. 2018;10: 1988–2007. 10.1093/gbe/evy155 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Liu S, Aagaard A, Bechsgaard J, Bilde T. DNA Methylation Patterns in the Social Spider, Stegodyphus dumicola. Genes (Basel). 2019;10: 137 10.3390/genes10020137 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Gatzmann F, Falckenhayn C, Gutekunst J, Hanna K, Raddatz G, Carneiro VC, et al. The methylome of the marbled crayfish links gene body methylation to stable expression of poorly accessible genes. Epigenetics and Chromatin. 2018;11 10.1186/s13072-018-0229-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Wu P, Jie W, Shang Q, Annan E, Jiang X, Hou C, et al. DNA methylation in silkworm genome may provide insights into epigenetic regulation of response to Bombyx mori cypovirus infection. Sci Rep. 2017;7: 16013 10.1038/s41598-017-16357-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Bewick AJ, Zhang Y, Wendte JM, Zhang X, Schmitz RJ. Evolutionary and experimental loss of gene body methylation and its consequence to gene expression. G3 Genes, Genomes, Genet. 2019;9: 2441–2445. 10.1534/g3.119.400365 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Goll MG, Kirpekar F, Maggert KA, Yoder JA, Hsieh C- L, Zhang X, et al. Methylation of tRNAAsp by the DNA methyltransferase homolog Dnmt2. Science. 2006;311: 395–8. 10.1126/science.1120976 [DOI] [PubMed] [Google Scholar]
  • 30.Falckenhayn C, Boerjan B, Raddatz G, Frohme M, Schoofs L, Lyko F. Characterization of genome methylation patterns in the desert locust Schistocerca gregaria. J Exp Biol. 2013;216: 1423–9. 10.1242/jeb.080754 [DOI] [PubMed] [Google Scholar]
  • 31.Song J, Rechkoblit O, Bestor TH, Patel DJ. Structure of DNMT1-DNA complex reveals a role for autoinhibition in maintenance DNA methylation. Science (80-). 2011;331: 1036–1040. 10.1126/science.1195380 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Catania S, Dumesic PA, Pimentel H, Nasif A, Stoddard CI, Burke JE, et al. Evolutionary Persistence of DNA Methylation for Millions of Years after Ancient Loss of a De Novo Methyltransferase. Cell. 2020;180: 263–277.e20. 10.1016/j.cell.2019.12.012 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Hunt BG, Glastad KM, Yi S V., Goodisman MAD. Patterning and Regulatory Associations of DNA Methylation Are Mirrored by Histone Modifications in Insects. Genome Biol Evol. 2013;5: 591–598. 10.1093/gbe/evt030 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Bewick AJ, Sanchez Z, Mckinney EC, Moore AJ, Moore PJ, Schmitz RJ. Dnmt1 is essential for egg production and embryo viability in the large milkweed bug, Oncopeltus fasciatus. Epigenetics Chromatin. 2019;12: 6 10.1186/s13072-018-0246-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Corrales M, Rosado A, Cortini R, Van Arensbergen J, Van Steensel B, Filion GJ. Clustering of Drosophila housekeeping promoters facilitates their expression. Genome Res. 2017;27: 1153–1161. 10.1101/gr.211433.116 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Chodavarapu RK, Feng S, Bernatavichute Y V., Chen P-Y, Stroud H, Yu Y, et al. Relationship between nucleosome positioning and DNA methylation. Nature. 2010;466: 388–392. 10.1038/nature09147 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Zhang L, Xie WJ, Liu S, Meng L, Gu C, Gao YQ. DNA Methylation Landscape Reflects the Spatial Organization of Chromatin in Different Cells. Biophys J. 2017;113: 1395–1404. 10.1016/j.bpj.2017.08.019 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Ho JWK, Jung YL, Liu T, Alver BH, Lee S, Ikegami K, et al. Comparative analysis of metazoan chromatin organization. Nature. 2014;512: 449–452. 10.1038/nature13415 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Haberle V, Lenhard B. Promoter architectures and developmental gene regulation. Seminars in Cell and Developmental Biology. 2016. 10.1016/j.semcdb.2016.01.014 [DOI] [PubMed] [Google Scholar]
  • 40.Carninci P, Sandelin A, Lenhard B, Katayama S, Shimokawa K, Ponjavic J, et al. Genome-wide analysis of mammalian promoter architecture and evolution. Nat Genet. 2006;38: 626–635. 10.1038/ng1789 [DOI] [PubMed] [Google Scholar]
  • 41.Hoskins RA, Landolin JM, Brown JB, Sandler JE, Takahashi H, Lassmann T, et al. Genome-wide analysis of promoter architecture in Drosophila melanogaster. Genome Res. 2011;21: 182–92. 10.1101/gr.112466.110 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Lenhard B, Sandelin A, Carninci P. Metazoan promoters: emerging characteristics and insights into transcriptional regulation. Nat Rev Genet. 2012;13: 233–245. 10.1038/nrg3163 [DOI] [PubMed] [Google Scholar]
  • 43.Haberle V, Li N, Hadzhiev Y, Plessy C, Previti C, Nepal C, et al. Two independent transcription initiation codes overlap on vertebrate core promoters. Nature. 2014;507: 381–385. 10.1038/nature12974 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Quenneville S, Turelli P, Bojkowska K, Raclot C, Offner S, Kapopoulou A, et al. The KRAB-ZFP/KAP1 system contributes to the early embryonic establishment of site-specific DNA methylation patterns maintained during development. Cell Rep. 2012;2: 766–73. 10.1016/j.celrep.2012.08.043 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Dixon GB, Bay LK, Matz M V. Evolutionary Consequences of DNA Methylation in a Basal Metazoan. Mol Biol Evol. 2016;33: 2285–2293. 10.1093/molbev/msw100 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Takuno S, Gaut BS. Gene body methylation is conserved between plant orthologs and is of evolutionary consequence. Proc Natl Acad Sci U S A. 2013;110: 1797–1802. 10.1073/pnas.1215380110 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Takuno S, Ran JH, Gaut BS. Evolutionary patterns of genic DNA methylation vary across land plants. Nat Plants. 2016;2: 1–7. 10.1038/NPLANTS.2015.222 [DOI] [PubMed] [Google Scholar]
  • 48.Bewick AJ, Schmitz RJ. Gene body DNA methylation in plants. Current Opinion in Plant Biology. 2017. 10.1016/j.pbi.2016.12.007 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Seymour DK, Gaut BS. Phylogenetic Shifts in Gene Body Methylation Correlate with Gene Expression and Reflect Trait Conservation. Purugganan M, editor. Mol Biol Evol. 2020;37: 31–43. 10.1093/molbev/msz195 [DOI] [PubMed] [Google Scholar]
  • 50.Bewick AJ, Ji L, Niederhuth CE, Willing EM, Hofmeister BT, Shi X, et al. On the origin and evolutionary consequences of gene body DNA methylation. Proc Natl Acad Sci U S A. 2016;113: 9111–9116. 10.1073/pnas.1604666113 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Muyle A, Gaut BS. Loss of Gene Body Methylation in Eutrema salsugineum Is Associated with Reduced Gene Expression. Purugganan M, editor. Mol Biol Evol. 2019;36: 155–158. 10.1093/molbev/msy204 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Zilberman D. An evolutionary case for functional gene body methylation in plants and animals. Genome Biol. 2017;18: 87 10.1186/s13059-017-1230-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Morselli M, Pastor WA, Montanini B, Nee K, Ferrari R, Fu K, et al. In vivo targeting of de novo DNA methylation by histone modifications in yeast and mouse. Elife. 2015;4 10.7554/eLife.06205 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Baubec T, Colombo DF, Wirbelauer C, Schmidt J, Burger L, Krebs AR, et al. Genomic profiling of DNA methyltransferases reveals a role for DNMT3B in genic methylation. Nature. 2015;520: 243–247. 10.1038/nature14176 [DOI] [PubMed] [Google Scholar]
  • 55.Schwartz S, Meshorer E, Ast G. Chromatin organization marks exon-intron structure. Nat Struct Mol Biol. 2009;16: 990–995. 10.1038/nsmb.1659 [DOI] [PubMed] [Google Scholar]
  • 56.Tilgner H, Nikolaou C, Althammer S, Sammeth M, Beato M, Valcárcel J, et al. Nucleosome positioning as a determinant of exon recognition. Nat Struct Mol Biol. 2009;16: 996–1001. 10.1038/nsmb.1658 [DOI] [PubMed] [Google Scholar]
  • 57.Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat Methods. 2015;12: 59–60. 10.1038/nmeth.3176 [DOI] [PubMed] [Google Scholar]
  • 58.Castresana J. Selection of Conserved Blocks from Multiple Alignments for Their Use in Phylogenetic Analysis. Mol Biol Evol. 2000;17: 540–552. 10.1093/oxfordjournals.molbev.a026334 [DOI] [PubMed] [Google Scholar]
  • 59.Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C, et al. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014;30: 1236–1240. 10.1093/bioinformatics/btu031 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Lagesen K, Hallin P, Rødland EA, Stærfeldt H- H, Rognes T, Ussery DW. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 2007;35: 3100–3108. 10.1093/nar/gkm160 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Lowe TM, Eddy SR. tRNAscan-SE: A Program for Improved Detection of Transfer RNA Genes in Genomic Sequence. Nucleic Acids Res. 1997;25: 955–964. 10.1093/nar/25.5.955 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Galbraith DA, Yang X, Niño EL, Yi S, Grozinger C. Parallel Epigenomic and Transcriptomic Responses to Viral Infection in Honey Bees (Apis mellifera). Schneider DS, editor. PLOS Pathog. 2015;11: e1004713 10.1371/journal.ppat.1004713 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Krueger F, Andrews SR. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics. 2011;27: 1571–1572. 10.1093/bioinformatics/btr167 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Barturen G, Rueda A, Oliver JL, Hackenberg M. MethylExtract: High-Quality methylation maps and SNV calling from whole genome bisulfite sequencing data. F1000Research. 2014;2: 217 10.12688/f1000research.2-217.v2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Revell LJ. phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol Evol. 2012;3: 217–223. 10.1111/j.2041-210X.2011.00169.x [DOI] [Google Scholar]
  • 66.Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30 10.1093/molbev/mst010 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Drummond AJ, Suchard MA, Xie D, Rambaut A. Bayesian Phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol. 2012;29: 1969–1973. 10.1093/molbev/mss075 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Misof B, Liu S, Meusemann K, Peters RS, Donath A, Mayer C, et al. Phylogenomics resolves the timing and pattern of insect evolution. Science. 2014;346: 763–7. 10.1126/science.1257570 [DOI] [PubMed] [Google Scholar]
  • 69.Hadfield JD. MCMC Methods for Multi-Response Generalized Linear Mixed Models: The MCMCglmm R Package. J Stat Softw. 2010;33: 1–22. 10.18637/jss.v033.i02 20808728 [DOI] [Google Scholar]
  • 70.Lewis SH, Quarles KA, Yang Y, Tanguy M, Frézal L, Smith SA, et al. Pan-arthropod analysis reveals somatic piRNAs as an ancestral defence against transposable elements. Nat Ecol Evol. 2018;2: 174–181. 10.1038/s41559-017-0403-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14: R36 10.1186/gb-2013-14-4-r36 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Murata M, Nishiyori-Sueki H, Kojima-Ishiyama M, Carninci P, Hayashizaki Y, Itoh M. Detecting expressed genes using CAGE. Methods Mol Biol. 2014. 10.1007/978-1-4939-0805-9_7 [DOI] [PubMed] [Google Scholar]
  • 73.Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Meth. 2012;9: 357–359. Available: 10.1038/nmeth.1923 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Haberle V, Forrest ARR, Hayashizaki Y, Carninci P, Lenhard B. CAGEr: Precise TSS data retrieval and high-resolution promoterome mining for integrative analyses. Nucleic Acids Res. 2015. 10.1093/nar/gkv054 [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision Letter 0

Wolf Reik

Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

4 Feb 2020

Dear Peter

Thank you very much for submitting your Research Article entitled '­­­­­­Widespread conservation and lineage-specific diversification of genome-wide DNA methylation patterns across arthropods' to PLOS Genetics. Your manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the current manuscript. Based on the reviews, we will not be able to accept this version of the manuscript, but we would be willing to review again a much-revised version. We cannot, of course, promise publication at that time.

Should you decide to revise the manuscript for further consideration here, your revisions should address the specific points made by each reviewer. We will also require a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

If you decide to revise the manuscript for further consideration at PLOS Genetics, please aim to resubmit within the next 60 days, unless it will take extra time to address the concerns of the reviewers, in which case we would appreciate an expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments are included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see our guidelines.

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool.  PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.

To resubmit, use the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder.

[LINK]

We are sorry that we cannot be more positive about your manuscript at this stage. Please do not hesitate to contact us if you have any concerns or questions.

Yours sincerely,

Wolf Reik

Section Editor: Epigenetics

PLOS Genetics

Wolf Reik

Section Editor: Epigenetics

PLOS Genetics

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The study be Lewis and colleagues’ profiles genome-wide DNA methylation at single base resolution in species distributed broadly across the arthropod phylum. The observed repeated loss of DNA methylation and its associated DNMTs in some species indicates that the most recent common ancestor possessed DNA methylation. There is widespread evidence that DNMT3 is frequently lost across this phylum, yet, DNMT1 is sufficient to maintain the presence of CpG methylation in the genome. In a couple of species they observed independent acquisition of promoter methylation which was associated with TE acquiring. This is a very exciting observation. Curiously, DNA methylation in arthropods is highly enriched in gene bodies, especially within exons. The authors observed a periodicity associated with DNA methylation that very likely reflects the distribution of nucleosomes. To test this possibility, the authors took advantage of the fact that genes with methylation in arthropods are generally conserved. This provided an opportunity to use nucleosome positioning data from Drosophila, a species without DNA methylation. The authors found that nucleosomes were much better positioned in genes that are generally gene body methylated across the arthropod phylum.

Overall, this is a very nice and timely study that contributes to our growing knowledge that although the DNA methylation machinery is generally conserved, the manner in which the genome uses it are quite diverse. I find it very exciting that the authors found evidence for independent acquisition of promoter methylation that can influence gene expression. These unique species will provide clues on how this process has evolved. The minor comments below are intended to improve upon this nice contribution to the field.

1. I could not find any summary sequencing statistics associated with the WGBS data. Number of aligned reads, coverage, sodium bisulfite conversion efficiencies, etc.

2. Although I really like the use of the comparative analysis using Drosophila nucleosome positioning data, I think the manuscript should be carefully modified to indicate that there is not yet direct data of nucleosome positioning in a species that possesses DNA methylation (at least from this study). I believe the results will be the same, but until the experiment is done the language should be carefully crafted to make this clear.

3. In the species that independently acquire DNA methylation do the DNA methyltransferase acquire additional domains that could influence its ability to target different regions of the genome? Please add this to the discussion.

4. I would also consider that in addition to targeting methylation to exons an alternative explanation that DNA methylation maintenance might be more efficient at these regions. Targeting and maintenance are a bit different and these possibilities could be discussing.

5. In the first figure many methylomes are of low coverage, but a high coverage methylation that was not included is from Oncopeltus fasciatus.

Reviewer #2: Sarkies et al. analyze DNA methylation patterns in a range of arthropod species. They find that most arthropods have low levels of TE methylation, as is common in invertebrates, but that two species have preferential TE methylation that likely evolved independently. This is especially convincing in S. maritima. The authors also provide evidence that promoter methylation causes transcriptional silencing in these species and suggest that this is linked to methylation as a mechanism of TE silencing. A recent paper from Ryan Lister’s lab argues this for a sponge (de Mendoza et al, Nat Ecol Evol. 2019) and echoes a much earlier proposal for the evolution of TE silencing by methylation in vertebrates (Zemach et al., Science 2010). Still, it’s valuable to have additional examples that demonstrate the ease with which this evolutionary development can occur in animals.

The authors’ treatment of gene body methylation is of more concern. The association of gene body methylation with constitutively expressed, evolutionarily conserved housekeeping genes and its corresponding depletion from variably expressed genes is probably the best-known gene body methylation characteristic in plants and animals. The initial description of this phenomenon was published by Steve Jacobsen’s lab in Arabidopsis (Zhang et al., Cell 2006); more elaborate characterizations in Arabidopsis and honeybee were published a little later (Aceituno et al., 2008; Elango et al., 2009). Many more papers have been published since, as illustrated by this quote from a recent paper that reported this phenomenon in the coral Acropora millepora (Dixon et al., Mol Biol Evol 2016):

“We showed that strongly methylated genes in A. millepora tend to have constitutive and ubiquitous functions and are less likely to be differentially expressed across developmental stages and environmental regimes. These results corroborate earlier findings from diverse taxa including plants (Aceituno et al. 2008; Coleman-Derr and Zilberman 2012; Takuno and Gaut 2012), cnidarians (Sarda et al. 2012; Dimond and Roberts 2016), mollusk (Gavery and Roberts 2010, 2013), arthropods (Elango et al. 2009; Wang et al. 2013), and a basal chordate (Suzuki et al. 2013; Keller et al. 2015). The relationship with differential expression in response to environmental regimes suggests the intriguing possibility that gbM could modulate gene expression plasticity.”

Since then, this phenomenon has been reported in several more papers (probably not a complete list): Kvist et al., Genome Biol Evol 2018 for Daphnia; Gatzmann et al., Epigenetics & Chromatin 2018 for crayfish; Liu et al., Genes 2019 for spiders; Harris et al., Epigenetics & Chromatin 2019 for honeybee. Note that all these papers cover arthropods (and half are not cited). The crayfish paper in particular may complicate the authors’ conclusions about crustaceans.

Therefore, it is not “remarkabl[e]” that “the same set of genes are likely to be methylated in all species” or that “these genes have characteristic patterns of expression”. This has been extensively demonstrated in many plant and animal (including arthropod) species. The authors are not confirming an “earlier speculation” as they put it in the discussion, but restating well-supported conclusions from earlier work.

There are similar issues about the association between genic DNA methylation and nucleosomes. First, the authors’ claim that “nucleosome positioning influences DNA methylation levels across arthropods” (line 3, page 10) is far stronger than the data justify. Instead, the authors report a correlation with nucleosome positioning in a species (Drosophila) that lacks DNA methylation. They can’t distinguish if nucleosome positioning influences methylation, vice versa, or both. Second, the relationship between methylation and nucleosomes is hardly unexplored. Of most relevance, enrichment of methylation over nucleosomes and the corresponding nucleosomal periodicity of methylation have been reported long ago for Arabidopsis and humans (Chodavarapu et al., Nature 2010). An enrichment of gene body methylation at nucleosomes has been described more recently in Arabidopsis (Lyons and Zilberman, eLife 2017). Preferential methylation of nucleosomes in genes is not a new finding. Other strong associations between nucleosomes and DNA methylation (cytosine and adenine) have also been reported, including the ability of methylation to influence nucleosomes (for example, Huff and Zilberman, Cell 2014; Fu et al., Cell 2015; Beh et al., Cell 2019). Finally, the authors are creating the appearance of novelty by associating known features of housekeeping gene transcription and nucleosome organization with a known feature of gene body methylation (enrichment in constitutively expressed housekeeping genes). Because gene body methylation is concentrated in housekeeping genes, it will of necessity correlate with features of these genes.

This paper contains interesting observations about the evolution of DNA methylation and could be a valuable addition to the field if the authors were to place their finding properly within the existing literature. However, given the relatively low novelty of the conclusions, I feel this paper is much more suitable for a journal like Mol Biol Evol or Epigenetics & Chromatin (in which many of the relevant recent papers have been published).

Reviewer #3: DNA methylation (5MeC) is an epigenetic regulatory mechanism, which among eukaryotes displays great variability in terms of genomic content and function. The majority of studies focusing on 5MeC to date have been carried out on vertebrate model organisms that exhibit hypermethylated genomes. Due to the recent increase in accessibility of genomic sequencing technologies, more invertebrate organisms have had their genomes and DNA methylomes sequenced. While this provided important insights into the evolutionary conservation and divergence of 5MeC, many open questions still remain. In the current manuscript Lewis et al generate base-resolution DNA methylome maps of the highly diverse Arthropod phylum. While it is fair to say that none of the reported findings come across as particularly surprising, this is an important piece of work that will help in better understanding the evolution of 5MeC. This work will also provide a useful genomic resource that will enable further evolutionary studies in the 5MeC field. My comments for improvement can be found below.

1) The authors state that the sequence data is available via SRA. However, the PRJNA589724 accession number does not work. I could also not find any reviewer links that would point to these data. This needs to be fixed before publication.

2) The samples are poorly described. It is not clear what type of tissue / organ / cell type / developmental stage was used for DNA extraction / library preps.

3) P6, L24-25 and P7, L6-7: Targeting of TEs by 5MeC was previously shown in S maritima. A citation is required here (de Mendoza et al, 2019, Genome Res).

4) Figure 3C. Could the authors speculate on the differences in 5MeC exon enrichment directionality? For example Limulus, Exodes, Parasteatoda show more enrichment at 5' ends whereas Bombus and Apis, display more enrichment at 3' gene ends.

5) Would some of the published RNA-seq data allow for analyses of the frequency of cryptic transcription initiation in organisms with low gene body 5MeC levels vs the ones with robustly methylated GBs?

6) Related to Figs, 4,5,6: It is not clear from which tissues / organs / cell types / developmental stages RNA-seq data originates from and how this compares to the DNA methylome data. This needs to be clearly explained. Could these discrepancies be the reason for the reversed pattern in P citri or the observation that many highly conserved and expressed genes lack 5MeC?

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: No: Missing sequencing summary statistics

Reviewer #2: None

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Decision Letter 1

Wolf Reik

3 May 2020

* Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. *

Dear Peter,

Thank you very much for submitting your Research Article entitled '­­­­­­Widespread conservation and lineage-specific diversification of genome-wide DNA methylation patterns across arthropods' to PLOS Genetics. Your manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important topic but identified some aspects of the manuscript that should be improved.

We therefore ask you to modify the manuscript according to the review recommendations before we can consider your manuscript for acceptance. Your revisions should address the specific points made by each reviewer.

In addition we ask that you:

1) Provide a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

2) Upload a Striking Image with a corresponding caption to accompany your manuscript if one is available (either a new image or an existing one from within your manuscript). If this image is judged to be suitable, it may be featured on our website. Images should ideally be high resolution, eye-catching, single panel square images. For examples, please browse our archive. If your image is from someone other than yourself, please ensure that the artist has read and agreed to the terms and conditions of the Creative Commons Attribution License. Note: we cannot publish copyrighted images.

We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we would ask you to let us know the expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments should be included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.

To resubmit, you will need to go to the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder.

[LINK]

Please let us know if you have any questions while making these revisions.

Yours sincerely,

Wolf Reik

Section Editor: Epigenetics

PLOS Genetics

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The revisions to this manuscript are well done. This is going to make a nice contribution to the field.

Reviewer #2: I am grateful to the authors for seriously considering my concerns and revising their manuscript accordingly. The paper now far more accurately reflects the existing state of knowledge and the authors’ new contributions. I remain sceptical about the appropriateness of this manuscript for PLoS Genetics, but this is ultimately a matter for the editors.

I do have one suggestion for improvement. The conservation of gene body methylation across orthologs has been extensively studied in plants (Takuno and Gaut, PNAS 2013; Seymour et al., PLoS Genetics 2014; Takuno et al., MBE 2017; Seymour and Gaut, MBE 2019). This work reaches the same conclusion the authors do here, that the same sets of genes tend to be methylation across species (spanning diverse timescales). Furthermore, genes that retain methylation across evolution have a higher tendency to have conserved sequences and to exhibit stable and constitutive expression (a rough proxy for housekeeping functions) than genes that have species-specific methylation. This work should be cited and discussed. I also suggest the authors test if genes that retain gbM across arthropods also have a higher tendency for conservation and stable expression, and perhaps a specific nucleosome configuration. This would make a valuable addition to the paper.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: None

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Decision Letter 2

Wolf Reik

15 May 2020

Dear Peter,

We are pleased to inform you that your manuscript entitled "­­­­­­Widespread conservation and lineage-specific diversification of genome-wide DNA methylation patterns across arthropods" has been editorially accepted for publication in PLOS Genetics. Congratulations!

Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional accept, but your manuscript will not be scheduled for publication until the required changes have been made.

Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field.  This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager.

If you have a press-related query, or would like to know about one way to make your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics!

Yours sincerely,

Wolf Reik

Section Editor: Epigenetics

PLOS Genetics

www.plosgenetics.org

Twitter: @PLOSGenetics

----------------------------------------------------

Comments from the reviewers (if applicable):

----------------------------------------------------

Data Deposition

If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website.

The following link will take you to the Dryad record for your article, so you won't have to re‐enter its bibliographic information, and can upload your files directly: 

http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-19-02110R2

More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support.

Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present.

----------------------------------------------------

Press Queries

If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org.

Acceptance letter

Wolf Reik

17 Jun 2020

PGENETICS-D-19-02110R2

­­­­­­Widespread conservation and lineage-specific diversification of genome-wide DNA methylation patterns across arthropods

Dear Dr Sarkies,

We are pleased to inform you that your manuscript entitled "­­­­­­Widespread conservation and lineage-specific diversification of genome-wide DNA methylation patterns across arthropods" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Matt Lyles

PLOS Genetics

On behalf of:

The PLOS Genetics Team

Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom

plosgenetics@plos.org | +44 (0) 1223-442823

plosgenetics.org | Twitter: @PLOSGenetics

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. ALKB2 DNA repair is associated with high levels of DNA methylation across arthropods.

    Boxplot showing genome-wide methylation levels in 31 arthropod species with and without ALKB2.

    (PDF)

    S2 Fig. Metagene plot of methylation within TEs and in flanking sequence for all species.

    TEs are shown in pink, flanking sequence in white.

    (PDF)

    S3 Fig. Domain structure of DNMT1s from arthropods, with the human DNMT1 for comparison.

    (AI)

    S4 Fig. Alignment of the CXXC domain from DNMT1 highlighting the disruption of this domain in P. citri.

    (AI)

    S5 Fig. Methylation of single exon and multi-exon genes for all species in which we see gene body methylation.

    (PDF)

    S6 Fig. Expression patterns of methylated and unmethylated genes for all species (cf Fig 5B).

    (PDF)

    S7 Fig. Fourier transform analysis of periodicity in DNA methylation correlation.

    A) Diagram of method to convert from the pattern into the magnitude of the nucleosome periodicity B, C) Periodicity in coding sequences and transposable elements respectively.

    (PDF)

    S8 Fig. Intron periodicity is markedly less apparent than exon periodicity.

    S. maritima exons 1 to 4 (A) and introns 1 to 4 (B) are shown for comparison.

    (PDF)

    S9 Fig. Methylation is a better predictor of nucleosomal pattern than housekeeping gene status.

    Both housekeeping genes and non-housekeeping genes that are methylated genes show enhanced nucleosome occupancy at the +1 nucleosome.

    (PDF)

    S1 Table. Details of the tissue type, sex, caste, BioSample Accession and SRA Accession of each sample that was newly-sequenced in this study.

    (XLSX)

    Attachment

    Submitted filename: Reviewerresponses_ver2.docx

    Attachment

    Submitted filename: reviwsround2.docx

    Data Availability Statement

    Sequence data that was newly-generated for this project have been deposited in the NCBI Short Read Archive under the BioProject accession code PRJNA589724. The source code, input data and newly-identified DNMT & ALKB2 gene sequences are available from the Cambridge Data Repository (https://doi.org/10.17863/CAM.45964). Bisulfite sequencing data for Planococcus citri has been deposited PRJNA610765.


    Articles from PLoS Genetics are provided here courtesy of PLOS

    RESOURCES