1. The unsolved question of the ‘epigenetic code’
The eukaryotic genome is packaged into chromatin, a nucleoprotein complex in which the DNA is coordinated by histones and other structural and regulatory proteins. Since the first description of DNA methylation [1] and post-translational modifications of histones [2], it has become apparent that all chromatin constituents are extensively modified by covalent attachments of various chemical groups. The realization that particular modifications and the enzymes that deposit or remove them are linked to transcriptional ‘on’ (euchromatin) or ‘off’ (heterochromatin) states, and the discovery of effector proteins that can recognize specific modifications via designated binding domains, subsequently paved the way for the formulation of the ‘histone code’ hypothesis [3–5]. Accordingly, modifying enzymes are now referred to as ‘writers’, demodifying enzymes as ‘erasers’ and binding proteins as ‘readers’ of particular modifications. The reversible deposition of modifications by the writer/eraser machinery, in interplay with the modification readers, adds a ‘flexible’ layer of ‘epigenetic’ information, stored within the chromatin template, to the ‘fixed’ genetic information encoded in the DNA sequence itself. Thus, the ‘histone code’ hypothesis has gradually evolved into the concept of an ‘epigenetic code’ [6], in which histone and DNA modifications, together with histone sequence variants, combine into specific modification signatures that regulate the activity and functional state of the underlying chromatin, thereby specifying the state of a cell.
Over the years many novel modifications have been identified [7] and it was found that histones are pervasively decorated by multiple modifications [8,9], underscoring the complexity of the chromatin modification landscape. These modifications are not randomly distributed, but form characteristic patterns that demarcate functionally distinct chromatin regions such as enhancers, promoters, gene bodies or heterochromatin and provide the basis for subdividing the genome into different ‘chromatin states’ based on their modification and transcriptional status within a given cell [10,11]. However, while many individual modifications are well characterized, how their combinations regulate genome functions remains largely unresolved. In other words, it is still largely unclear whether an epigenetic ‘code’ exists, and if so how such a code might work.
DNA and histone modifications regulate genome functions to a large extent by modulating the binding of nuclear proteins to chromatin. Modifications recruit or exclude effector proteins such as modifying enzymes, demodifying enzymes, chromatin remodelers, transcription machinery, replication factors, and DNA repair proteins that mediate specific downstream functions. Most chromatin regulators harbor multiple modification binding domains, either within their protein sequences or distributed over different subunits of a protein complex [12], suggesting that recognizing composite modification signatures is critical for their functions in genome regulation. However, while many readers of single modifications have been described, a lot less is known about how nuclear proteins read out and interpret more than one modification.
Addressing this question poses a major technical challenge. Biochemical or structural approaches are usually low-throughput and candidate-based or, if carried out in high-throughput, typically test isolated reader domains against modified histone peptides or oligonucleotides, thereby losing the physiological context of chromatin and intact proteins. Genome-wide mapping approaches allow the distributions of chromatin modifications and reader proteins to be compared, but they can only establish correlations, not causality. Genetic approaches that study the effects of inactivating mutations in epigenetic enzymes or amino acid substitutions in histones are usually hampered by enzyme redundancy and by the fact that most histone residues can be modified in a variety of ways. Finally, despite recent progress in single-cell and epigenome editing techniques, it is still impossible with currently available technologies to precisely manipulate and determine the exact modification status of a nucleosome at a given genomic locus in vivo. Therefore, although the ‘histone code’ hypothesis dates back more than 20 years, there is still a striking lack of insight into how nuclear machineries read out complex chromatin modification landscapes.
2. Decoding the chromatin modification landscape
To answer the long-standing question of how nuclear proteins translate the information encoded in complex modification patterns, we carried out a large set of mass spectrometry (MS)-linked affinity purifications from nuclear extracts using a library of 55 modified di-nucleosomes incorporating biologically relevant modification signatures representing promoter, enhancer and heterochromatin states [13]. Unlike other experimental approaches, nucleosome affinity purifications followed by high-performance MS allow us to precisely control the modification status of the nucleosomes and to quantitatively assess the direct effects of the modifications on protein recruitment or exclusion, enabling us to establish causalities between modifications and protein binding at a large scale. We thereby provide the first systematic characterization of how combinatorial chromatin modification patterns are decoded by the nuclear proteome.
At the center of our study lies a highly complex MS dataset that contains several layers of information (Supplementary Figure S1). The first layer consists of the quantitative proteomics results of the nucleosome pull-downs themselves, which describe the binding responses of around 2000 nuclear proteins to the different chromatin modification signatures. While informative on their own, the individual pull-downs do not reveal which modifications modulate the recruitment or exclusion of a given protein, as most nucleosomes carry combinatorial modification signatures. However, systematic analyses across the entire dataset offer deeper insights into how chromatin readers engage with the modified nucleosome templates. As the second layer, we can deconvolute ‘feature effect estimates’ from the highly complex nucleosome binding profiles (see Supplementary Figure S1). Our dataset allows us to retrieve the effects of 15 different modification features, including various acetylation and methylation marks on histones H3 and H4, DNA methylation, and the histone variant H2A.Z. Combining these estimates into ‘modification response profiles’, analogous to the ‘sequence logos’ of DNA binding motifs targeted by transcription factors [14], provides critical information about the direct effects of various modification features on protein binding. An important finding from this analysis is that modification features vary considerably in their ability to regulate protein binding. Notably, activating modifications tend to affect the binding of many more proteins than repressive modifications. This may reflect the more ‘dynamic’ nature of euchromatin regulation compared with more ‘static’ heterochromatin. Furthermore, chromatin engagement by many reader proteins is modulated by multiple features, supporting the combinatorial nature of epigenetic regulation.
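The idea of deconvoluting per-feature effects from combinatorially modified nucleosomes can be illustrated with a minimal sketch. It is not the statistical procedure used in the study: it simply assumes that the effect of a feature can be approximated by averaging the change in log2 enrichment over all pairs of nucleosomes that differ by exactly that one feature. The nucleosome names, enrichment values and the function `feature_effects` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from statistics import mean

# Hypothetical toy library: nucleosome name -> set of modification features it carries.
nucleosomes = {
    "unmodified": frozenset(),
    "H3K4me3": frozenset({"H3K4me3"}),
    "H3K27ac": frozenset({"H3K27ac"}),
    "H3K4me3+H3K27ac": frozenset({"H3K4me3", "H3K27ac"}),
}

# Hypothetical log2 enrichment of a single protein on each nucleosome.
log2_ratio = {
    "unmodified": 0.0,
    "H3K4me3": 2.0,
    "H3K27ac": 0.25,
    "H3K4me3+H3K27ac": 2.5,
}


def feature_effects(nucleosomes, log2_ratio):
    """Average the change in log2 enrichment over all nucleosome
    pairs that differ by exactly one modification feature."""
    diffs = {}
    for a, b in combinations(nucleosomes, 2):
        delta = nucleosomes[a] ^ nucleosomes[b]  # features present in only one of the two
        if len(delta) != 1:
            continue  # pair differs by zero or several features: not informative here
        (feature,) = delta
        with_f, without_f = (a, b) if feature in nucleosomes[a] else (b, a)
        diffs.setdefault(feature, []).append(log2_ratio[with_f] - log2_ratio[without_f])
    return {feature: mean(values) for feature, values in diffs.items()}


effects = feature_effects(nucleosomes, log2_ratio)
for feature, effect in sorted(effects.items()):
    print(f"{feature}: {effect:+.3f}")
```

In this toy example the protein responds strongly to H3K4me3 and only weakly to H3K27ac, regardless of which other feature is present. A set of such estimates for one protein across all features corresponds conceptually to a ‘modification response profile’.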
As a third layer, the similarities of the binding responses of the identified proteins to all the di-nucleosomes in the library can be exploited to identify networks of factors that are co-regulated by similar chromatin states. This information can be used to identify protein complexes and to predict novel protein-protein interactions, including so far undiscovered subunits of known complexes.
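A minimal sketch of this third layer is given below, under the assumption that similarity between binding responses is measured by Pearson correlation and that proteins are grouped whenever their correlation exceeds a threshold. The protein names, the values, the threshold of 0.9 and the function `coregulated_groups` are hypothetical; the network analysis in the study itself may use different similarity measures and clustering steps.

```python
from itertools import combinations
from math import sqrt

# Hypothetical log2 enrichments of four proteins across five nucleosomes.
profiles = {
    "protA": [2.0, 1.8, 0.1, -0.2, 0.0],
    "protB": [1.9, 2.1, 0.0, -0.1, 0.2],
    "protC": [-1.0, -1.2, 0.1, 1.5, 1.4],
    "protD": [-0.9, -1.0, 0.0, 1.6, 1.2],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coregulated_groups(profiles, threshold=0.9):
    """Link proteins whose profiles correlate at or above the threshold
    and return the resulting connected groups."""
    group = {p: {p} for p in profiles}
    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= threshold and group[a] is not group[b]:
            merged = group[a] | group[b]
            for p in merged:
                group[p] = merged  # every member now points to the same merged set
    unique = {frozenset(g) for g in group.values()}
    return sorted(sorted(g) for g in unique)


groups = coregulated_groups(profiles)
print(groups)
```

Here protA and protB respond in the same way to the nucleosome library, as do protC and protD, so they fall into two separate groups. In a real dataset such groups would be candidates for shared complexes or interactions and would still require independent validation.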
A striking result from these combined analyses was the unexpected finding that, unlike the promoter mark H3K4me3 (tri-methyllysine 4 of histone H3), the enhancer marks H3K4me1 (mono-methyllysine 4 of histone H3) and H3K27ac (acetyllysine 27 of histone H3) have a low regulatory potential and are largely inert in recruiting proteins to chromatin. Contrary to a recent study [15], our results indicate that H3K4me1 does not recruit any specific proteins, but may act by excluding repressive factors [16,17], in line with other findings demonstrating that the H3K4me1 mark itself is largely dispensable for enhancer function [18,19]. Similarly, our analysis reveals that the H3K27ac mark alone only weakly recruits a small number of proteins, for example, the SRCAP complex. MS studies have found that H3K27ac almost always occurs in the context of other histone acetylation marks [9] that are also deposited by its acetyltransferase enzyme p300/CBP and, similar to H3K4me1, H3K27ac was shown not to be required for gene activation and enhancer function [20,21]. Therefore, while H3K27ac can be used as a proxy for p300/CBP activity and is predictive of active enhancers, it is unlikely to play any direct role in instructing transcriptional activation itself. The likely function of H3K27ac could therefore simply be to block deposition of the repressive H3K27me3 (tri-methyllysine 27 of histone H3) mark and to prevent the recruitment of the H3K27me3-binding repressive PRC1 and PRC2 complexes. Taken together, these findings suggest that H3K4me1 and H3K27ac may both act by shielding active chromatin from invasion by repressive factors and modifications rather than by directly recruiting transcriptional activators. Thus, while H3K4me1 and H3K27ac can clearly be used as markers for genome annotation, their functional role at enhancers must be reevaluated.
3. The Modification Atlas of Regulation by Chromatin States: MARCS
Extracting meaningful information from the nucleosome pull-down dataset proved challenging, since the complexity of the MS data obtained from the SILAC-linked nucleosome affinity purifications (SNAP) went beyond that of standard interaction proteomics. Many SNAP experiments had to be integrated, which meant that experimental variation between them had to be compensated for. Furthermore, establishing the relationships between nucleosomal modifications and protein binding required reliable quantification of the enrichment of each protein on the different modified nucleosomes, rather than just the binary information of which protein binds to which nucleosome. To achieve this, we developed new computational tools to adjust and compare all SNAP experiments. We also had to find ways to determine the dependencies between binding and modification signatures for many proteins and nucleosomes and to visualize them in easily understandable graphical representations. The SNAP dataset contains far more information and interesting observations than can be communicated effectively in the format of a scientific article. We therefore developed an alternative way to make the data accessible to the research community. Initially built for our own use to browse the SNAP data, the online resource MARCS, the Modification Atlas of Regulation by Chromatin States (https://marcs.helmholtz-munich.de), now presents the key results of our study. Designed as an interactive and easy-to-use web interface, MARCS provides a collection of intuitive visualization tools to interrogate the different information layers of the dataset described above (Supplementary Figure S1).
Finding a visual ‘language’ for displaying the nucleosome pull-down data and creating tailored computational analysis workflows constituted a critical aspect of the work. The most challenging part consisted of condensing the dataset into few but useful parameters and finding a graphical representation of the results that is easy to grasp and navigate, yet preserves the full complexity of the data. MARCS concentrates on the key analyses and results that we found to be most informative, and allows users to customize their searches and visualizations in order to generate downloadable image files displaying exactly the information that they require. In fact many of the figures for our study were generated directly from the MARCS website.
While the published paper [13] forms the cornerstone of our study, the MARCS resource converts it into an interactive ‘living’ document that can be read at several levels. The main text and figures of the paper allow readers to absorb the key messages quickly, and readers who take more time will find many more insightful details and results in the ‘Extended Data’ and the ‘Supplementary Information’ of the article. However, to get the full picture we encourage readers to spend time on the MARCS website and try out the different functions and settings: there are many observations that we have not touched upon in the paper and that users will discover once they start exploring the data thoroughly. Our goal was therefore to make the data as accessible as possible via the MARCS interface, so that other researchers can delve into it and use it as an information source for both hypothesis generation and validation.
For interested readers we also provide detailed step-by-step protocols for preparing modified nucleosomes and performing proteomics-linked nucleosome pull-downs in Tvardovskiy et al. [22]. In addition, a follow-up study that builds on the MARCS dataset to identify binding events that display true ‘synergistic’ combinatorial effects (as predicted by the ‘histone code hypothesis’) is presented in Stadler et al. [23].
4. What can we learn from MARCS?
Where do we stand in our quest to decipher the epigenetic code? And what can MARCS teach us about how such an epigenetic code might work? Our results indicate that some readers follow simple binding rules and are recruited or excluded by just one modification. However, a large number of readers display more complex binding modes and respond to multiple modifications. Some of these follow a recognizable ‘code’ that can be identified and described by relatively simple parameters [23]. Similarly, while some modifications act like binary ‘yes/no’ binding switches, other nucleosomal features seem to rather modulate binding affinities and do not induce pronounced ‘on/off’ responses. Many interactions are clearly not simple and cannot easily be described by synergies between a small number of specific modifications. Such binding responses apparently consist of multiple independent interactions that can act in additive ways, or include intricate cross-talk between multiple modifications that is not easy to delineate. Especially for acetylation marks we observe that certain readers do not have clear binding preferences, but seem to be promiscuous or respond to the degree of acetylation rather than recognizing specific acetylations in specific positions.
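The distinction between additive and synergistic binding responses can be made concrete with a small worked example. The sketch below assumes log2 enrichment values for one reader on four nucleosomes (unmodified, modification A only, modification B only, and both) and computes the deviation of the doubly modified nucleosome from the sum of the two single effects. The function names, the example values and the tolerance of 0.5 are hypothetical, and this is not the interaction model used in the follow-up study [23].

```python
def interaction_term(unmod, a_only, b_only, both):
    """Deviation of the combined response from the additive expectation.
    All inputs are log2 enrichments of one reader on the respective nucleosome."""
    additive_expectation = (a_only - unmod) + (b_only - unmod)
    observed = both - unmod
    return observed - additive_expectation


def classify(term, tolerance=0.5):
    """Label a response using a hypothetical tolerance on the interaction term."""
    if term > tolerance:
        return "synergistic"
    if term < -tolerance:
        return "antagonistic"
    return "additive"


# Reader 1: each mark contributes independently (1.0 + 1.5 = 2.5).
additive_reader = interaction_term(0.0, 1.0, 1.5, 2.5)
# Reader 2: weak binding to either mark alone, strong binding to the combination.
synergistic_reader = interaction_term(0.0, 0.25, 0.25, 3.0)

print(additive_reader, classify(additive_reader))
print(synergistic_reader, classify(synergistic_reader))
```

A reader whose interaction term is close to zero behaves as if the two modifications act independently, whereas a clearly positive term indicates that the combination is recognized more strongly than expected from the individual marks.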
Over the past years a surge of structural studies has revealed that many chromatin readers recognize large surfaces on the nucleosome backbone, including the DNA (reviewed by McGinty & Tan [24]). Furthermore, biophysical studies coupled to molecular dynamics simulations have demonstrated that histone tails undergo transient structural rearrangements and that they can assume conformations that fold back onto the DNA or neighboring nucleosomes (reviewed by Tsunaka et al. [25]). These properties can potentially be influenced by chromatin modifications and histone variants. Thus, the recognition of particular histone and/or nucleosome conformational features and structural changes induced by modifications must also be taken into consideration as an integral part of the ‘code’, in addition to the direct readout of the modifications themselves. Notably, many readers do not contain any identifiable modification binding domains that would help to rationalize their binding behaviors, which is in line with the recognition of larger surfaces on the nucleosome [24] rather than binding to discrete modifications in the context of linear histone tails. A recent study using targeted epigenome editing additionally demonstrated that the effects of specific modifications can also depend on the genomic sequence context [26]; cross-talk between modifications and the DNA sequence must therefore also be taken into account, increasing the level of complexity even further.
As is often the case, solving one problem opens the door to new and more difficult ones. Although MARCS represents a major step forward in our understanding of how epigenetic readers decode the chromatin modification landscape, the problem has turned out to be much more complex than initially thought. We will not run out of questions any time soon, at least not until we truly break the code!
Acknowledgments
The authors wish to thank N Nguyen for his invaluable contributions in conducting the SILAC nucleosome affinity purification experiments without which MARCS would not have been possible.
Supplemental material
Supplemental data for this article can be accessed at https://doi.org/10.1080/17501911.2024.2387527
Author contributions
A Tvardovskiy and T Bartke wrote the manuscript. S Lukauskas prepared Supplementary Figure S1 and commented on the manuscript.
Financial disclosure
Work in the T Bartke lab was funded by the UK Medical Research Council (Grant No. MC_UP_1102/2), the European Research Council (ERC StG No. 309952), the Deutsche Forschungsgemeinschaft (DFG Project-IDs 213249687/SFB 1064, 431163844, and 450084515), and the Helmholtz Gesellschaft. The MARCS website (https://marcs.helmholtz-munich.de) is hosted on the BMBF-funded de.NBI Cloud within the German Network for Bioinformatics Infrastructure (de.NBI) (031A532B, 031A533A, 031A533B, 031A534A, 031A535A, 031A537A, 031A537B, 031A537C, 031A537D, 031A538A).
Competing interests disclosure
The authors have no competing interests or relevant affiliations with any organization or entity with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.
Writing disclosure
No writing assistance was utilized in the production of this manuscript.
References
- 1. Hotchkiss RD. The quantitative separation of purines, pyrimidines, and nucleosides by paper chromatography. J. Biol. Chem. 1948;175(1):315–332. doi: 10.1016/S0021-9258(18)57261-6
- 2. Allfrey VG, Faulkner R, Mirsky AE. Acetylation and methylation of histones and their possible role in the regulation of RNA synthesis. Proc Natl Acad Sci USA. 1964;51(5):786–794. doi: 10.1073/pnas.51.5.786
- 3. Turner BM. Decoding the nucleosome. Cell. 1993;75(1):5–8. doi: 10.1016/S0092-8674(05)80078-9
- 4. Strahl BD, Allis CD. The language of covalent histone modifications. Nature. 2000;403(6765):41–45. doi: 10.1038/47412
- 5. Jenuwein T, Allis CD. Translating the histone code. Science. 2001;293(5532):1074–1080. doi: 10.1126/science.1063127
- 6. Turner BM. Defining an epigenetic code. Nat Cell Biol. 2007;9(1):2–6. doi: 10.1038/ncb0107-2
- 7. Millán-Zambrano G, Burton A, Bannister AJ, et al. Histone post-translational modifications - cause and consequence of genome function. Nat Rev Genet. 2022;23(9):563–580. doi: 10.1038/s41576-022-00468-7
- 8. Garcia BA, Pesavento JJ, Mizzen CA, et al. Pervasive combinatorial modification of histone H3 in human cells. Nat Methods. 2007;4(6):487–489. doi: 10.1038/nmeth1052
- 9. Young NL, DiMaggio PA, Plazas-Mayorca MD, et al. High throughput characterization of combinatorial histone codes. Mol Cell Proteomics. 2009;8(10):2266–2284. doi: 10.1074/mcp.M900238-MCP200
- 10. Ernst J, Kellis M. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat Biotechnol. 2010;28(8):817–825. doi: 10.1038/nbt.1662
- 11. Roadmap Epigenomics Consortium, Kundaje A, Meuleman W, et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518(7539):317–330. doi: 10.1038/nature14248
- 12. Ruthenburg AJ, Li H, Patel DJ, et al. Multivalent engagement of chromatin modifications by linked binding modules. Nat Rev Mol Cell Biol. 2007;8(12):983–994. doi: 10.1038/nrm2298
- 13. Lukauskas S, Tvardovskiy A, Nguyen NV, et al. Decoding chromatin states by proteomic profiling of nucleosome readers. Nature. 2024;627(8004):671–679. doi: 10.1038/s41586-024-07141-5
- 14. Schneider TD, Stevens RM. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 1990;18(20):6097–6100. doi: 10.1093/nar/18.20.6097
- 15. Local A, Huang H, Albuquerque CP, et al. Identification of H3K4me1-associated proteins at mammalian enhancers. Nat Genet. 2018;50(1):73–82. doi: 10.1038/s41588-017-0015-6
- 16. Bleckwehl T, Crispatzu G, Schaaf K, et al. Enhancer-associated H3K4 methylation safeguards in vitro germline competence. Nat Commun. 2021;12(1):5771. doi: 10.1038/s41467-021-26065-6
- 17. Ooi SK, Qiu C, Bernstein E, et al. DNMT3L connects unmethylated lysine 4 of histone H3 to de novo methylation of DNA. Nature. 2007;448(7154):714–717. doi: 10.1038/nature05987
- 18. Dorighi KM, Swigut T, Henriques T, et al. Mll3 and Mll4 facilitate enhancer RNA synthesis and transcription from promoters independently of H3K4 monomethylation. Mol Cell. 2017;66(4):568–576.e4. doi: 10.1016/j.molcel.2017.04.018
- 19. Rickels R, Herz HM, Sze CC, et al. Histone H3K4 monomethylation catalyzed by Trr and mammalian COMPASS-like proteins at enhancers is dispensable for development and viability. Nat Genet. 2017;49(11):1647–1653. doi: 10.1038/ng.3965
- 20. Zhang T, Zhang Z, Dong Q, et al. Histone H3K27 acetylation is dispensable for enhancer activity in mouse embryonic stem cells. Genome Biol. 2020;21(1):45. doi: 10.1186/s13059-020-01957-w
- 21. Sankar A, Mohammad F, Sundaramurthy AK, et al. Histone editing elucidates the functional roles of H3K27 methylation and acetylation in mammals. Nat Genet. 2022;54(6):754–760. doi: 10.1038/s41588-022-01091-2
- 22. Tvardovskiy A, Nguyen N, Bartke T. Identifying specific protein interactors of nucleosomes carrying methylated histones using quantitative mass spectrometry. Methods Mol Biol. 2022;2529:327–403. doi: 10.1007/978-1-0716-2481-4_16
- 23. Stadler M, Lukauskas S, Bartke T, et al. asteRIa enables robust interaction modeling between chromatin modifications and epigenetic readers. Nucleic Acids Res. 2024;52(11):6129–6144. doi: 10.1093/nar/gkae361
- 24. McGinty RK, Tan S. Principles of nucleosome recognition by chromatin factors and enzymes. Curr Opin Struct Biol. 2021;71:16–26. doi: 10.1016/j.sbi.2021.05.006
- 25. Tsunaka Y, Furukawa A, Nishimura Y. Histone tail network and modulation in a nucleosome. Curr Opin Struct Biol. 2022;75:102436. doi: 10.1016/j.sbi.2022.102436
- 26. Policarpi C, Munafò M, Tsagkris S, et al. Systematic epigenome editing captures the context-dependent instructive function of chromatin modifications. Nat Genet. 2024;56(6):1168–1180. doi: 10.1038/s41588-024-01706-w
