Nucleic Acids Research. 2023 Oct 18;52(D1):D222–D228. doi: 10.1093/nar/gkad894

MethMotif.Org 2024: a database integrating context-specific transcription factor-binding motifs with DNA methylation patterns

Matthew Dyer 1, Quy Xiao Xuan Lin 2, Sofiia Shapoval 3, Denis Thieffry 4,5, Touati Benoukraf 6,7
PMCID: PMC10767921  PMID: 37850642

Abstract

MethMotif (https://methmotif.org) is a publicly available database that provides a comprehensive repository of transcription factor (TF)-binding profiles, enriched with DNA methylation patterns. In this release, we have enhanced the platform, expanding our initial collection to over 700 position weight matrices (PWMs), all of which include DNA methylation profiles. One of the key advancements in this release is the segregation of TF-binding motifs based on their cofactors and DNA methylation status. We have previously demonstrated that gene ontology (GO) enriched terms associated with TF target genes may differ based on their association with alternative cofactors and DNA methylation status. MethMotif provides precomputed GO annotations for each human TF of interest, as well as for TF-co-TF complexes, enabling a comprehensive analysis of TF functions in the context of their cofactors. Additionally, MethMotif has been updated to encompass data for two new species, Mus musculus and Arabidopsis thaliana, widening its applicability to a broader community. MethMotif stands out as the first and only TF-binding motif database to incorporate context-specific PWMs coupled with epigenetic information, thereby shedding light on context-specific TF functions. This enhancement allows the community to explore and gain deeper insights into the regulatory mechanisms governing transcriptional processes.

Introduction

Transcription factors (TFs) are DNA-binding proteins that control transcription and therefore regulate gene expression. They consist of a DNA-binding domain, an activation domain that recruits other proteins, such as transcription coregulators, and an optional signal-sensing domain that regulates the activity of the transcription complex. TFs recognize specific short DNA sequences (recognition/binding motifs), located mainly in regulatory elements such as promoters and enhancers.

The perception of DNA methylation, a process that occurs on cytosines primarily within CpG dinucleotides, has evolved from its previously assumed role as a consistent repressor of TF–DNA interaction. Current research indicates that methylation can influence TF binding and serve as a selection factor for certain motif-specific TFs, leading to variable impacts on transcription, particularly when one factor functions as an activator and the other as a repressor (1). Based on this understanding, TFs can be classified into three groups: those whose binding is inhibited by, promoted by or unaffected by DNA methylation (2). In addition, TFs that bind methylated CpGs can induce demethylation by recruiting ten-eleven translocation (TET) enzymes, which in turn allows the binding of TFs that are usually impeded by methylation (3). Furthermore, the group of TFs known as pioneer TFs can bind densely compacted, inactive chromatin, leading to the recruitment of remodeling complexes that change chromatin configuration and epigenetic modifications, ultimately facilitating the binding of other TFs (4,5).

Notably, aberrant methylation patterns are found in numerous disease conditions, including cancer (6,7). It is now established that DNA methylation is an epigenetic modification crucial for the recruitment and/or selection of TFs within the chromatin, ultimately resulting in profound alterations of transcriptomic profiles.

To further elucidate the intricate relationship between DNA methylation and transcription regulation by TFs, we have created MethMotif (8), a comprehensive 2D TF motif database. This resource compiles transcription factor binding site (TFBS) and position weight matrix (PWM) data, enriched with cell type-specific CpG methylation information. The combination of TFBS motifs and their associated DNA methylation data enables us to characterize more accurately the attributes of the DNA loci that TFs recognize. Importantly, we have discovered that a TFBS can exhibit cell type-specific variability within its binding motif in conjunction with a specific methylation pattern (9).
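As a reminder of what a PWM encodes, the following minimal Python sketch converts per-position base counts into log2-odds weights and scores a candidate site. It is a generic textbook construction, not the procedure used to build MethMotif matrices; the uniform background, the pseudocount and the toy counts are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

def counts_to_pwm(counts, background=0.25, pseudocount=1.0):
    """Convert per-position base counts into log2-odds weights.

    counts: list of dicts (one per motif position) mapping base -> count.
    background and pseudocount are illustrative assumptions.
    """
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def score_site(pwm, site):
    """Sum the weights of the bases observed at each motif position."""
    return sum(column[base] for column, base in zip(pwm, site))

# Hypothetical counts for a 3-bp motif (not taken from MethMotif)
toy_counts = [
    {"A": 0, "C": 18, "G": 1, "T": 1},
    {"A": 1, "C": 0, "G": 19, "T": 0},
    {"A": 10, "C": 2, "G": 2, "T": 6},
]
toy_pwm = counts_to_pwm(toy_counts)
```

In a methylation-aware representation, such a matrix would be accompanied by per-position methylation measurements, which is the second dimension that MethMotif adds to the motif.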

Since the inception of the first version of MethMotif, we have taken advantage of user feedback to enhance its capabilities, introduce innovative features and expand our dataset. More specifically, we have:

  • Increased the number of PWMs from 509 to over 700

  • Expanded our reach to include datasets for two additional species: Mus musculus (n = 24) and Arabidopsis thaliana (n = 23)

  • Developed supplementary tools for batch querying and data visualization

The 2024 version of MethMotif offers an expanded and distinctive feature set that helps clarify the context-specific roles of TFs. In previous research (9), we demonstrated that cofactors can modulate the target DNA-binding sequence motif, target sequence methylation preference and TF function. Consequently, besides its primary binding motif, the latest release of MethMotif offers for each human TF:

  • TFBS motifs and DNA methylation profiles in the context of each TF cofactor partner

  • Genomic locations of the TF of interest in relation to each TF partner

  • Gene ontology (GO) enrichment analyses of TF targets in the setting of each TF partner

With these enhancements and improvements readily available online (https://methmotif.org), MethMotif 2024 stands as a unique and robust tool for the scientific community interested in TF binding dynamics and gene network regulation.

Results

Expansion and update of MethMotif

MethMotif 2024 currently incorporates a vast array of ChIP-seq and Whole Genome Bisulfite Sequencing (WGBS) or Enzymatic Methyl Sequencing (EM-seq) datasets from diverse sources, such as ENCODE (10), GEO (11) and GTRD (12). To complement these datasets, we have performed WGBS experiments (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE118030) to reach a total of 702 methyl-PWMs spanning three taxonomic groups.

For human, we now cover 16 cell lines, including 5 new ones: H9 and HUES64, both of which are human embryonic stem cell lines derived from the inner cell mass of the blastocyst; HEK293T, a cell line derived from human embryonic kidney cells; LNCaP, a prostate cancer cell line derived from a human lymph node metastatic lesion of prostatic adenocarcinoma; and SNU398, a hepatocellular carcinoma cell line derived from human liver cancer. These additions enable us to delineate the function of cell type-specific TFs more effectively (Table 1).

Table 1. Expansion overview of MethMotif 2024 compared to the previous release

                      Release  Human  Mouse  Arabidopsis  Total
Number of TFs         2018     509    0      0            509
                      2024     655    24     23           702
Cell lines/Tissues    2018     11     0      0            11
                      2024     16     1      1            18
ChIP-seq experiments  2018     2178   0      0            2178
                      2024     2473   78     74           2625
WGBS experiments      2018     16     0      0            16
                      2024     22     1      10           33

Enhanced MethMotif database with pioneer factor integration

To provide more comprehensive and insightful annotations to our users, we have manually curated the literature to identify and annotate pioneer TFs within the MethMotif database. The connection between pioneer factors and DNA methylation is particularly noteworthy. Pioneer factors are known to bind to areas of the genome that are inaccessible to other TFs due to condensed chromatin, DNA methylation or other repressive epigenetic modifications. By integrating DNA methylation information into MethMotif, we allow users to delve deeper into the relationship between these critical aspects of transcriptional regulation. To curate these pioneer factors, we relied on five specific criteria, each representing a key feature of pioneer TFs. These features include the ability to (i) interact with heterochromatin, (ii) promote DNA accessibility, (iii) induce alterations in cellular programming, (iv) be implicated in tumorigenesis or cancer progression and (v) bind to methylated DNA (for those TFs that contain a CpG dinucleotide within their binding sites) (4,13,14). In MethMotif, all pioneer factors are marked with an asterisk (*) for easy identification. Moreover, the database provides direct links to studies demonstrating each feature for every annotated pioneer factor. This information, along with additional features and details of the TF-binding motif, is accessible on the TF card within the database (Figure 1).

Figure 1. Updated MethMotif card. The MethMotif card consists of distinct panels that provide a comprehensive overview of TF-binding motifs and their characteristics to cater to diverse research needs. The novel panels introduced in MethMotif 2024 are emphasized with a red-dashed box. (A) The main MethMotif logo panel displays the TF cell-specific motif with the corresponding DNA methylation profile (MethMotif). (B) The cell line information panel offers exhaustive details about the cell, including species, sex, life stage, age, cell line, health and references. (C) The TF panel presents TF classifications, general attributes, expression levels and external database links. (D) The motif information panel provides information about the origin of the ChIP-seq dataset, download date, peak counts, sequence counts used for motif generation, motif locations, and options for visualizing, opening, and downloading motif matrices in MEME and TRANSFAC formats, as well as a beta score matrix for DNA methylation profiles. (E) The pioneer factor information panel is dedicated to pioneer TF features and classifications and includes links to references about the pioneer features of the selected TF. (F) The cofactor information panel offers an interactive heatmap encompassing the main TF cofactors, with a color gradient proportional to the percentage of TF-co-TF co-binding (right). Clicking on a co-TF of interest displays the TF-co-TF co-binding MethMotif logo (an example of FOSL2 as a cofactor is shown on the left). A report of the TF-co-TF list can be downloaded from this panel.

This manual curation of pioneer factor information enriches MethMotif and enhances its value as a comprehensive resource for studying TF-binding dynamics and for gaining a deeper understanding of the complex interplay between DNA methylation, TF binding and the regulation of gene expression.

Enhancing transcription factor cooperation analysis with MethMotif 2024

The cooperative binding of TFs is pivotal in the regulation of gene expression. Past research has shown that TF–TF cooperation is largely contingent on a multitude of factors, such as DNA motif, methylation, orientation and spacing preferences (15). However, to this day, there exists no systematic method to characterize TF binding partnership predicated on DNA methylation status, nor to foresee the impact of TF binding modules on the CpG methylation level.

In order to bridge this gap, we have pioneered a novel tool that leverages our comprehensive data compendium to uncover and rank the list of cofactors that co-bind with a TF of interest (9). For each human TF, MethMotif 2024 comprises a cofactor report that encapsulates the TF-binding partnership in four distinct categories: (i) co-binding score, (ii) the binding motif of co-binding TFBS, (iii) the methylation status coupled with the CG percentages within the two motifs, as well as within the 200 bp region surrounding the motif, and (iv) the read enrichment score (Figure 2A). Furthermore, MethMotif logos are provided for the overall TF-binding sites, as well as for its co-binding with different TF cofactors. Users can access this information by clicking on the co-TF within the TF card (Figure 2B). Lastly, within the same panel, users can access the global distribution of genomic locations (Figure 2C) and the GO enrichment analysis for TF targets (Figure 2D) associated with each individual TF and TF–TF pair. These results are generated using HOMER (16) and GREAT (17), respectively.
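To make the notion of a co-binding percentage concrete, the sketch below computes the share of a TF's peaks that overlap at least one peak of a candidate cofactor. This is only an illustration under stated assumptions: the exact scoring used by MethMotif and TFregulomeR is not described here and may differ, peaks are assumed to be half-open (chrom, start, end) intervals, and the peak coordinates are invented.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def cobinding_percentage(tf_peaks, cofactor_peaks):
    """Percentage of tf_peaks overlapped by at least one cofactor peak."""
    if not tf_peaks:
        return 0.0
    shared = sum(
        any(overlaps(peak, other) for other in cofactor_peaks)
        for peak in tf_peaks
    )
    return 100.0 * shared / len(tf_peaks)

# Hypothetical peak sets (coordinates are invented for illustration)
egr2_peaks = [("chr1", 100, 300), ("chr1", 1000, 1200),
              ("chr2", 50, 250), ("chr3", 10, 210)]
sp2_peaks = [("chr1", 250, 450), ("chr2", 0, 60), ("chr3", 500, 700)]

egr2_with_sp2 = cobinding_percentage(egr2_peaks, sp2_peaks)
sp2_with_egr2 = cobinding_percentage(sp2_peaks, egr2_peaks)
```

Note that the measure is asymmetric: the percentage of the first TF's peaks shared with the second generally differs from the reverse, so a ranking of cofactors depends on which TF is taken as the reference. The quadratic scan is adequate for a toy example; genome-scale peak sets would call for sorted intervals or an interval tree.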

Figure 2. Example of a context-specific TFBS query with MethMotif 2024, focusing on EGR2 and its cofactors in the HEK293 cell line. (A) The heatmap shows the percentage of highly methylated (≥90%), moderately methylated (>10% and <90%), and unmethylated (≤10%) CG sites in the TF-cofactor motif and surrounding area. Each row illustrates a TF-cofactor pair, each column displays a methylation status range, and color intensity represents the CG percentage within a given location and partner pair. Boxplots display ChIP-seq read enrichment scores in the co-binding peaks for EGR2 and its partners. The co-binding percentage is further displayed as a heatmap and the corresponding paired TF-cofactor motif logo is provided. (B) EGR2 MethMotif logos built from all ChIP-seq peaks (left) and from peaks shared with SP2 ChIP-seq peaks (right). Stacked bars above the motif logo describe the methylation status of CG sites. The blue, green and orange colors correspond to low, moderate and high methylation levels, respectively. (C) Pie charts categorize the genomic locations of EGR2 (left) and EGR2-SP2 (right) ChIP-seq peaks, following an order starting from promoter/transcription start sites (TSS), transcription termination sites (TTS), 5′ untranslated region (UTR) exons, 3′ UTR exons, introns and intergenic regions. (D) Listing of the main enriched GO terms associated with the genes targeted by EGR2 (left) and EGR2-SP2 (right).
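The three methylation ranges used in panel A can be expressed as a small classifier. The Python sketch below assumes methylation levels are given as percentages between 0 and 100; the function names and the example values are hypothetical, and only the thresholds come from the caption.

```python
from collections import Counter

def methylation_class(beta_percent):
    """Assign a CG site to one of the three ranges used in Figure 2A."""
    if beta_percent >= 90:
        return "high"          # highly methylated (>= 90%)
    if beta_percent <= 10:
        return "unmethylated"  # not methylated (<= 10%)
    return "moderate"          # strictly between 10% and 90%

def class_percentages(beta_values):
    """Percentage of CG sites falling in each methylation range."""
    counts = Counter(methylation_class(b) for b in beta_values)
    n = len(beta_values)
    return {
        label: (100.0 * counts[label] / n if n else 0.0)
        for label in ("unmethylated", "moderate", "high")
    }

# Hypothetical methylation levels (in %) for four CG sites
example = class_percentages([0, 5, 50, 95])
```

Each row of the heatmap then corresponds to one such three-value summary for a given TF-cofactor pair and location.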

Introduction of CHG and CHH methylation contexts in Arabidopsis thaliana

Unlike mammalian cells, plants exhibit DNA cytosine methylation in all sequence contexts (CG, CHG and CHH, where H represents A, T or C). Specifically, DNA methylation can take place within symmetrical CG dinucleotides and CHG trinucleotides, meaning that cytosine methylation can be present on both strands. It can also occur within asymmetrical CHH trinucleotides, where cytosine methylation is present on only one strand (18). Given these DNA methylation patterns in plants, which are distinct from those of other taxonomic groups, we have enhanced the original MethMotif logo to differentiate between symmetrical/asymmetrical or strand-specific methylation (Figure 3).

Figure 3. Symmetric/asymmetric MethMotif logo designed for plant TFs. (A) CG-context symmetric MethMotif logo displaying the MAF1-binding motif and the corresponding CG methylation status in a strand-specific manner. (B) MethMotif logos for MAF1 in the symmetric CHG and asymmetric CHH contexts. CHG and CHH distribution scores are represented by strand-specific boxplots.
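The three cytosine contexts can be assigned directly from the two bases that follow a cytosine. The Python sketch below is a simple illustration for the strand as given (cytosines on the opposite strand would be handled by applying the same function to the reverse complement); the function names and the example sequence are hypothetical.

```python
def cytosine_context(seq, i):
    """Return 'CG', 'CHG', 'CHH' or None for the base at index i.

    H stands for A, C or T. None is returned when the base is not a
    cytosine or when the context cannot be determined (sequence end
    or ambiguous bases).
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    following = seq[i + 1:i + 3]
    if following[:1] == "G":
        return "CG"
    if len(following) < 2 or any(b not in "ACGT" for b in following):
        return None
    # At this point the first following base is H (A, C or T).
    return "CHG" if following[1] == "G" else "CHH"

def contexts(seq):
    """List (index, context) for every cytosine on the given strand."""
    return [(i, cytosine_context(seq, i))
            for i, base in enumerate(seq.upper()) if base == "C"]

# Hypothetical sequence containing one cytosine of each context
example_contexts = contexts("CGCAGCTTC")
```

The symmetry mentioned in the text follows from the sequences themselves: the reverse complement of CG is CG and that of CHG is again a CHG trinucleotide (for example CAG and CTG), so a cytosine is present on both strands, whereas the reverse complement of a CHH site carries no cytosine in the corresponding position.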

Toolkits for batch querying and data visualization

Our batch query function continues to offer users the capacity to search our database for occurrences of TFBSs, together with the associated methylation status, within a given BED file. This feature provides a comprehensive view of global TFBS and context-specific partner information for the selected cell line and enriched TFs. To facilitate interpretation, the tool generates essential information and visualizations for the specified genomic locations. The results can be downloaded with a single click, making the batch query a robust and user-friendly tool that delivers context-specific TF information to all scientists.
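Conceptually, a batch query of this kind amounts to looking up annotated binding sites that fall within user-supplied regions. The Python sketch below illustrates the idea only; it is not the MethMotif implementation. The site records, their field names ("tf", "beta") and all coordinates are hypothetical, and BED coordinates are treated as 0-based, half-open intervals.

```python
def parse_bed(text):
    """Parse BED text into (chrom, start, end) tuples (0-based, half-open)."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and optional track/browser headers.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions

def sites_in_regions(regions, sites):
    """Return the sites that lie entirely within any query region."""
    hits = []
    for site in sites:
        for chrom, start, end in regions:
            if (site["chrom"] == chrom
                    and start <= site["start"]
                    and site["end"] <= end):
                hits.append(site)
                break
    return hits

# Hypothetical query regions and annotated binding sites
bed_text = "track name=query\nchr1\t100\t500\nchr2\t0\t50\n"
tfbs_sites = [
    {"tf": "EGR2", "chrom": "chr1", "start": 120, "end": 134, "beta": 4.0},
    {"tf": "SP2", "chrom": "chr1", "start": 495, "end": 505, "beta": 92.0},
    {"tf": "EGR2", "chrom": "chr2", "start": 10, "end": 24, "beta": 55.0},
]
hits = sites_in_regions(parse_bed(bed_text), tfbs_sites)
```

Here the second site is excluded because it extends beyond the end of its query region; whether partially overlapping sites should be reported is a design choice that a real tool would need to document.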

Users who want a more customizable analysis, or who are more comfortable with R, can drill down further into a sample of interest with the API TFregulomeR (9). This R package can access the entirety of the MethMotif database and enables the generation of all the figures and files available through the MethMotif website (internet access is required). Starting with the genomic information of interest, it is possible to analyze the TFBS between cell lines in different CpG contexts. TFregulomeR further enables the deconvolution of TFBS by identifying the cofactors most likely to bind alongside the protein of interest. Finally, TFregulomeR streamlines the annotation of context-specific data with genomic location and GO information. Users can download this package through GitHub (https://github.com/benoukraflab/TFregulomeR) and begin exploring its many functions.

Conclusion and perspectives

In our initial release, we presented the first TF database integrating TF-binding motifs (ChIP-seq) with DNA methylation (WGBS). In this updated version of MethMotif, we have broadened the scope of accessible TFBS, cell lines, and WGBS and ChIP-seq coverage. MethMotif continues to be the only TF database that delivers context-specific TFBS information. Our enhanced TF card now incorporates a cofactor information panel spanning DNA methylation and cell line contexts, revealing the context-dependent composition of motifs. True to our original design philosophy, the revamped cofactor card offers pre-generated figures presenting context-specific information on TF-binding motifs, DNA methylation, genomic location and associated gene ontology. Consequently, MethMotif significantly increases the information content regarding primary TFs and binding site specificity. We have also introduced a manually curated database of pioneer factors. These pioneer factors, known for their ability to engage with and open condensed chromatin regions, provide a key to unlocking complex regulatory dynamics. This integration with pioneer factors makes MethMotif a potential companion for Assay for Transposase-Accessible Chromatin with sequencing (ATAC-seq) (19) analysis. By mapping ATAC-seq peaks to MethMotif’s database, users can now infer the potential TFs, especially pioneer factors, that might be governing chromatin accessibility, thereby enhancing the depth and precision of their ATAC-seq studies.

Alongside expanded information for human cell lines, we have extended coverage of MethMotif to include cell lines from mouse (Mus musculus) and the model plant species Arabidopsis thaliana.

Up until this point, MethMotif’s information has been exclusively derived from cell line and Arabidopsis thaliana experiments. Moving forward, our objective is to incorporate primary cell datasets. Specifically, we are considering incorporating ChIP-seq and WGBS data derived from patients, as well as primary cells from mouse models. This approach would provide a more comprehensive depiction of in vivo TF-binding sites and their interplay with DNA methylation. Furthermore, the inclusion of primary mouse and patient cells would enrich our data with a unique perspective, enabling us to better capture the pathophysiological interplay between DNA methylation and TF activities, and provide insights into responses to drug treatments. This approach would address some of the challenges encountered when using cell lines and offer an intriguing opportunity to compare and contrast our findings with in vitro data. Depending on availability, we will also consider the incorporation of ChIP-seq and WGBS data for other model animals.

Lastly, to facilitate analyses with datasets not included in our database, our API TFregulomeR mirrors all functionalities found in MethMotif. By employing the functions of TFregulomeR, users can scrutinize their own genomic data in a context-specific manner to discover novel methylation-specific TF cofactor motifs.

With these continual enhancements and expansions, we anticipate that MethMotif will remain a valuable resource for the scientific community studying gene regulation and TF dynamics in the context of specific epigenomic landscapes.

Acknowledgements

The authors thank Michael Woods for his insightful comments during the project, as well as the Centre for Analytics, Informatics and Research (CAIR) at Memorial University and the Digital Research Alliance of Canada (the Alliance) in partnership with ACENET, for their support in providing high-performance computing resources and database hosting.

Author contributions: M.J.D., Q.X.X.L. and T.B. designed the platform; M.J.D. and Q.X.X.L. programmed the website and performed all analyses; M.J.D., Q.X.X.L., S.S., D.T. and T.B. interpreted the data; M.J.D., S.S., D.T. and T.B. wrote the manuscript. T.B. directed the study.

Contributor Information

Matthew Dyer, Division of BioMedical Sciences, Faculty of Medicine, Memorial University of Newfoundland, St. John’s, NL A1B 3V6, Canada.

Quy Xiao Xuan Lin, Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore.

Sofiia Shapoval, Division of BioMedical Sciences, Faculty of Medicine, Memorial University of Newfoundland, St. John’s, NL A1B 3V6, Canada.

Denis Thieffry, Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore; Département de Biologie de l’École Normale Supérieure, PSL Research University, Paris 75005, France.

Touati Benoukraf, Division of BioMedical Sciences, Faculty of Medicine, Memorial University of Newfoundland, St. John’s, NL A1B 3V6, Canada; Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore.

Data availability

MethMotif is freely available at https://methmotif.org. Its API, TFregulomeR, is also available for download from GitHub (https://github.com/benoukraflab/TFregulomeR) and Zenodo (https://doi.org/10.5281/zenodo.8403294).

Funding

Canada Research Chairs; National Research Foundation; Ministry of Education, Singapore. Funding for open access charge: Canada Research Chairs.

Conflict of interest statement. None declared.

References

  • 1. Tirado-Magallanes R., Rebbani K., Lim R., Pradhan S., Benoukraf T. Whole genome DNA methylation: beyond genes silencing. Oncotarget. 2017; 8:5629–5637.
  • 2. Yin Y., Morgunova E., Jolma A., Kaasinen E., Sahu B., Khund-Sayeed S., Das P.K., Kivioja T., Dave K., Zhong F. et al. Impact of cytosine methylation on DNA binding specificities of human transcription factors. Science. 2017; 356:eaaj2239.
  • 3. Rasmussen K.D., Helin K. Role of TET enzymes in DNA methylation, development, and cancer. Genes Dev. 2016; 30:733–750.
  • 4. Zaret K.S., Carroll J.S. Pioneer transcription factors: establishing competence for gene expression. Genes Dev. 2011; 25:2227–2241.
  • 5. Mayran A., Drouin J. Pioneer transcription factors shape the epigenetic landscape. J. Biol. Chem. 2018; 293:13795–13804.
  • 6. Detilleux D., Spill Y.G., Balaramane D., Weber M., Bardet A.F. Pan-cancer predictions of transcription factors mediating aberrant DNA methylation. Epigenetics Chromatin. 2022; 15:10.
  • 7. Lemma R.B., Fleischer T., Martinsen E., Ledsaak M., Kristensen V., Eskeland R., Gabrielsen O.S., Mathelier A. Pioneer transcription factors are associated with the modulation of DNA methylation patterns across cancers. Epigenetics Chromatin. 2022; 15:13.
  • 8. Xuan Lin Q.X., Sian S., An O., Thieffry D., Jha S., Benoukraf T. MethMotif: an integrative cell specific database of transcription factor binding motifs coupled with DNA methylation profiles. Nucleic Acids Res. 2019; 47:D145–D154.
  • 9. Lin Q.X.X., Thieffry D., Jha S., Benoukraf T. TFregulomeR reveals transcription factors’ context-specific features and functions. Nucleic Acids Res. 2020; 48:e10.
  • 10. Davis C.A., Hitz B.C., Sloan C.A., Chan E.T., Davidson J.M., Gabdank I., Hilton J.A., Jain K., Baymuradov U.K., Narayanan A.K. et al. The Encyclopedia of DNA elements (ENCODE): data portal update. Nucleic Acids Res. 2018; 46:D794–D801.
  • 11. Clough E., Barrett T. The Gene Expression Omnibus Database. Methods Mol. Biol. 2016; 1418:93–110.
  • 12. Yevshin I., Sharipov R., Kolmykov S., Kondrakhin Y., Kolpakov F. GTRD: a database on gene transcription regulation—2019 update. Nucleic Acids Res. 2019; 47:D100–D105.
  • 13. Iwafuchi-Doi M. The mechanistic basis for chromatin regulation by pioneer transcription factors. Wiley Interdiscip. Rev. Syst. Biol. Med. 2019; 11:e1427.
  • 14. Luzete-Monteiro E., Zaret K.S. Structures and consequences of pioneer factor binding to nucleosomes. Curr. Opin. Struct. Biol. 2022; 75:102425.
  • 15. Jolma A., Yin Y., Nitta K.R., Dave K., Popov A., Taipale M., Enge M., Kivioja T., Morgunova E., Taipale J. DNA-dependent formation of transcription factor pairs alters their binding specificity. Nature. 2015; 527:384–388.
  • 16. Heinz S., Benner C., Spann N., Bertolino E., Lin Y.C., Laslo P., Cheng J.X., Murre C., Singh H., Glass C.K. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell. 2010; 38:576–589.
  • 17. McLean C.Y., Bristor D., Hiller M., Clarke S.L., Schaar B.T., Lowe C.B., Wenger A.M., Bejerano G. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 2010; 28:495–501.
  • 18. Grimanelli D., Ingouff M. DNA methylation readers in plants. J. Mol. Biol. 2020; 432:1706–1717.
  • 19. Grandi F.C., Modi H., Kampman L., Corces M.R. Chromatin accessibility profiling by ATAC-seq. Nat. Protoc. 2022; 17:1518–1552.
