Abstract
In mammals, transcription factors (TFs) drive gene expression by binding to regulatory elements in a cooperative manner. Deciphering the rules of such cooperation is crucial to obtain a full understanding of cellular homeostasis and development. Although this is a long-standing topic, there is no comprehensive database for biologists to access the syntax of TF binding sites. Here we present TFSyntax (https://tfsyntax.zhaopage.com), a database focusing on the arrangement of TF binding sites. TFSyntax maps the binding motifs of 1299 human TFs and 890 mouse TFs across 382 cells and tissues, representing the most comprehensive TF binding map to date. In addition to location, TFSyntax defines motif positional preference, density and colocalization within accessible elements. Through a series of web-based functional modules, users can freely search, browse, analyze and download data of interest. With comprehensive characterization of TF binding syntax across distinct tissues and cell types, TFSyntax represents a valuable resource and platform for studying the mechanism of transcriptional regulation and exploring how regulatory DNA variants cause disease.
INTRODUCTION
The mammalian genome encodes over 1600 transcription factors (TFs) (1,2). These proteins regulate the spatiotemporal expression of genes in response to environmental and developmental stimuli (3–5). Typically, regulatory elements contain multiple TF binding motifs arranged in a non-random fashion, following a particular syntax (6). This syntax is believed to facilitate TF cooperation in binding (7,8). However, the rules behind TF binding syntax are largely unknown. The significance of exploring this problem is highlighted by the fact that >80% of genetic variants associated with complex diseases map to regulatory elements (9).
Evidence for TF cooperativity has come from the study of TF complexes. For example, the AP-1 complex, which regulates gene expression in response to a variety of stimuli (10), is composed of ATF, FOS, JUN and MAF. Genome-wide analyses have shown that motifs for AP-1 members either overlap or are in close proximity to each other (11–13). Furthermore, deleting or mutating individual motifs affects the binding of their neighbors (14,15). Recently, we discovered a group of TFs that recognize overlapping GC sequences in all tissues and frequently colocalize with other expressed factors (16). Dubbed stripe factors, this group includes members of the SP-KLF, EGR and ZBTB families, as well as a subset of zinc-finger proteins. These factors provide accessibility and increased chromatin residence time to co-binding partners (16). So far, different models and mechanisms have been proposed to explain TF binding cooperativity, such as motif arrangements, the billboard model, collective syntax and others (4,8,17–22). These substantial efforts have broadened our view of TF cooperativity and deepened the understanding of the cis-regulatory code.
In recent years, several databases and web servers have been developed to facilitate the exploration of TF binding, such as ATACdb, TcoFBase, Cistrome, Factorbook, ReMap and ChIPBase (23–30). However, few of them focus on TF binding syntax. As an important part of the cis-regulatory code, TF cooperation has been a long-standing topic and the focus of much research (31). Nevertheless, there is no comprehensive database for researchers to visualize TF binding in mammalian genomes and derive the syntax of regulatory DNA. To fill this gap, we have developed TFSyntax, which suggests rules behind TF binding with an emphasis on cooperation. This is done by defining TF colocalization, positional and density preferences. Dedicated to deciphering the cis-regulatory code of mammalian genomes, TFSyntax provides a comprehensive view of genomic regulatory elements and facilitates further exploration of transcriptional regulation.
DATA PROCESSING AND DATABASE IMPLEMENTATION
Data collection and processing
All raw ATAC-Seq, DNase-Seq and RNA-Seq data were downloaded from ENCODE and GEO (32). Quality control of these sequencing data was performed according to the ENCODE guidelines (https://www.encodeproject.org/data-standards/). All genomic coordinates for human and mouse samples are based on hg38 and mm10, respectively, and the reference genomes were downloaded from the UCSC website (https://hgdownload.soe.ucsc.edu/downloads.html). All high-quality sequencing data were further manually curated and processed using our in-house pipeline. In total, 187 human samples and 195 mouse samples have both high-quality chromatin accessibility and transcriptome data. In addition, we downloaded preprocessed human and mouse TF ChIP-Seq data from the ReMap database (30).
For ATAC-Seq, DNase-Seq and ChIP-Seq data, bowtie2 (version 2.3.4) was used to align raw reads to the reference genome with default parameters, and the option ‘--no-mixed’ was added for paired-end sequencing (33). Picard (version 2.21.8) was used to remove PCR duplicates with the ‘MarkDuplicates REMOVE_DUPLICATES=true’ options. Samtools (version 1.9) was used to extract uniquely mapped reads with the options ‘view -q 30’ (34). MACS2 was used to identify peaks and summits of accessible elements with a q-value cutoff of 0.01 (35). For ATAC-Seq and DNase-Seq data, RGT-HINT (version 0.13.1) was used to detect footprinting sites (36), with peaks of regulatory elements from the matched sample as input. For RNA-Seq data, STAR (version 2.6.0c) was used to align raw reads with default parameters (37). RSEM (version 1.3.0) was used to measure gene expression levels with the options ‘--seed 12345 --estimate-rspd’ (38).
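The chromatin accessibility steps above can be summarized as a set of command lines. The following Python sketch only assembles the argument lists and does not run anything. The helper `build_commands`, all file names and the index path are hypothetical, and the input/output arguments (`-x`, `-1`, `-2`, `-S`, `I=`, `O=`, `M=`, `-b`, `-o`, `callpeak -t -n`) are assumptions about a typical invocation, not the authors' in-house pipeline. Coordinate sorting and SAM/BAM conversion are omitted.

```python
from typing import Dict, List


def build_commands(sample: str, index: str, r1: str, r2: str) -> Dict[str, List[str]]:
    """Assemble (but do not execute) per-sample commands for paired-end reads."""
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"  # assumed to exist; sorting step not shown
    dedup_bam = f"{sample}.dedup.bam"
    final_bam = f"{sample}.q30.bam"
    return {
        # bowtie2 with default parameters plus --no-mixed for paired-end data
        "align": ["bowtie2", "--no-mixed", "-x", index,
                  "-1", r1, "-2", r2, "-S", sam],
        # Picard MarkDuplicates, removing duplicates instead of only flagging them
        "dedup": ["picard", "MarkDuplicates", f"I={sorted_bam}", f"O={dedup_bam}",
                  f"M={sample}.dup_metrics.txt", "REMOVE_DUPLICATES=true"],
        # keep reads with mapping quality >= 30
        "filter": ["samtools", "view", "-b", "-q", "30", "-o", final_bam, dedup_bam],
        # peak and summit calling with a q-value cutoff of 0.01
        "peaks": ["macs2", "callpeak", "-t", final_bam, "-n", sample, "-q", "0.01"],
    }


if __name__ == "__main__":
    for step, argv in build_commands("sampleA", "hg38_index", "r1.fq.gz", "r2.fq.gz").items():
        print(step, " ".join(argv))
```

Keeping the commands as argument lists makes each step easy to inspect or to pass to `subprocess.run` once the placeholder paths are replaced.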
All position weight matrices (PWMs) of TF binding motifs were retrieved from the JASPAR, TRANSFAC (commercial version) and CIS-BP databases (39–41). Genome-wide PWM hits on the human and mouse genomes were detected using FIMO in the MEME Suite (version 5.0.1) with P-value < 1e-5 (42). With BEDTools (version 2.29.0) (43), the PWM hit pool was then used to annotate potential TF binding within the footprinting sites and around the summits of regulatory elements. The final annotation kept only those PWM hits with at least 90% overlap with the regions of interest (44). Specifically, we extended each summit both upstream and downstream by 100 bp, which resulted in a 201-bp region (extended summit region). The annotation of potential TF binding around summits is based on the extended summit regions. For each sample, gene expression values were transformed to zFPKM (45), and non-expressed TFs (zFPKM < −2) were excluded from annotation.
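The summit extension and the 90% overlap filter can be illustrated with a short Python sketch. It assumes 0-based, half-open intervals and interprets the overlap as the fraction of the motif hit that falls inside the region; both conventions, and the function names, are assumptions made for illustration rather than details taken from the pipeline.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def extend_summit(summit: int, flank: int = 100) -> Interval:
    """Region covering the summit base plus `flank` bp on each side (201 bp by default)."""
    return (max(0, summit - flank), summit + flank + 1)


def overlap_fraction(hit: Interval, region: Interval) -> float:
    """Fraction of the motif hit that lies inside the region."""
    hit_len = hit[1] - hit[0]
    if hit_len <= 0:
        return 0.0
    shared = min(hit[1], region[1]) - max(hit[0], region[0])
    return max(0, shared) / hit_len


def keep_hits(hits: List[Interval], region: Interval, min_frac: float = 0.9) -> List[Interval]:
    """Keep hits whose overlap with the region is at least `min_frac`."""
    return [h for h in hits if overlap_fraction(h, region) >= min_frac]


if __name__ == "__main__":
    region = extend_summit(1000)
    print(region, region[1] - region[0])
    print(keep_hits([(895, 905), (950, 960), (1095, 1105)], region))
```

In this toy example the hit at 950–960 lies fully inside the region and is kept, while the hit at 895–905 overlaps by only half of its length and is dropped.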
Computation of TF binding syntax
To explore the rules of TF binding sites, we evaluated the distribution of DNA motifs within regulatory elements from three aspects: positional preference, density preference and colocalization. For each sample, these three components of TF binding syntax were computed separately from two kinds of data, motif-based and footprinting-based. Motif-based data are annotated motifs within the extended summit regions (201 bp) of regulatory elements, while footprinting-based data are annotated motifs within footprinting sites identified from chromatin accessibility data.
For positional preference of TF binding sites, we computed the relative positions of inferred binding sites around the summits of regulatory elements, where the position of the summit is 0. For each TF, we calculated the percentage of summits with the motif of that specific TF at each position. For density preference of TF binding sites, we first calculated TF density on each extended summit region by counting the number of TFs that have inferred binding sites within the 201 bp summit region. The density preference of a TF was then computed from the distribution of TF density across the summits where that TF may bind. Using these methods, we calculated the positional preference and density preference of each TF in individual samples. We also calculated the positional preference and density preference of each TF by species by combining the annotations of all samples from the same species. To validate TF density preference with ChIP-Seq data, we merged all TF ChIP-Seq peaks from the same cell line and identified non-overlapping regions. Next, we counted the number of bound TFs in each region. The density preference of a TF was computed from the distribution of TF density across the regions where that TF binds.
For the TF colocalization map, we calculated how TFs colocalize with each other in each sample. If a summit region (201 bp) of ATAC-Seq data has an inferred binding site of TF A, we consider this summit TF A positive. If x% of TF A positive summits are also TF B positive, we define that x% of TF A is colocalized by TF B. Similarly, we calculate that y% of TF B is colocalized by TF A. Of note, x% differs from y% in almost all cases, because the two scores are normalized by different sets of summits (those positive for TF A and those positive for TF B, respectively). According to our recent study (16), if over 30% of expressed TFs in a cell type are colocalized by factor B with a percentage score ≥10%, then factor B is classified as a ‘stripe’ factor. Based on this method, we calculated colocalization scores for all TF pairs and identified ‘stripe’ factors in each sample. In addition, we calculated TF colocalization with partners using TF ChIP-Seq data downloaded from the ReMap database. If x% of TF A ChIP-Seq peaks overlap with any TF B ChIP-Seq peak, we define that x% of TF A is colocalized by TF B. Similarly, we calculate that y% of TF B is colocalized by TF A. The colocalization scores based on ChIP-Seq data are also included in the TFSyntax data.
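As a minimal sketch of the two per-TF summaries described above, the Python code below computes a positional profile and a density distribution from toy annotations. The input layout is an assumption made for illustration: for the positional profile, each summit is represented by the set of offsets (relative to the summit at 0) at which the TF motif was annotated, and all summits passed in form the denominator; for density, each summit is represented by the set of TF names annotated within its 201 bp region.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def positional_profile(offsets_per_summit: Iterable[Set[int]],
                       flank: int = 100) -> Dict[int, float]:
    """Percentage of summits carrying the TF motif at each offset in [-flank, flank]."""
    summits = list(offsets_per_summit)
    counts: Counter = Counter()
    for offsets in summits:
        counts.update(o for o in offsets if -flank <= o <= flank)
    n = len(summits)
    return {pos: (100.0 * counts[pos] / n if n else 0.0)
            for pos in range(-flank, flank + 1)}


def density_preference(tfs_per_summit: List[Set[str]], tf: str) -> Dict[int, float]:
    """Distribution of TF density (number of TFs per summit) over summits where `tf` may bind."""
    densities = [len(tfs) for tfs in tfs_per_summit if tf in tfs]
    counts = Counter(densities)
    total = len(densities)
    return {d: counts[d] / total for d in sorted(counts)}


if __name__ == "__main__":
    profile = positional_profile([{0, 1}, {0}, {-100}])
    print(round(profile[0], 2), round(profile[-100], 2))
    summits = [{"A", "B"}, {"A"}, {"B", "C", "D"}, {"A", "C"}]
    print(density_preference(summits, "A"))
```

The mean and standard deviation of the density distribution can then be derived per TF, which corresponds to the kind of summary plotted in Figure 1C.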
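The asymmetric colocalization score and the ‘stripe’ factor criterion can be sketched in Python as follows. The data layout (one set of TF names per summit), the function names and the choice to use all other TFs present in the sample as the denominator for the 30% rule are assumptions made for illustration; the thresholds (≥10% score, over 30% of TFs) are those stated in the text.

```python
from typing import Dict, List, Set


def positive_summits(tfs_per_summit: List[Set[str]]) -> Dict[str, Set[int]]:
    """Map each TF to the indices of summits that are positive for it."""
    index: Dict[str, Set[int]] = {}
    for i, tfs in enumerate(tfs_per_summit):
        for tf in tfs:
            index.setdefault(tf, set()).add(i)
    return index


def colocalization(index: Dict[str, Set[int]], a: str, b: str) -> float:
    """Percentage of TF `a`-positive summits that are also TF `b`-positive."""
    a_pos = index.get(a, set())
    if not a_pos:
        return 0.0
    return 100.0 * len(a_pos & index.get(b, set())) / len(a_pos)


def stripe_factors(index: Dict[str, Set[int]], min_score: float = 10.0,
                   min_fraction: float = 0.30) -> Set[str]:
    """TFs that colocalize (score >= min_score) with over min_fraction of the other TFs."""
    tfs = sorted(index)
    stripes: Set[str] = set()
    for b in tfs:
        partners = [a for a in tfs if a != b]
        if not partners:
            continue
        hits = sum(colocalization(index, a, b) >= min_score for a in partners)
        if hits / len(partners) > min_fraction:
            stripes.add(b)
    return stripes


if __name__ == "__main__":
    summits = [{"A", "B"}, {"A", "B"}, {"A"}, {"A"}, {"B"}, {"C"}]
    idx = positive_summits(summits)
    print(colocalization(idx, "A", "B"), round(colocalization(idx, "B", "A"), 1))
    print(sorted(stripe_factors(idx)))
```

In the toy data, 50% of TF A is colocalized by TF B while about 66.7% of TF B is colocalized by TF A, which shows why the two directions generally differ.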
Implementation of TFSyntax database
All processed data and other supporting data are stored in MySQL (version 8.0.28, https://downloads.mysql.com/archives/community/) or SQLite (version 3, https://www.sqlite.org/). The main framework of the TFSyntax web application is developed in PHP (version 8.1.9, https://www.php.net/downloads.php), a popular open-source scripting language for web development. The open-source libraries jQuery (version 3.5.1, https://jquery.com/download/), Bootstrap (version 5.1.3, https://getbootstrap.com/) and Font Awesome (version 6.1.1, https://fontawesome.com/) are used to develop the front-end web interfaces. The Chart.js library (version 3.7.1, https://www.chartjs.org/) is used to visualize the positional preference and density preference of TFs, and the Morpheus.js library (https://github.com/cmap/morpheus.js/) is used for visualizing and processing the TF colocalization map. JBrowse (version 1.16.11, https://jbrowse.org/jbrowse1.html) is integrated to visualize genome-wide TF binding profiles. The TFSyntax web application is served by a Linux-based NGINX server (version 1.23.1, https://nginx.org/).
DATABASE CONTENT AND USAGE
Overview of TFSyntax
In the current version, we have collected and curated 7456 PWMs, which cover 1299 and 890 sequence-specific TFs in human and mouse, respectively (Table 1). From 187 human and 195 mouse samples, we have detected 45M footprinting sites and annotated 2114M accessible motifs within the extended summits (201 bp) of 60M regulatory elements, which cover 760 Mb and 652 Mb of non-redundant regions in the human and mouse genomes, respectively. In addition, we generated comprehensive maps of potential transcription factor binding sites for each sample based on motifs within the extended summits and motifs within the footprinting sites (Figure 1A); only motifs of expressed TFs were considered in each cell and tissue type. Based on these maps, TFSyntax presents the binding ‘syntax’ of 1299 human TFs and 890 mouse TFs from three aspects.
Table 1.
Summary of TFSyntax database
| | Human | Mouse |
|---|---|---|
| Number of transcription factors (TFs) | 1299 | 890 |
| Number of position weight matrices (PWMs) | 4097 | 3359 |
| Number of samples | 187 | 195 |
| Number of regulatory elements | 33M | 27M |
| Number of footprinting sites | 27M | 18M |
| Number of accessible motifs | 1507M | 607M |
| Number of TFs with ChIP-Seq data | 767 | 374 |
| Size of accessible genome | 760 Mb | 652 Mb |
Figure 1.
Overview of TF binding syntax. (A) Workflow of the data processing pipeline in TFSyntax. (B) Positional preference of TF binding sites relative to the summit of regulatory elements. In the heatmap, white indicates no enrichment compared with the background (edges), red indicates enrichment and blue indicates depletion. (C) Density preference of TF binding motifs. Each dot represents the distribution for one TF (x-axis: mean value, y-axis: standard deviation). (D) Pairwise TF colocalization map. The score in each cell of the heatmap shows the percentage of the TF in the row that is colocalized by the TF in the column.
The first syntax is positional preference of TF binding. A previous study using 103 TFs showed that different families of TFs display positional binding preferences within enhancers (46). Here, we explored this question comprehensively by plotting TF motif-enrichment profiles relative to the summits of regulatory elements, which resulted in three major groups: center enriched, evenly distributed and center depleted (Figure 1B). For instance, DNA motifs for PU.1, which shapes local chromatin architecture by binding to nucleosomes (47), were strongly enriched at the summit of DNA elements. Conversely, most FOX TF motifs were clustered near the edges of regulatory elements (46), which agrees well with the pioneer function of FOX factors (48,49). On the other hand, motifs for LEF1, which shares homology with high mobility group protein-1 (HMG1) (50), lacked specific enrichment across regulatory elements. To compare the results from motif-based and footprinting-based data, TF positional preference was measured using a metric termed the enrichment score (occupancy frequency of the center 50 bp over the background on the two flanking sides). The comparative analysis shows that motif-based and footprinting-based data agree well with each other (Supplementary Figure S1): the Pearson correlation coefficient is 0.79 in human and 0.73 in mouse.
The next syntax is density preference of TF binding. Recent studies showed that the local density of TF binding sites impacts transcription by promoting the formation of condensates around active genes (51). Whether different TFs prefer specific densities at regulatory DNA has not been explored. To address this question, we computed the number of non-overlapping TF motifs per element in the mouse and human genomes. The analysis identified two broad TF groups: those that preferentially bind elements with relatively few partners (and display low standard deviation), and those that prefer more crowded elements (Figure 1C). For example, the relative positions of different TFs, ARID5A, NFKB, MAZ and KLF10, were the same in both human and mouse (Supplementary Figure S2A). A similar result was observed among members of the same family, such as SMAD3 and SMAD7 (Supplementary Figure S2B). To validate the TF density preference, we repeated the analysis using TF ChIP-Seq data in human cell lines, each of which includes 80 to 244 TF ChIP-Seq datasets. The results show good correlation between the outputs from motif-based and ChIP-Seq-based data, with Pearson correlation coefficients ranging from 0.53 to 0.66 (Supplementary Figure S3A). We also compared the results between motif-based and footprinting-based data, and the Pearson correlation coefficients are mostly between 0.85 and 0.9 in both human and mouse samples (Supplementary Figure S3B).
Last but not least, combinatorial binding of TFs is thought to facilitate recognition of regulatory elements rather than isolated DNA motifs scattered across the genome (52,53). To comprehensively define TF combinations, we calculated the colocalization frequency of all possible TF motif pairs in each sample (Figure 1D). Although nearly 90% of TF motif pairs show low colocalization, there are many meaningful TF pairs, such as proteins from the FOS-JUN complex and the AP-1 complex. Based on the colocalization frequency, in each cell and tissue, we also identified stripe factors, which frequently colocalize with other factors and provide accessibility to their co-binding factors in the same regulatory element (16). In addition, we used TF ChIP-Seq data from four human cell lines to repeat the analysis of TF colocalization and then compared the colocalization scores between ChIP-Seq-based data and motif-based data. The Pearson correlation coefficients ranged from 0.62 to 0.77 in the four cell lines (Supplementary Figure S4A). The Pearson correlation coefficients between motif-based and footprinting-based results were 0.6–0.65 in human samples and 0.7–0.8 in mouse samples, indicating good correlation (Supplementary Figure S4B).
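One possible reading of the enrichment score is sketched below in Python: the mean occupancy frequency in the central 50 bp of a 201 bp positional profile divided by the mean frequency at the two edges. The width of the flanking background windows (25 bp per side here) and the use of means are assumptions, since the text does not specify them.

```python
from typing import Sequence


def enrichment_score(profile: Sequence[float], center_width: int = 50,
                     flank_width: int = 25) -> float:
    """Mean occupancy in the central window over mean occupancy at the two edges."""
    n = len(profile)
    if center_width + 2 * flank_width > n:
        raise ValueError("windows do not fit in the profile")
    start = n // 2 - center_width // 2
    center = profile[start:start + center_width]
    edges = list(profile[:flank_width]) + list(profile[-flank_width:])
    background = sum(edges) / len(edges)
    if background == 0:
        raise ValueError("background occupancy is zero")
    return (sum(center) / len(center)) / background


if __name__ == "__main__":
    flat = [1.0] * 201
    peaked = [1.0] * 201
    peaked[75:125] = [3.0] * 50  # toy center-enriched profile
    print(enrichment_score(flat), enrichment_score(peaked))
```

Under this definition, a score near 1 corresponds to an evenly distributed motif, values above 1 to center enrichment and values below 1 to center depletion.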
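The agreement between two data sources can be quantified as in the Python sketch below, which computes the Pearson correlation coefficient over the TF pairs scored by both sources. The dictionary layout keyed by (TF A, TF B) and the toy scores are hypothetical and serve only to illustrate the comparison.

```python
import math
from typing import Dict, Sequence, Tuple

Pair = Tuple[str, str]  # (TF A, TF B): percentage of TF A colocalized by TF B


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def compare_scores(first: Dict[Pair, float], second: Dict[Pair, float]) -> float:
    """Correlate scores over the TF pairs present in both sources."""
    shared = sorted(set(first) & set(second))
    return pearson([first[k] for k in shared], [second[k] for k in shared])


if __name__ == "__main__":
    motif = {("A", "B"): 10.0, ("B", "A"): 20.0, ("A", "C"): 30.0, ("X", "Y"): 5.0}
    chip = {("A", "B"): 12.0, ("B", "A"): 22.0, ("A", "C"): 32.0}
    print(round(compare_scores(motif, chip), 3))
```

Restricting the comparison to shared pairs matters because ChIP-Seq data cover fewer TFs than the motif-based annotation.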
The binding ‘syntax’ data of each TF are presented in two tiers, sample-level and species-level. Sample-level data show the profile in individual tissue or cell type, while species-level is based on aggregated data from different samples of the same species. All these TF binding maps, as well as detailed syntax information, are freely accessible through a web application with user-friendly interactive interfaces. All functions in the web application are organized into four modules: search, browse, analyses and download.
Searching TFSyntax
Using the heuristic search box, users can search for TFs of interest in a specific species and/or tissue/cell type (Figure 2A) and view an overview of TF binding syntax based on motifs and footprinting, including summary information on TF binding sites, positional and density preferences, and summarized colocalization information (Figure 2B). Users can switch between the sample-specific view and the species-specific view with the button inside the page. With the buttons labeled ‘Motif-based map’, ‘Footprinting-based map’ and ‘View in browser’, users can view and manipulate high-dimensional data with the browsing tools, which are introduced in a later section. TFSyntax also provides TF ortholog maps between human and mouse, with which users can easily access the same TF across species (Figure 2C). In addition, we developed a dedicated search box for retrieving detailed TF colocalization information (Figure 2D, E) as well as supporting information based on ChIP-Seq data (Figure 2F). A dedicated module was developed to extract ‘stripe’ factors in a specific sample (Figure 2G, H).
Figure 2.
Screenshot of searching module. (A) Interface for searching TF binding syntax. (B) TF binding syntax in a sample-specific view. Panel 1: summary information of transcription factor. Panel 2: summary information in a specific sample. This panel will not be available in species-specific view. Panel 3–4: Positional profile of TF binding site based on motif (Panel 3) and footprinting (Panel 4). Density profile of TF binding site based on motif (Panels 5–6) and footprinting (Panels 7–8). (C) TF orthologs map between human and mouse. Box in red is for gene filtering. (D) Interface for filtering colocalized TF pairs. (E) Table showing TFs colocalizing with JUNB using footprinting-based data. (F) TF colocalization between JUNB and FOSL1 based on ChIP-Seq data. (G) Interface for filtering ‘stripe’ factors. (H) Table showing ‘stripe’ factors in adipose-derived stem cells.
Biologists can therefore easily extract potentially interesting factors in a specific cell type and identify co-binding partners for a specific TF of interest, which will facilitate the study of TF cooperation.
Browsing tools
TFSyntax provides two kinds of browsing tools, an interactive heatmap and a genome browser, to facilitate viewing and manipulating high-dimensional data. The interactive heatmap can be launched from a dedicated search box (Figure 3A) or from the hyperlinked sample ID on any page showing TF binding syntax (Figure 2B, E and G). With the interactive heatmap, users can visualize genome-wide colocalization information for all TFs, rather than for a single TF (Figure 3B). As shown in Figure 3B, ‘stripe’ factors appear as white stripes in the heatmap, and frequently associated factors or factors of the same family form clusters. In addition, the interactive heatmap includes a series of functions that allow customized operations, such as clustering analysis and image export. A genome browser is deployed to view regulatory elements and detailed TF binding profiles at single-base resolution. With the search box and interactive buttons, users can view genomic regions or genes of interest by searching genomic coordinates and gene names or by clicking the zoom-in/out buttons. Through the track selector, users can load or unload the data tracks from 382 human and mouse samples (Figure 3C), including extended summit regions, footprinting sites and TF motifs. Alternatively, users can load their own tracks for side-by-side comparisons. For example, biologists can check which TFs may be affected by a disease SNP of interest. As shown in Figure 3D, the autoimmune disease-associated variant rs5794664 (54) is located in regulatory elements in both CD4+ and CD8+ T cells and hits a TFCP2 binding motif.
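The same kind of SNP lookup can be performed offline on downloaded motif annotations. The Python sketch below is a minimal illustration that assumes motif hits are available as (chromosome, start, end, TF) records with 0-based, half-open coordinates; the record layout, the generic TF names and all coordinates are placeholders, not values from TFSyntax.

```python
from typing import List, NamedTuple


class MotifHit(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    tf: str


def motifs_hit_by_snp(hits: List[MotifHit], chrom: str, pos: int) -> List[str]:
    """Names of TFs whose annotated motif contains the 0-based SNP position."""
    return sorted({h.tf for h in hits
                   if h.chrom == chrom and h.start <= pos < h.end})


if __name__ == "__main__":
    # Placeholder annotations for one sample
    hits = [
        MotifHit("chr1", 100, 112, "TF_A"),
        MotifHit("chr1", 108, 118, "TF_B"),
        MotifHit("chr2", 100, 112, "TF_C"),
    ]
    print(motifs_hit_by_snp(hits, "chr1", 110))
    print(motifs_hit_by_snp(hits, "chr1", 130))
```

A linear scan is sufficient for a handful of variants; for genome-wide variant sets, an interval index or a tool such as BEDTools would be a more practical choice.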
Figure 3.
Screenshot of browsing module. (A) Interface for selecting a sample of interest to view the TF colocalization map. (B) TF colocalization map in the mouse CH12 cell line. The heatmap was clustered, and the white stripes on the heatmap correspond to ‘stripe’ factors. The zoom-in box partially shows the cluster of ‘universal stripe factors’. (C) ‘Track selector’ in the human genome browser. (D) Summits of regulatory elements (green bars) and TF binding motifs (purple bars) in CD4+ and CD8+ T cells. The vertical black dashed line denotes the location of SNP rs5794664, which hits the motif of TFCP2 in both cell types.
Analytical tools
Besides browsing tools, TFSyntax also provides online analytical tools with which users can directly compare the binding syntax across TFs of interest. Currently, these analytical tools support the comparison of positional preference (Figure 4A, B) and density preference (Figure 4C, D), using either motif-based or footprinting-based data. With these tools, users can, for instance, view the density preference of multiple TFs across all cell and tissue types, instead of in an individual sample. Since the density of binding sites on DNA affects the formation of condensates, biologists may obtain clues about the function of TFs in phase separation (51). Analytical results are immediately available, and all operations are executed through a web-based graphical user interface. No prior knowledge of bioinformatics tools or programming is required.
Figure 4.
Screenshot of analytical tools. (A) Interface for comparing positional preference of TF binding sites. (B) Comparison of positional preference (JUNB, FOXP1 and LEF1) based on motifs. (C) Interface for comparing density preference of TF binding sites. (D) Comparison of density preference (KLF10, ARID5A, MAZ and NFKB1) based on motifs.
Data download
All processed data from TFSyntax are freely available through the download module. Each sample includes two data levels: (i) the genome-wide TF binding profile, including summits, motifs and footprinting sites, as mapped within regulatory elements and (ii) positional profiles, density profiles and TF colocalization maps. Users can download all data for further analysis or visualization using their own tools.
SUMMARY AND FUTURE DIRECTIONS
Increasing evidence shows that TFs bind cooperatively to regulatory elements to control promoter activity, cell identity, the cell cycle and other processes (7,55–56). In this study, we comprehensively evaluated the arrangement of TF binding motifs within regulatory elements in 382 different tissues and cell types through integrative analysis of position weight matrices, RNA-Seq and ATAC-Seq/DNase-Seq data. Based on these analyses, we built the first database focusing on the binding syntax of transcription factors in mammalian genomes. TFSyntax emphasizes the rules of TF binding sites in a cell-specific manner, in terms of TF colocalization, positional and density preferences. In addition, we included TF ChIP-Seq data as supporting information. TFSyntax provides a series of user-friendly interfaces in each module, so that biologists can easily access and further analyze the preprocessed data for their transcription factors of interest without needing computational or bioinformatic skills. In short, these features make TFSyntax a useful tool to explore how TFs cooperate with each other and bind to regulatory elements in the context of their TF partners. We believe TFSyntax will be an important resource and platform for studying the mechanism of transcriptional regulation and exploring how SNPs within regulatory elements cause disease.
Cooperative TF binding was reported to control cell differentiation in immune cells (57). To better understand TF binding syntax in different cell contexts, we will keep integrating data from distinct cell and tissue types. Emerging multimodal omics measurements at single-cell resolution, especially chromatin accessibility and transcriptome data (58), will be used to explore TF binding rules in different cell subpopulations. In addition to exploring the various rules governing TF binding, we will make TFSyntax compatible with other databases, such as SNP data in dbSNP, to facilitate the study of transcriptional regulation. Datasets in other repositories can then be loaded and compared with TF binding information within regulatory elements. This will give biologists and physicians more comprehensive information about enhancers.
DATA AVAILABILITY
All data in the database are freely available from https://tfsyntax.zhaopage.com/.
ACKNOWLEDGEMENTS
This work utilized the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov). The author would like to express gratitude to Dr. Rafael Casellas for his great support and valuable discussions on this study.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
FUNDING
Intramural Research Program of the National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) of the National Institutes of Health (NIH). Funding for open access charge: Intramural Research Program of the National Institute of Arthritis and Musculoskeletal and Skin Diseases.
Conflict of interest statement. None declared.
REFERENCES
- 1. Lambert S.A., Jolma A., Campitelli L.F., Das P.K., Yin Y., Albu M., Chen X., Taipale J., Hughes T.R., Weirauch M.T.. The human transcription factors. Cell. 2018; 172:650–665. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2. Hu H., Miao Y.R., Jia L.H., Yu Q.Y., Zhang Q., Guo A.Y.. AnimalTFDB 3.0: a comprehensive resource for annotation and prediction of animal transcription factors. Nucleic Acids Res. 2019; 47:D33–D38. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3. Zinzen R.P., Girardot C., Gagneur J., Braun M., Furlong E.E.. Combinatorial binding predicts spatio-temporal cis-regulatory activity. Nature. 2009; 462:65–70. [DOI] [PubMed] [Google Scholar]
- 4. Morgunova E., Taipale J.. Structural perspective of cooperative transcription factor binding. Curr. Opin. Struct. Biol. 2017; 47:1–8. [DOI] [PubMed] [Google Scholar]
- 5. Panne D. The enhanceosome. Curr. Opin. Struct. Biol. 2008; 18:236–242. [DOI] [PubMed] [Google Scholar]
- 6. King D.M., Hong C.K.Y., Shepherdson J.L., Granas D.M., Maricque B.B., Cohen B.A.. Synthetic and genomic regulatory elements reveal aspects of cis-regulatory grammar in mouse embryonic stem cells. Elife. 2020; 9:e41279. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7. Rao S., Ahmad K., Ramachandran S.. Cooperative binding between distant transcription factors is a hallmark of active enhancers. Mol. Cell. 2021; 81:1651–1665. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8. Zeitlinger J. Seven myths of how transcription factors read the cis-regulatory code. Curr. Opin. Syst. Biol. 2020; 23:22–31. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9. Gallagher M.D., Chen-Plotkin A.S.. The post-GWAS era: from association to function. Am. J. Hum. Genet. 2018; 102:717–730. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10. Hess J., Angel P., Schorpp-Kistner M.. AP-1 subunits: quarrel and harmony among siblings. J. Cell Sci. 2004; 117:5965–5973. [DOI] [PubMed] [Google Scholar]
- 11. Xie D., Boyle A.P., Wu L., Zhai J., Kawli T., Snyder M.. Dynamic trans-acting factor colocalization in human cells. Cell. 2013; 155:713–724. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12. Hai T., Curran T.. Cross-family dimerization of transcription factors fos/jun and ATF/CREB alters DNA binding specificity. Proc. Natl. Acad. Sci. U.S.A. 1991; 88:3720–3724. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13. Rauscher F.J. 3rd, Voulalas P.J., Franza B.R. Jr, Curran T.. Fos and jun bind cooperatively to the AP-1 site: reconstitution in vitro. Genes Dev. 1988; 2:1687–1699. [DOI] [PubMed] [Google Scholar]
- 14. Stefflova K., Thybert D., Wilson M.D., Streeter I., Aleksic J., Karagianni P., Brazma A., Adams D.J., Talianidis I., Marioni J.C.et al.. Cooperativity and rapid evolution of cobound transcription factors in closely related mammals. Cell. 2013; 154:530–540. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15. He Q., Bardet A.F., Patton B., Purvis J., Johnston J., Paulson A., Gogol M., Stark A., Zeitlinger J.. High conservation of transcription factor binding and evidence for combinatorial regulation across six drosophila species. Nat. Genet. 2011; 43:414–420. [DOI] [PubMed] [Google Scholar]
- 16. Zhao Y., Vartak S.V., Conte A., Wang X., Garcia D.A., Stevens E., Kyoung Jung S., Kieffer-Kwon K.-R., Vian L., Stodola T.et al.. Stripe” transcription factors provide accessibility to co-binding partners in mammalian genomes. Mol. Cell. 2022; 82:3398–3411. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17. Rickels R., Shilatifard A.. Enhancer logic and mechanics in development and disease. Trends Cell Biol. 2018; 28:608–630. [DOI] [PubMed] [Google Scholar]
- 18. Farley E.K., Olson K.M., Zhang W., Brandt A.J., Rokhsar D.S., Levine M.S.. Suboptimization of developmental enhancers. Science. 2015; 350:325–328. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19. Kulkarni M.M., Arnosti D.N.. Information display by transcriptional enhancers. Development. 2003; 130:6569–6575. [DOI] [PubMed] [Google Scholar]
- 20. Deplancke B., Alpern D., Gardeux V.. The genetics of transcription factor DNA binding variation. Cell. 2016; 166:538–554. [DOI] [PubMed] [Google Scholar]
- 21. Inukai S., Kock K.H., Bulyk M.L.. Transcription factor-DNA binding: beyond binding site motifs. Curr. Opin. Genet. Dev. 2017; 43:110–119. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22. Reiter F., Wienerroither S., Stark A.. Combinatorial function of transcription factors and cofactors. Curr. Opin. Genet. Dev. 2017; 43:73–81. [DOI] [PubMed] [Google Scholar]
- 23. Wang F., Bai X., Wang Y., Jiang Y., Ai B., Zhang Y., Liu Y., Xu M., Wang Q., Han X. et al. ATACdb: a comprehensive human chromatin accessibility database. Nucleic Acids Res. 2021; 49:D55–D64.
- 24. Zou Z., Ohta T., Miura F., Oki S. ChIP-Atlas 2021 update: a data-mining suite for exploring epigenomic landscapes by fully integrating ChIP-seq, ATAC-seq and bisulfite-seq data. Nucleic Acids Res. 2022; 50:W175–W182.
- 25. Pratt H.E., Andrews G.R., Phalke N., Purcaro M.J., van der Velde A., Moore J.E., Weng Z. Factorbook: an updated catalog of transcription factor motifs and candidate regulatory motif sites. Nucleic Acids Res. 2022; 50:D141–D149.
- 26. Liu T., Ortiz J.A., Taing L., Meyer C.A., Lee B., Zhang Y., Shin H., Wong S.S., Ma J., Lei Y. et al. Cistrome: an integrative platform for transcriptional regulation studies. Genome Biol. 2011; 12:R83.
- 27. Mei S., Qin Q., Wu Q., Sun H., Zheng R., Zang C., Zhu M., Wu J., Shi X., Taing L. et al. Cistrome Data Browser: a data portal for ChIP-Seq and chromatin accessibility data in human and mouse. Nucleic Acids Res. 2017; 45:D658–D662.
- 28. Kolmykov S., Yevshin I., Kulyashov M., Sharipov R., Kondrakhin Y., Makeev V.J., Kulakovskiy I.V., Kel A., Kolpakov F. GTRD: an integrated view of transcription regulation. Nucleic Acids Res. 2021; 49:D104–D111.
- 29. Zhou K.R., Liu S., Sun W.J., Zheng L.L., Zhou H., Yang J.H., Qu L.H. ChIPBase v2.0: decoding transcriptional regulatory networks of non-coding RNAs and protein-coding genes from ChIP-seq data. Nucleic Acids Res. 2017; 45:D43–D50.
- 30. Hammal F., de Langen P., Bergon A., Lopez F., Ballester B. ReMap 2022: a database of human, mouse, Drosophila and Arabidopsis regulatory regions from an integrative analysis of DNA-binding sequencing experiments. Nucleic Acids Res. 2022; 50:D316–D325.
- 31. Yanez-Cuna J.O., Kvon E.Z., Stark A. Deciphering the transcriptional cis-regulatory code. Trends Genet. 2013; 29:11–22.
- 32. Barrett T., Wilhite S.E., Ledoux P., Evangelista C., Kim I.F., Tomashevsky M., Marshall K.A., Phillippy K.H., Sherman P.M., Holko M. et al. NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res. 2013; 41:D991–D995.
- 33. Langmead B., Salzberg S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 2012; 9:357–359.
- 34. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R., 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009; 25:2078–2079.
- 35. Zhang Y., Liu T., Meyer C.A., Eeckhoute J., Johnson D.S., Bernstein B.E., Nusbaum C., Myers R.M., Brown M., Li W. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008; 9:R137.
- 36. Li Z., Schulz M.H., Look T., Begemann M., Zenke M., Costa I.G. Identification of transcription factor binding sites using ATAC-seq. Genome Biol. 2019; 20:45.
- 37. Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S., Batut P., Chaisson M., Gingeras T.R. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013; 29:15–21.
- 38. Li B., Dewey C.N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinf. 2011; 12:323.
- 39. Castro-Mondragon J.A., Riudavets-Puig R., Rauluseviciute I., Lemma R.B., Turchi L., Blanc-Mathieu R., Lucas J., Boddie P., Khan A., Manosalva Perez N. et al. JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2022; 50:D165–D173.
- 40. Weirauch M.T., Yang A., Albu M., Cote A.G., Montenegro-Montero A., Drewe P., Najafabadi H.S., Lambert S.A., Mann I., Cook K. et al. Determination and inference of eukaryotic transcription factor sequence specificity. Cell. 2014; 158:1431–1443.
- 41. Wingender E., Dietze P., Karas H., Knuppel R. TRANSFAC: a database on transcription factors and their DNA binding sites. Nucleic Acids Res. 1996; 24:238–241.
- 42. Grant C.E., Bailey T.L., Noble W.S. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011; 27:1017–1018.
- 43. Quinlan A.R., Hall I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010; 26:841–842.
- 44. Vierstra J., Lazar J., Sandstrom R., Halow J., Lee K., Bates D., Diegel M., Dunn D., Neri F., Haugen E. et al. Global reference mapping of human transcription factor footprints. Nature. 2020; 583:729–736.
- 45. Hart T., Komori H.K., LaMere S., Podshivalova K., Salomon D.R. Finding the active genes in deep RNA-seq gene expression studies. BMC Genomics. 2013; 14:778.
- 46. Grossman S.R., Engreitz J., Ray J.P., Nguyen T.H., Hacohen N., Lander E.S. Positional specificity of different transcription factor classes within enhancers. Proc. Natl. Acad. Sci. U.S.A. 2018; 115:E7222–E7230.
- 47. Hosokawa H., Ungerback J., Wang X., Matsumoto M., Nakayama K.I., Cohen S.M., Tanaka T., Rothenberg E.V. Transcription factor PU.1 represses and activates gene expression in early T cells by redirecting partner transcription factor binding. Immunity. 2018; 48:1119–1134.
- 48. Clark K.L., Halay E.D., Lai E., Burley S.K. Co-crystal structure of the HNF-3/fork head DNA-recognition motif resembles histone H5. Nature. 1993; 364:412–420.
- 49. Iwafuchi-Doi M., Donahue G., Kakumanu A., Watts J.A., Mahony S., Pugh B.F., Lee D., Kaestner K.H., Zaret K.S. The pioneer transcription factor FoxA maintains an accessible nucleosome configuration at enhancers for tissue-specific gene activation. Mol. Cell. 2016; 62:79–91.
- 50. Waterman M.L., Fischer W.H., Jones K.A. A thymus-specific member of the HMG protein family regulates the human T cell receptor C alpha enhancer. Genes Dev. 1991; 5:656–669.
- 51. Shrinivas K., Sabari B.R., Coffey E.L., Klein I.A., Boija A., Zamudio A.V., Schuijers J., Hannett N.M., Sharp P.A., Young R.A. et al. Enhancer features that drive formation of transcriptional condensates. Mol. Cell. 2019; 75:549–561.
- 52. Farley E.K., Olson K.M., Zhang W., Rokhsar D.S., Levine M.S. Syntax compensates for poor binding sites to encode tissue specificity of developmental enhancers. Proc. Natl. Acad. Sci. U.S.A. 2016; 113:6508–6513.
- 53. Heinz S., Benner C., Spann N., Bertolino E., Lin Y.C., Laslo P., Cheng J.X., Murre C., Singh H., Glass C.K. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell. 2010; 38:576–589.
- 54. Mouri K., Guo M.H., de Boer C.G., Lissner M.M., Harten I.A., Newby G.A., DeBerg H.A., Platt W.F., Gentili M., Liu D.R. et al. Prioritization of autoimmune disease-associated genetic variants that perturb regulatory element activity in T cells. Nat. Genet. 2022; 54:603–612.
- 55. Sengupta S., George R.E. Super-enhancer-driven transcriptional dependencies in cancer. Trends Cancer. 2017; 3:269–281.
- 56. Banerjee N., Zhang M.Q. Identifying cooperativity among transcription factors controlling the cell cycle in yeast. Nucleic Acids Res. 2003; 31:7024–7031.
- 57. Martinez G.J., Rao A. Cooperative transcription factor complexes in control. Science. 2012; 338:891–892.
- 58. Cusanovich D.A., Hill A.J., Aghamirzaie D., Daza R.M., Pliner H.A., Berletch J.B., Filippova G.N., Huang X., Christiansen L., DeWitt W.S. et al. A single-cell atlas of in vivo mammalian chromatin accessibility. Cell. 2018; 174:1309–1324.
Data Availability Statement
All data in the database are freely available from https://tfsyntax.zhaopage.com/.




