TFBSshape: a motif database for DNA shape features of transcription factor binding sites

Lin Yang; Tianyin Zhou; Iris Dror; Anthony Mathelier; Wyeth W Wasserman; Raluca Gordân; Remo Rohs

doi:10.1093/nar/gkt1087

. 2013 Nov 7;42(Database issue):D148–D155. doi: 10.1093/nar/gkt1087

TFBSshape: a motif database for DNA shape features of transcription factor binding sites

Lin Yang ¹, Tianyin Zhou ¹, Iris Dror ^1,2, Anthony Mathelier ³, Wyeth W Wasserman ³, Raluca Gordân ⁴, Remo Rohs ^1,^*

PMCID: PMC3964943 PMID: 24214955

Abstract

Transcription factor binding sites (TFBSs) are most commonly characterized by the nucleotide preferences at each position of the DNA target. Whereas these sequence motifs are quite accurate descriptions of DNA binding specificities of transcription factors (TFs), proteins recognize DNA as a three-dimensional object. DNA structural features refine the description of TF binding specificities and provide mechanistic insights into protein–DNA recognition. Existing motif databases contain extensive nucleotide sequences identified in binding experiments based on their selection by a TF. To utilize DNA shape information when analysing the DNA binding specificities of TFs, we developed a new tool, the TFBSshape database (available at http://rohslab.cmb.usc.edu/TFBSshape/), for calculating DNA structural features from nucleotide sequences provided by motif databases. The TFBSshape database can be used to generate heat maps and quantitative data for DNA structural features (i.e., minor groove width, roll, propeller twist and helix twist) for 739 TF datasets from 23 different species derived from the motif databases JASPAR and UniPROBE. As demonstrated for the basic helix-loop-helix and homeodomain TF families, our TFBSshape database can be used to compare, qualitatively and quantitatively, the DNA binding specificities of closely related TFs and, thus, uncover differential DNA binding specificities that are not apparent from nucleotide sequence alone.

INTRODUCTION

The DNA binding specificities of transcription factors (TFs) can be described as consensus sequences or position frequency matrices (PFMs) representing the probability of occurrence of each nucleotide at each position of a DNA binding site. These probability matrices are usually transformed into position weight matrices (PWMs) (1,2) and can be visualized as motif logos (3). PWMs traditionally assume independence between individual nucleotide positions within the binding site. Recent approaches have expanded the basic concept of PWMs by adding dinucleotide parameters, based on observations that individual nucleotide positions within a motif are not independent from each other (4–6). Interdependencies between nucleotide positions within a motif give rise to the three-dimensional structure of DNA and, thus, the shape of TF binding sites (TFBSs).

The important role of DNA shape as a determinant of protein–DNA binding specificity has been previously discussed (7–10), and we have demonstrated mechanisms of DNA shape readout for numerous TFs (11–17) and other DNA binding proteins (18–20). The DNA structure implicitly contains all interdependencies between nucleotide positions of a TFBS and does not require explicit knowledge of individual interdependencies. Although DNA shape is a function of sequence, the sequence–structure relationship is highly complex and degenerate. DNA shape can explain why sequences that flank TFBSs contribute to binding specificity (17). Spacers between binding sites of different DNA binding proteins (21) or half sites of multimeric TFs can play a similarly important role (22,23).

Structural data have only been obtained for a small number of relatively short DNA sequences that have been studied experimentally or by computationally expensive molecular simulations. This limited availability has been a major bottleneck for using DNA shape information in genome analysis. To overcome this limitation, we recently developed a fast and efficient method for the high-throughput prediction of DNA shape, and validated the method with massive experimental and computational data (24).

Using this approach, in the present study, we describe the development of our TFBSshape database, which provides DNA structural features for nucleotide sequences preferred by different TFs. We analysed 739 datasets derived from open-access motif databases that describe the DNA binding specificities of TFs from 23 different species. We used the sequence information provided by JASPAR (25) and UniPROBE (26) to calculate DNA shape features of TFBSs. These features include minor groove width (MGW), Roll, propeller twist (ProT) and helix twist (HelT). Our TFBSshape database qualitatively illustrates the TFBS shape profiles in heat maps for TF core binding sites. Flanking sequences are included whenever such information is available. Download options provide quantitative data for further analysis. TFBSshape includes a tool to compare, both qualitatively and quantitatively, any two selected TFBS shape profiles. A user can also upload a sequence dataset and compare its DNA shape features with any chosen TFBS shape profile in the database.

We applied the TFBSshape approach to different biological applications. We analysed the differential DNA shape preferences of the human basic helix-loop-helix (bHLH) TFs Mad2 (‘Mad’), Max and c-Myc (‘Myc’) using genome-context protein binding microarray (gcPBM) data (27). To demonstrate the added value of describing TFBSs using structural features, we used L2-regularized multiple linear regression (MLR) to predict the DNA binding specificities of these bHLH factors based on nucleotide sequence alone compared to a model that combines DNA sequence and shape. We showed that shape-augmented MLR models improved the accuracy in DNA binding specificity predictions by >20%. We also described the DNA shape preferences of Hox proteins in mouse using DNA binding sequences derived from universal protein binding microarray (PBM) experiments (28). The results of this analysis showed that distinct DNA shape features of TFBSs for anterior versus posterior Hox TFs, as previously reported for Drosophila (12), can be observed across species.

DATABASE

Database architecture and methodology

TFBSshape derives TFBS sequence information from the motif databases JASPAR (25) and UniPROBE (26) and generates DNA shape data for TFBSs based on the high-throughput prediction of DNA structural features, including the parameters MGW, Roll, ProT and HelT (24). The approach uses a sliding pentamer window and query tables of structural features derived from all-atom Monte Carlo simulations for all 512 unique pentamers. We previously validated this method using massive experimental data from X-ray crystallography, NMR spectroscopy and hydroxyl radical cleavage experiments, as well as statistical analysis and cross-validation (24).

The backend of TFBSshape consists of a MySQL database, PHP scripts hosted on an Apache server, and other scripts invoked by the PHP scripts to perform TFBS assembly and DNA shape prediction upon user request (Supplementary Figure S1). The frontend of TFBSshape includes HTML web pages with components of CSS and JavaScripts that provide a user-friendly interface for retrieving data from the database. In addition, it provides an interface for comparison of two TFBS shape profiles from the database and an interface for generating DNA shape data for user-uploaded TFBS sequences, which can also be compared to a TFBS shape profile in the database. Among the TFBS data derived from JASPAR and UniPROBE, 371 TFs are from JASPAR (29), including 149 TFs from the latest JASPAR2014 release (25) and 368 TFs are from UniPROBE (26). TFs in JASPAR or UniPROBE without TFBS sequence information are not included in TFBSshape. Due to their different storage formats, sequence data from JASPAR and UniPROBE need to undergo different pre-processing steps prior to TFBS assembly and DNA shape prediction.

Interface with JASPAR

JASPAR curates TFBS sequence data derived from the literature in its sub-database JASPAR CORE (29). For TFs with available motif information, the TFBS sequences are provided in FASTA format, with the core binding site highlighted in upper-case letters and the flanking sequences in lower-case letters. Using this sequence data, TFBSshape derives DNA shape features for all nucleotide positions within the core binding site. This prediction is possible because TFBS sequences from JASPAR always contain the core binding sites, whereas their flanking sequences can be missing. Due to the methodology used to predict DNA shape features at the centre of a sliding pentamer (24), 2-bp flanks are needed to calculate the structural features for the entire core binding motif.

Because the flanking sequence information can be missing, TFBSshape first calculates the portion of TFBS sequences that contain ≥2-bp flanks on each side of the core binding site. If this portion is >50% of all sequences available for a TF, sequences that do not contain 2-bp flanks are removed, and the TFBSs are assembled in the form of nnNNN … NNNnn using the remaining sequences. Here, the stretch of NNN … NNN represents the core binding site, and nn represents the 2-bp flanks required for predicting DNA shape features for the two NN positions at either end of the core binding site. If the portion of TFBSs with flanking sequence information is ≤50%, then no sequences are removed, and TFBSs are assembled in the form of NNN … NNN. In this case, DNA structural features cannot be predicted for the first and last two NN positions of the core binding site.

In either case, DNA shape features are predicted for the core binding sites and visualized as heat maps. A white space at the left and right margins of the heat maps indicates that structural features for the end positions of the core binding site are not available due to missing flanks. This situation is more common for TFBS data from the previous JASPAR release (29). It does not happen for data from the JASPAR2014 update (25). For each structural feature, a heat map for individual sequences and an average heat map are provided. Each row in the heat map for individual sequences represents the structural feature values of a corresponding sequence (Figure 1A). If the number of rows is ≤3000, these rows are clustered using the agglomerative hierarchical clustering algorithm, with distances defined as the Euclidean distance (ED) between the values of the individual rows. The average heat map provides average structural features at each nucleotide position of the TFBS (Figure 1B). Although these heat maps can be seen as a qualitative analysis, links are provided for the user to download the actual DNA shape data for further quantitative analysis. The PWM generated based on the analysed set of sequences is visualized in the TFBSshape database as a motif logo (Figure 1C). In this format, the numbering of the nucleotide positions corresponds to the numbering used in the structural feature heat maps.

Figure 1. — Example TFBSshape analysis of DNA shape preferences for an Hnf4a TF dataset from UniPROBE. (A) Heat map showing predicted MGW profiles for individual sequences, clustered based on EDs of MGW profiles, and (B) average heat map for all sequences. The colour code for both heat maps uses red for narrow MGW, blue for wide MGW and white for intermediate values. (C) PWM calculated using all analysed TFBS sequences, aligned with DNA shape heat maps as nucleotide sequence reference.

Interface with UniPROBE

UniPROBE hosts TF binding data generated from universal PBM experiments (26). In these experiments, each array is designed to contain probes that cover all possible 10-mer variants (30). A TF of interest can access each probe to initiate a binding event, and the binding intensities for different probes are compared based on their fluorescence signal intensities. These data are used to derive PFMs that represent the in vitro DNA binding specificities of the TF. TFBSshape retrieves the probe set and PFM from UniPROBE and uses them as input data of the Find Individual Motif Occurrences (FIMO) algorithm (31) from the MEME Suite of motif-based sequence analysis tools (32). FIMO searches in the input sequences for occurrences of a motif specified by a PFM. Each motif occurrence found by FIMO is associated with a P-value, indicating the statistical significance level of the detected occurrence. TFBSshape determines motif occurrences with P ≤ 10⁻³ as core binding sites. This significance level is an empirical threshold based on the fact that PFMs generated from such TFBSs are highly consistent with the original PFMs provided by UniPROBE (26).

The assumption that TFBSs are enriched among probes with higher PBM signal intensities is generally true for TFBSs found in the above manner. A barcode visualizes the enrichment of the TFBSs among the ranked probes, with vertical bars representing probes with PBM signal intensities in descending order from left to right, and a white bar indicating no occurrence of a TFBS, a yellow bar indicating one TFBS and a brown bar indicating multiple TFBSs. TFBS-containing probes are subjected to the same TFBS assembly and other procedures as described for JASPAR. TFBSs derived from UniPROBE data usually have ≥2-bp flanks because most TFBSs are not located at the end of the probe. The TFBS sequence and shape data can be downloaded for further quantitative analysis.

User interface for analysis of DNA shape profile of one TF dataset

TFBSshape provides tab pages that dynamically display specific content and form an interface for retrieving data. The ‘Selection’ tab enables the user to specify the search criteria for either JASPAR or UniPROBE data or to upload custom-aligned sequences. After the user initiates the selection, the ‘Refine’ tab displays a table listing all of the TFs that satisfy the search criteria or shows a form for submitting sequences. The user can select a TF from the list or upload custom DNA sequences. The ‘Results’ tab displays a table containing information on the analysed dataset, with a download link for the DNA sequence and shape data. The ‘Results’ tab also displays structural feature heat maps for individual sequences (Figure 1A), average heat maps for each shape parameter (Figure 1B), and the motif logo representing the PWM calculated using TFBS sequence information (Figure 1C). The TFBSshape interface dynamically updates the content under the tab pages, allowing the user to maintain a temporary customization throughout an analysis session.

User interface for comparison of DNA shape profiles of two TF datasets

TFBSshape provides an interface for comparing two TFBS shape profiles from the database, or for comparing an uploaded TFBS dataset with a user-chosen reference TF dataset from the database. Under the ‘Selection’ tab, the user can initiate the selection to compare two TFs or to upload custom-aligned sequences. The ‘Refine’ tab then displays a form that guides the user through the selection of the two desired TF datasets, derived from either JASPAR or UniPROBE, or a form that enables the user to upload the sequences and select the desired reference TF dataset from the database. In this form, the user needs to specify the alignment of the two TF motifs by setting the reference positions for the compared datasets or the offset in nucleotide positions for the uploaded sequences. After this step, the user will find the comparison of DNA shape features under the ‘Results’ tab. The user can return to the ‘Refine’ tab to adjust the alignment.

As an example for this functionality of TFBSshape we compared UniPROBE datasets for the bHLH TFs Max and Cbf1, which are from mouse and yeast, respectively. The TFBSs were aligned based on the PWMs for both TFs (Figure 2A). Using the requested sequence alignment, TFBSshape visualized quantitative comparisons of average heat maps for the DNA shape features MGW, Roll, ProT and HelT, and provided Pearson’s correlation coefficients (PCC) and EDs as quantitative measures for the comparison (Figure 2B).

Figure 2. — Example TFBSshape comparison of DNA shape preferences of two TF datasets from UniPROBE for the homologous TFs Max from mouse and Cbf1 from yeast. (A) PWMs calculated for Max and Cbf1 using all analysed TFBS sequences, with nucleotide positions numbered according to the user-determined alignment. (B) Using the chosen alignment, average heat maps for the four DNA shape features MGW, Roll, ProT and HelT are shown for Max (TF1) and Cbf1 (TF2). These shape profiles were quantitatively compared using PCC and ED.

BIOLOGICAL APPLICATIONS

DNA shape preferences of human bHLH TFs

Paralogous TFs often bind to TFBSs with very similar and often identical core binding motifs, although they bind to different target sites in the genome to execute their specific in vivo functions. We previously showed that the yeast bHLH factors Cbf1 and Tye7 select distinct DNA shape features that contribute to their DNA binding specificities to genomic target sites, despite their strong preference for the CACGTG E-box as a shared core binding motif (17). TFBSshape can be used to compare structural features of TFBSs in analysing DNA binding specificities among TFs of the same family. Therefore, in this report, we extended our study to human bHLH factors. We analysed the TFBSs of Mad, Max and Myc derived from gcPBM experiments (27).

Heat maps and box plots for MGW (Figure 3A, Supplementary Figure S2A–C), Roll, ProT and HelT (Supplementary Figures S3–S5) clearly indicated the unique structural features of the E-box. In addition, a Kolmogorov–Smirnov (K-S) significance test revealed that Mad and Max exhibit much more similar DNA shape preferences compared to the more distinct DNA binding specificity of Myc (Figure 3B, Supplementary Figures S2–S5). Whereas the E-box as a core binding motif is shared between all three TFs, differential DNA binding specificities can be detected through motif-based analysis of DNA shape preferences. These differences can be due to variations in the flanking sequences, as shown for Cbf1 and Tye7, or nucleotide variants within the E-box (Supplementary Table S1). To confirm the significance of the detected TFBS shape differences, we analysed a replicate experiment using Myc. The results indicated that the DNA shape features selected by the same TF in two independent gcPBM experiments were not distinct, according to K-S P-values (Figure 3B, Supplementary Figure S2D, Supplementary Table S1).

To further test whether the subtle differences in DNA shape features of the binding sites of the three paralogous bHLH TFs Mad, Max and Myc contribute to binding specificity beyond nucleotide sequence, we used L2-regularized MLR and 10-fold cross-validation to assess the prediction accuracy of models using sequence alone compared to a combination of sequence and shape parameters (MGW, Roll, ProT and HelT). We found that experimentally determined DNA binding specificities could be predicted with R²-values between 0.65 and 0.70, whereas a shape-augmented model reached R²-values between 0.80 and 0.88 (Figure 3C). Thus, by incorporating DNA shape features into binding specificity predictions, we achieved improvements of ∼26% for Mad, ∼26% for Max and ∼23% for Myc. These improvements indicate an important contribution of DNA shape features in protein–DNA recognition. In this model, DNA sequence and shape features were encoded using a strategy similar to our previous study of yeast bHLH TFs (16) but here we also considered variations within the E-box core motif. Using randomly shuffled shape parameters did not lead to any improvement over the sequence-based model (Figure 3C). These results clearly demonstrate the added value of shape-augmented descriptions of TFBSs in the modelling of DNA binding specificities.

DNA shape preferences of Hox TFs in mouse

We previously demonstrated that anterior and posterior Drosophila Hox proteins prefer distinct minor groove geometries (11,12), and recently analysed DNA shape preferences of mouse homeodomain TFs (13) derived from universal PBM experiments (28). Here, we show that the distinct DNA shape preferences of anterior and posterior Hox proteins hold for mouse, based on comparisons of their MGW (Figure 4A), Roll, ProT and HelT profiles (Supplementary Figure S6). Using EDs of MGW profiles, we generated a dendrogram revealing relationships between DNA binding specificities of mouse Hox proteins, and demonstrated clear distinctions in MGW preferences between anterior and posterior Hox TFs in mouse (Figure 4B), similar to the distinction previously observed for Drosophila (12). Thus, the TFBSshape database can be utilized to study relationships in DNA binding specificities of closely related TFs within protein families.

Figure 4. — DNA shape features distinguish TFBSs of anterior and posterior Hox proteins in mouse. (A) Heat map illustrating average MGW profiles of binding sites preferred by mouse Hox TFs determined by universal PBM (28). (B) A dendrogram based on EDs between average MGW profiles of preferred TFBSs demonstrates the different DNA shape preferences of anterior and posterior Hox TFs in mouse.

CONCLUSIONS

We have demonstrated that augmenting existing motif databases with DNA shape features provides new insights into the mechanisms used by TFs to achieve DNA binding specificity. Analysing DNA shape preferences can help to differentiate between similar DNA binding specificities of paralogous TFs (17). Such studies can be generalized to compare DNA binding specificities of homologous TFs from different species. Comparisons of structural features of TFBSs could potentially reveal evolutionary relationships between TFs based on the shape of their DNA binding sites (13). Integrating TFBSshape with the motif databases JASPAR (25) and UniPROBE (26) makes DNA shape information readily available for known motifs. Whereas TFBSshape currently contains data for 23 species from the open-access motif databases JASPAR and UniPROBE, species-specific databases (33,34) can easily be integrated to expand the repertoire of datasets for comparative analysis of TF binding specificities. The availability of DNA shape features for TFBSs suggests many further applications, such as shape-augmented genome annotations (9) and TFBS predictions using DNA structural features (35–37).

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

USC-Technion Visiting Fellows Program, an Alfred P. Sloan Research Fellowship (to R.R.); National Institutes of Health (NIH) [U01GM103804 and R01HG003008] (in part to R.R.); PhRMA Foundation Research Starter Grant (to R.G.). Funding for open access charge: USC-Technion Visiting Fellows Program.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

The authors acknowledge that BIRS workshop 13w5042 made this collaboration with JASPAR2014 possible. See Supplementary Data for detailed author contributions.

REFERENCES

1.Stormo GD. DNA binding sites: representation and discovery. Bioinformatics. 2000;16:16–23. doi: 10.1093/bioinformatics/16.1.16. [DOI] [PubMed] [Google Scholar]
2.Stormo GD. Modeling the specificity of protein-DNA interactions. In: Zhang M, Tang C, editors. Quantitative Biology. Vol. 1. Berlin-Heidelberg, Germany: Springer; 2013. pp. 115–130. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Schneider TD, Stephens RM. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 1990;18:6097–6100. doi: 10.1093/nar/18.20.6097. [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Sharon E, Lubliner S, Segal E. A feature-based approach to modeling protein-DNA interactions. PLoS Comput. Biol. 2008;4:e1000154. doi: 10.1371/journal.pcbi.1000154. [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Zhao Y, Ruan S, Pandey M, Stormo GD. Improved models for transcription factor binding site identification using nonindependent interactions. Genetics. 2012;191:781–790. doi: 10.1534/genetics.112.138685. [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Weirauch MT, Cote A, Norel R, Annala M, Zhao Y, Riley TR, Saez-Rodriguez J, Cokelaer T, Vedenko A, Talukder S, et al. Evaluation of methods for modeling transcription factor sequence specificity. Nat. Biotechnol. 2013;31:126–134. doi: 10.1038/nbt.2486. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Rohs R, West SM, Sosinsky A, Liu P, Mann RS, Honig B. The role of DNA shape in protein-DNA recognition. Nature. 2009;461:1248–1253. doi: 10.1038/nature08473. [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Rohs R, Jin X, West SM, Joshi R, Honig B, Mann RS. Origins of specificity in protein-DNA recognition. Annu. Rev. Biochem. 2010;79:233–269. doi: 10.1146/annurev-biochem-060408-091030. [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Parker SC, Tullius TD. DNA shape, genetic codes, and evolution. Curr. Opin. Struct. Biol. 2011;21:342–347. doi: 10.1016/j.sbi.2011.03.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Ostuni R, Natoli G. Lineages, cell types and functional states: a genomic view. Curr. Opin. Cell Biol. 2013;25:759–764. doi: 10.1016/j.ceb.2013.07.006. [DOI] [PubMed] [Google Scholar]
11.Joshi R, Passner JM, Rohs R, Jain R, Sosinsky A, Crickmore MA, Jacob V, Aggarwal AK, Honig B, Mann RS. Functional specificity of a Hox protein mediated by the recognition of minor groove structure. Cell. 2007;131:530–543. doi: 10.1016/j.cell.2007.09.024. [DOI] [PMC free article] [PubMed] [Google Scholar]
12.Slattery M, Riley T, Liu P, Abe N, Gomez-Alcala P, Dror I, Zhou T, Rohs R, Honig B, Bussemaker HJ, et al. Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell. 2011;147:1270–1282. doi: 10.1016/j.cell.2011.10.053. [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Dror I, Zhou T, Mandel-Gutfreund Y, Rohs R. Covariation of homeodomain transcription factors and the shape of their DNA binding sites. Nucleic Acids Res. 2014;42:430–441. doi: 10.1093/nar/gkt862. [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Kitayner M, Rozenberg H, Rohs R, Suad O, Rabinovich D, Honig B, Shakked Z. Diversity in DNA recognition by p53 revealed by crystal structures with Hoogsteen base pairs. Nat. Struct. Mol. Biol. 2010;17:423–429. doi: 10.1038/nsmb.1800. [DOI] [PMC free article] [PubMed] [Google Scholar]
15.Eldar A, Rozenberg H, Diskin-Posner Y, Rohs R, Shakked Z. Structural studies of p53 inactivation by DNA-contact mutations and its rescue by suppressor mutations via alternative protein-DNA interactions. Nucleic Acids Res. 2013;41:8748–8759. doi: 10.1093/nar/gkt630. [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Chen Y, Bates DL, Dey R, Chen PH, Dantas Machado AC, Laird-Offringa IA, Rohs R, Chen L. DNA binding by GATA transcription factor suggests mechanisms of DNA looping and long-range gene regulation. Cell Rep. 2012;2:1197–1206. doi: 10.1016/j.celrep.2012.10.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Gordân R, Shen N, Dror I, Zhou T, Horton J, Rohs R, Bulyk ML. Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape. Cell Rep. 2013;3:1093–1104. doi: 10.1016/j.celrep.2013.03.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Hancock SP, Ghane T, Cascio D, Rohs R, Di Felice R, Johnson RC. Control of DNA minor groove width and Fis protein binding by the purine 2-amino group. Nucleic Acids Res. 2013;41:6750–6760. doi: 10.1093/nar/gkt357. [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Lazarovici A, Zhou T, Shafer A, Dantas Machado AC, Riley TR, Sandstrom R, Sabo PJ, Lu Y, Rohs R, Stamatoyannopoulos JA, et al. Probing DNA shape and methylation state on a genomic scale with DNase I. Proc. Natl Acad. Sci. USA. 2013;110:6376–6381. doi: 10.1073/pnas.1216822110. [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Chang YP, Xu M, Dantas Machado AC, Yu XJ, Rohs R, Chen XS. Mechanism of origin DNA recognition and assembly of an initiator-helicase complex by SV40 large tumor antigen. Cell Rep. 2013;3:1117–1127. doi: 10.1016/j.celrep.2013.03.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Kim S, Brostromer E, Xing D, Jin J, Chong S, Ge H, Wang S, Gu C, Yang L, Gao YQ, et al. Probing allostery through DNA. Science. 2013;339:816–819. doi: 10.1126/science.1229223. [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Watson LC, Kuchenbecker KM, Schiller BJ, Gross JD, Pufall MA, Yamamoto KR. The glucocorticoid receptor dimer interface allosterically transmits sequence-specific DNA signals. Nat. Struct. Mol. Biol. 2013;20:876–883. doi: 10.1038/nsmb.2595. [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Chen Y, Zhang X, Dantas Machado AC, Ding Y, Chen Z, Qin PZ, Rohs R, Chen L. Structure of p53 binding to the BAX response element reveals DNA unwinding and compression to accommodate base-pair insertion. Nucleic Acids Res. 2013;41:8368–8376. doi: 10.1093/nar/gkt584. [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Zhou T, Yang L, Lu Y, Dror I, Dantas Machado AC, Ghane T, Di Felice R, Rohs R. DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale. Nucleic Acids Res. 2013;41:W56–W62. doi: 10.1093/nar/gkt437. [DOI] [PMC free article] [PubMed] [Google Scholar]
25.Mathelier A, Zhao X, Zhang AZ, Parcy F, Worsley-Hunt R, Arenillas DJ, Buchman S, Chen C-Y, Chou A, Ienasescu H, et al. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res. 2014;42:D142–D147. doi: 10.1093/nar/gkt997. [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Robasky K, Bulyk ML. UniPROBE, update 2011: expanded content and search tools in the online database of protein-binding microarray data on protein-DNA interactions. Nucleic Acids Res. 2011;39:D124–D128. doi: 10.1093/nar/gkq992. [DOI] [PMC free article] [PubMed] [Google Scholar]
27.Mordelet F, Horton J, Hartemink AJ, Engelhardt BE, Gordân R. Stability selection for regression-based models of transcription factor-DNA binding specificity. Bioinformatics. 2013;29:i117–i125. doi: 10.1093/bioinformatics/btt221. [DOI] [PMC free article] [PubMed] [Google Scholar]
28.Berger MF, Badis G, Gehrke AR, Talukder S, Philippakis AA, Pena-Castillo L, Alleyne TM, Mnaimneh S, Botvinnik OB, Chan ET, et al. Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell. 2008;133:1266–1276. doi: 10.1016/j.cell.2008.05.024. [DOI] [PMC free article] [PubMed] [Google Scholar]
29.Portales-Casamar E, Thongjuea S, Kwon AT, Arenillas D, Zhao X, Valen E, Yusuf D, Lenhard B, Wasserman WW, Sandelin A. JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids Res. 2010;38:D105–D110. doi: 10.1093/nar/gkp950. [DOI] [PMC free article] [PubMed] [Google Scholar]
30.Berger MF, Philippakis AA, Qureshi AM, He FS, Estep PW, III, Bulyk ML. Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat. Biotechnol. 2006;24:1429–1435. doi: 10.1038/nbt1246. [DOI] [PMC free article] [PubMed] [Google Scholar]
31.Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011;27:1017–1018. doi: 10.1093/bioinformatics/btr064. [DOI] [PMC free article] [PubMed] [Google Scholar]
32.Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW, Noble WS. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 2009;37:W202–W208. doi: 10.1093/nar/gkp335. [DOI] [PMC free article] [PubMed] [Google Scholar]
33.Zhu LJ, Christensen RG, Kazemian M, Hull CJ, Enuameh MS, Basciotta MD, Brasefield JA, Zhu C, Asriyan Y, Lapointe DS, et al. FlyFactorSurvey: a database of Drosophila transcription factor binding specificities determined using the bacterial one-hybrid system. Nucleic Acids Res. 2011;39:D111–D117. doi: 10.1093/nar/gkq858. [DOI] [PMC free article] [PubMed] [Google Scholar]
34.de Boer CG, Hughes TR. YeTFaSCo: a database of evaluated yeast transcription factor sequence specificities. Nucleic Acids Res. 2012;40:D169–D179. doi: 10.1093/nar/gkr993. [DOI] [PMC free article] [PubMed] [Google Scholar]
35.Meysman P, Dang TH, Laukens K, De Smet R, Wu Y, Marchal K, Engelen K. Use of structural DNA properties for the prediction of transcription-factor binding sites in Escherichia coli. Nucleic Acids Res. 2011;39:e6. doi: 10.1093/nar/gkq1071. [DOI] [PMC free article] [PubMed] [Google Scholar]
36.Maienschein-Cline M, Dinner AR, Hlavacek WS, Mu F. Improved predictions of transcription factor binding sites using physicochemical features of DNA. Nucleic Acids Res. 2012;40:e175. doi: 10.1093/nar/gks771. [DOI] [PMC free article] [PubMed] [Google Scholar]
37.Hooghe B, Broos S, van Roy F, De Bleser P. A flexible integrative approach based on random forest improves prediction of transcription factor binding sites. Nucleic Acids Res. 2012;40:e106. doi: 10.1093/nar/gks283. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B1] 1.Stormo GD. DNA binding sites: representation and discovery. Bioinformatics. 2000;16:16–23. doi: 10.1093/bioinformatics/16.1.16. [DOI] [PubMed] [Google Scholar]

[gkt1087-B2] 2.Stormo GD. Modeling the specificity of protein-DNA interactions. In: Zhang M, Tang C, editors. Quantitative Biology. Vol. 1. Berlin-Heidelberg, Germany: Springer; 2013. pp. 115–130. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B3] 3.Schneider TD, Stephens RM. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 1990;18:6097–6100. doi: 10.1093/nar/18.20.6097. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B4] 4.Sharon E, Lubliner S, Segal E. A feature-based approach to modeling protein-DNA interactions. PLoS Comput. Biol. 2008;4:e1000154. doi: 10.1371/journal.pcbi.1000154. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B5] 5.Zhao Y, Ruan S, Pandey M, Stormo GD. Improved models for transcription factor binding site identification using nonindependent interactions. Genetics. 2012;191:781–790. doi: 10.1534/genetics.112.138685. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B6] 6.Weirauch MT, Cote A, Norel R, Annala M, Zhao Y, Riley TR, Saez-Rodriguez J, Cokelaer T, Vedenko A, Talukder S, et al. Evaluation of methods for modeling transcription factor sequence specificity. Nat. Biotechnol. 2013;31:126–134. doi: 10.1038/nbt.2486. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B7] 7.Rohs R, West SM, Sosinsky A, Liu P, Mann RS, Honig B. The role of DNA shape in protein-DNA recognition. Nature. 2009;461:1248–1253. doi: 10.1038/nature08473. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B8] 8.Rohs R, Jin X, West SM, Joshi R, Honig B, Mann RS. Origins of specificity in protein-DNA recognition. Annu. Rev. Biochem. 2010;79:233–269. doi: 10.1146/annurev-biochem-060408-091030. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B9] 9.Parker SC, Tullius TD. DNA shape, genetic codes, and evolution. Curr. Opin. Struct. Biol. 2011;21:342–347. doi: 10.1016/j.sbi.2011.03.002. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B10] 10.Ostuni R, Natoli G. Lineages, cell types and functional states: a genomic view. Curr. Opin. Cell Biol. 2013;25:759–764. doi: 10.1016/j.ceb.2013.07.006. [DOI] [PubMed] [Google Scholar]

[gkt1087-B11] 11.Joshi R, Passner JM, Rohs R, Jain R, Sosinsky A, Crickmore MA, Jacob V, Aggarwal AK, Honig B, Mann RS. Functional specificity of a Hox protein mediated by the recognition of minor groove structure. Cell. 2007;131:530–543. doi: 10.1016/j.cell.2007.09.024. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B12] 12.Slattery M, Riley T, Liu P, Abe N, Gomez-Alcala P, Dror I, Zhou T, Rohs R, Honig B, Bussemaker HJ, et al. Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell. 2011;147:1270–1282. doi: 10.1016/j.cell.2011.10.053. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B13] 13.Dror I, Zhou T, Mandel-Gutfreund Y, Rohs R. Covariation of homeodomain transcription factors and the shape of their DNA binding sites. Nucleic Acids Res. 2014;42:430–441. doi: 10.1093/nar/gkt862. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B14] 14.Kitayner M, Rozenberg H, Rohs R, Suad O, Rabinovich D, Honig B, Shakked Z. Diversity in DNA recognition by p53 revealed by crystal structures with Hoogsteen base pairs. Nat. Struct. Mol. Biol. 2010;17:423–429. doi: 10.1038/nsmb.1800. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B15] 15.Eldar A, Rozenberg H, Diskin-Posner Y, Rohs R, Shakked Z. Structural studies of p53 inactivation by DNA-contact mutations and its rescue by suppressor mutations via alternative protein-DNA interactions. Nucleic Acids Res. 2013;41:8748–8759. doi: 10.1093/nar/gkt630. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B16] 16.Chen Y, Bates DL, Dey R, Chen PH, Dantas Machado AC, Laird-Offringa IA, Rohs R, Chen L. DNA binding by GATA transcription factor suggests mechanisms of DNA looping and long-range gene regulation. Cell Rep. 2012;2:1197–1206. doi: 10.1016/j.celrep.2012.10.012. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B17] 17.Gordân R, Shen N, Dror I, Zhou T, Horton J, Rohs R, Bulyk ML. Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape. Cell Rep. 2013;3:1093–1104. doi: 10.1016/j.celrep.2013.03.014. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B18] 18.Hancock SP, Ghane T, Cascio D, Rohs R, Di Felice R, Johnson RC. Control of DNA minor groove width and Fis protein binding by the purine 2-amino group. Nucleic Acids Res. 2013;41:6750–6760. doi: 10.1093/nar/gkt357. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B19] 19.Lazarovici A, Zhou T, Shafer A, Dantas Machado AC, Riley TR, Sandstrom R, Sabo PJ, Lu Y, Rohs R, Stamatoyannopoulos JA, et al. Probing DNA shape and methylation state on a genomic scale with DNase I. Proc. Natl Acad. Sci. USA. 2013;110:6376–6381. doi: 10.1073/pnas.1216822110. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B20] 20.Chang YP, Xu M, Dantas Machado AC, Yu XJ, Rohs R, Chen XS. Mechanism of origin DNA recognition and assembly of an initiator-helicase complex by SV40 large tumor antigen. Cell Rep. 2013;3:1117–1127. doi: 10.1016/j.celrep.2013.03.002. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B21] 21.Kim S, Brostromer E, Xing D, Jin J, Chong S, Ge H, Wang S, Gu C, Yang L, Gao YQ, et al. Probing allostery through DNA. Science. 2013;339:816–819. doi: 10.1126/science.1229223. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B22] 22.Watson LC, Kuchenbecker KM, Schiller BJ, Gross JD, Pufall MA, Yamamoto KR. The glucocorticoid receptor dimer interface allosterically transmits sequence-specific DNA signals. Nat. Struct. Mol. Biol. 2013;20:876–883. doi: 10.1038/nsmb.2595. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B23] 23.Chen Y, Zhang X, Dantas Machado AC, Ding Y, Chen Z, Qin PZ, Rohs R, Chen L. Structure of p53 binding to the BAX response element reveals DNA unwinding and compression to accommodate base-pair insertion. Nucleic Acids Res. 2013;41:8368–8376. doi: 10.1093/nar/gkt584. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B24] 24.Zhou T, Yang L, Lu Y, Dror I, Dantas Machado AC, Ghane T, Di Felice R, Rohs R. DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale. Nucleic Acids Res. 2013;41:W56–W62. doi: 10.1093/nar/gkt437. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B25] 25.Mathelier A, Zhao X, Zhang AZ, Parcy F, Worsley-Hunt R, Arenillas DJ, Buchman S, Chen C-Y, Chou A, Ienasescu H, et al. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res. 2014;42:D142–D147. doi: 10.1093/nar/gkt997. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B26] 26.Robasky K, Bulyk ML. UniPROBE, update 2011: expanded content and search tools in the online database of protein-binding microarray data on protein-DNA interactions. Nucleic Acids Res. 2011;39:D124–D128. doi: 10.1093/nar/gkq992. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B27] 27.Mordelet F, Horton J, Hartemink AJ, Engelhardt BE, Gordân R. Stability selection for regression-based models of transcription factor-DNA binding specificity. Bioinformatics. 2013;29:i117–i125. doi: 10.1093/bioinformatics/btt221. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B28] 28.Berger MF, Badis G, Gehrke AR, Talukder S, Philippakis AA, Pena-Castillo L, Alleyne TM, Mnaimneh S, Botvinnik OB, Chan ET, et al. Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell. 2008;133:1266–1276. doi: 10.1016/j.cell.2008.05.024. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B29] 29.Portales-Casamar E, Thongjuea S, Kwon AT, Arenillas D, Zhao X, Valen E, Yusuf D, Lenhard B, Wasserman WW, Sandelin A. JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids Res. 2010;38:D105–D110. doi: 10.1093/nar/gkp950. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B30] 30.Berger MF, Philippakis AA, Qureshi AM, He FS, Estep PW, III, Bulyk ML. Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat. Biotechnol. 2006;24:1429–1435. doi: 10.1038/nbt1246. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B31] 31.Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011;27:1017–1018. doi: 10.1093/bioinformatics/btr064. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B32] 32.Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW, Noble WS. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 2009;37:W202–W208. doi: 10.1093/nar/gkp335. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B33] 33.Zhu LJ, Christensen RG, Kazemian M, Hull CJ, Enuameh MS, Basciotta MD, Brasefield JA, Zhu C, Asriyan Y, Lapointe DS, et al. FlyFactorSurvey: a database of Drosophila transcription factor binding specificities determined using the bacterial one-hybrid system. Nucleic Acids Res. 2011;39:D111–D117. doi: 10.1093/nar/gkq858. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B34] 34.de Boer CG, Hughes TR. YeTFaSCo: a database of evaluated yeast transcription factor sequence specificities. Nucleic Acids Res. 2012;40:D169–D179. doi: 10.1093/nar/gkr993. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B35] 35.Meysman P, Dang TH, Laukens K, De Smet R, Wu Y, Marchal K, Engelen K. Use of structural DNA properties for the prediction of transcription-factor binding sites in Escherichia coli. Nucleic Acids Res. 2011;39:e6. doi: 10.1093/nar/gkq1071. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B36] 36.Maienschein-Cline M, Dinner AR, Hlavacek WS, Mu F. Improved predictions of transcription factor binding sites using physicochemical features of DNA. Nucleic Acids Res. 2012;40:e175. doi: 10.1093/nar/gks771. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkt1087-B37] 37.Hooghe B, Broos S, van Roy F, De Bleser P. A flexible integrative approach based on random forest improves prediction of transcription factor binding sites. Nucleic Acids Res. 2012;40:e106. doi: 10.1093/nar/gks283. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

TFBSshape: a motif database for DNA shape features of transcription factor binding sites

Lin Yang

Tianyin Zhou

Iris Dror

Anthony Mathelier

Wyeth W Wasserman

Raluca Gordân

Remo Rohs

Series information

Abstract

INTRODUCTION