Abstract
TAL (transcriptional activator-like) effectors are DNA-binding repeat proteins recently shown to recognize their target sites through an unprecedented 1:1 mapping between repeat residues and DNA bases. The structural basis for this recognition is not known; in particular, it is not clear whether such 1:1 recognition can be accommodated by standard major-groove readout of B-form DNA. Here we describe a structure prediction protocol tailored to the TAL–DNA system and report simulation results that shed light on observed repeat-base associations and overall TAL structure. Our results show that TAL–DNA interactions can be explained by a model in which the TAL repeat domain forms a superhelical repeat structure that wraps around undistorted B-form DNA, following the geometry of the major groove, with contacts between position 13 of each repeat and its associated base pair on the sense strand determining the specificity of DNA recognition.
Keywords: structure prediction, protein–DNA interactions, repeat proteins
Introduction
TAL (transcriptional activator-like) effectors are DNA-binding proteins used by certain bacterial plant pathogens to retarget their host's transcriptional machinery during infection.1 TAL effectors contain multiple, tandemly repeated copies of a 33–34 amino acid repeat unit that shows very high sequence conservation, both within individual proteins and across the family. They have recently been shown to recognize DNA by an unprecedented mechanism involving a 1:1 mapping between specific repeat residues (positions 12 and 13, termed the repeat-variable diresidue, or RVD) and individual bases of their DNA target site.2, 3 The identity of the RVD amino acids can be used to predict, with considerable accuracy, the identity of the corresponding target-site base: HD is associated with C, NI with A, NG with T, and so on. Using this mapping between RVD sequence and DNA target site, several groups have created engineered TAL effectors with altered binding specificity for use as transcriptional activators or, when fused to a nonspecific nuclease domain, as genome editing reagents.4, 5 The structural basis for TAL-effector–DNA recognition is not known: structural data for TAL effectors are currently limited to an NMR structure of 1.5 repeats solved in the absence of DNA.6 Here, we describe a de novo folding protocol, implemented within the Rosetta package,7, 8 for predicting symmetrical protein–DNA interactions, and we report simulation results for the TAL–DNA system. Our models provide plausible explanations for observed RVD–DNA associations, shed light on proposed evolutionary relationships between TAL effectors and tetratricopeptide repeat (TPR) family proteins, and support the general conclusion that 1:1 RVD-base contacts can be accommodated by the major groove of B-form DNA without significant bending or kinking.
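As a minimal sketch of how the RVD cipher described above can be applied, the following Python function maps an ordered list of RVDs to a predicted target-site sequence, one base per repeat. The lookup table contains only the associations named in this paper (HD-C, NI-A, NG-T, NN-G/A, HG-T, NK-G); the names `RVD_TO_BASE` and `predict_target` are hypothetical, NN is written with the IUPAC ambiguity code R (G or A), and any RVD not in the table is mapped to N. Because the mapping predicts bases "with considerable accuracy" rather than perfectly, this is an illustration of the 1:1 logic, not a validated prediction tool.

```python
# Hypothetical lookup table built only from the RVD-base associations named in the text.
RVD_TO_BASE = {
    "HD": "C",
    "NI": "A",
    "NG": "T",
    "NN": "R",  # IUPAC code for G or A
    "HG": "T",
    "NK": "G",
}


def predict_target(rvds: list[str]) -> str:
    """Return the predicted target-site sequence, one base per repeat (N if unknown)."""
    return "".join(RVD_TO_BASE.get(rvd.upper(), "N") for rvd in rvds)


print(predict_target(["HD", "NI", "NG", "NN"]))  # prints CATR
```

The dictionary lookup makes the 1:1 assumption explicit: each repeat contributes exactly one base, independent of its neighbors.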
Results and Discussion
We generated de novo models for repeats containing the RVD–DNA associations HD-C, NI-A, NN-G/A, HG-T, and NK-G. These models were pooled and clustered by structural similarity to identify conformations compatible with most or all of the observed associations. We explored a variety of clustering algorithms and parameters; although the details of the cluster compositions varied slightly, several trends were robust to this variation: (i) left-handed helical connections between the TAL repeat units are preferred over right-handed connections (by contrast, the helical bundles formed by TPR proteins have right-handed connections); (ii) RVD position 13 contacts the leading strand of the DNA site (Fig. 1), while position 12 plays an apparent structural role in stabilizing the closely packed RVD loops (perhaps in association with ordered water molecules); (iii) the highly conserved glycine at repeat position 15 is packed tightly at the junction between the two repeat alpha helices, leaving little room for a nonhydrogen sidechain [Fig. 1(A), inset]; and (iv) the second alpha helix of the TAL repeat unit is consistently kinked at a conserved proline (as also seen in the NMR structure). The backbone conformation of the RVD loop varies somewhat across clusters, with a slight preference for a loop in which RVD positions 12 and 13 adopt an extended conformation (negative phi, positive psi) while glycine 14 has a positive phi angle, which may explain the need for glycine at this position. The center of the largest cluster for a representative set of parameters is shown in Figure 1(A,B) (the full dendrogram for this clustering is given in Supporting Information Fig. S1). By examining other models in this cluster and in nearby clusters, we formulated hypotheses for RVD–DNA contacts that explain many of the observed associations [Fig. 1(C–F)]: hydrogen bonds to the cytosine N4 atom (HD-C), the purine N7 (NN-G/A), and guanine acceptors (NK-G), as well as packing interactions with the thymine methyl group (HG-T). We hope that these simulation results will stimulate further progress in understanding the structural basis of TAL-effector–DNA recognition and in the prediction and design of symmetric protein–DNA interactions.
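The text does not specify which clustering algorithms were used (the dendrogram in Fig. S1 suggests hierarchical methods were among them), so the following is only one plausible scheme, sketched under stated assumptions. Each model is reduced to a matched (N, 3) array of atom coordinates; pairwise distances are RMSDs after optimal (Kabsch) superposition; and clusters are built greedily by repeatedly taking the unassigned model with the most neighbors within a distance threshold as a cluster center. The function names and the threshold are hypothetical. In this scheme the first cluster returned is the largest, analogous to the largest cluster whose center is shown in Figure 1.

```python
import numpy as np


def kabsch_rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """RMSD between two matched (N, 3) coordinate sets after optimal superposition."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    u, s, vt = np.linalg.svd(a.T @ b)
    # If the best orthogonal fit is a reflection, flip the smallest singular value.
    d = np.sign(np.linalg.det(u @ vt))
    e = (a ** 2).sum() + (b ** 2).sum() - 2.0 * (s[0] + s[1] + d * s[2])
    return float(np.sqrt(max(e, 0.0) / len(a)))


def greedy_cluster(models: list, threshold: float) -> list:
    """Greedy neighbor-count clustering on pairwise RMSD.

    Repeatedly picks the unassigned model with the most unassigned neighbors
    within `threshold` as a cluster center and assigns those neighbors to it.
    Returns (center_index, member_indices) tuples, largest cluster first.
    """
    n = len(models)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = kabsch_rmsd(models[i], models[j])
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        pool = sorted(unassigned)
        counts = {i: sum(dist[i, j] <= threshold for j in pool) for i in pool}
        center = max(pool, key=counts.get)  # ties go to the lowest index
        members = [j for j in pool if dist[center, j] <= threshold]
        clusters.append((center, members))
        unassigned.difference_update(members)
    return clusters
```

A neighbor-count scheme favors conformations that are sampled frequently, which matches the stated goal of finding structures compatible with many RVD systems; hierarchical clustering would instead expose the full merge tree.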
Figure 1.
(A,B) Top cluster center model showing eight NN-A:T repeat associations, colored from blue to red along the peptide chain, with positions 12 and 13 shown in stick representation and the carbon atoms of N12, N13, and G14–G15 colored purple, black, and gold, respectively. (C–F) Predicted specificity-determining interactions, with RVD positions (12–13) and cognate DNA bases labeled and shown in stick representation and positions 12–15 colored as in (A). (C) HD-C: D13 forms an H-bond to cytosine N4. (D) NN-G/A: N13 forms an H-bond to purine N7. (E) HG-T: the G13 alpha-carbon packs against the thymine methyl group. (F) NK-G: K13 forms H-bonds to guanine acceptors.
Methods
Our TAL effector structure prediction simulations rest on two key assumptions: first, that at least one of the two specificity-determining residues (repeat positions 12 and 13) contacts the DNA; second, that successive repeats are structurally symmetric, both internally and in their orientation relative to the DNA. In implementing the second assumption, we expected successive repeats containing the same RVD to be essentially identical. Repeats with different RVDs might differ in the local conformation of the DNA-contacting loop, but given the high sequence similarity between repeats, the packing of their core structure should be highly similar.
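To make the symmetry assumption concrete, the sketch below represents the protein backbone as per-residue (phi, psi) torsion angles and applies a fragment insertion identically, at the same offset, in every repeat, so that all repeats remain identical after each move. This mirrors the identical backbone fragment insertions described in the Methods, but it is a simplified illustration and not the Rosetta implementation. The function names, the 34-residue repeat length, and all torsion values are assumptions chosen for illustration; the example fragment loosely follows the RVD-loop preference reported in the Results (extended residues 12 and 13, positive phi at glycine 14).

```python
import numpy as np


def symmetric_fragment_insertion(torsions: np.ndarray, repeat_len: int,
                                 fragment: np.ndarray, offset: int) -> np.ndarray:
    """Copy one (phi, psi) fragment into the same offset of every repeat.

    torsions: (n_repeats * repeat_len, 2) array of (phi, psi) angles in degrees
    fragment: (frag_len, 2) array of (phi, psi) angles, e.g. from a fragment library
    offset:   0-based start of the insertion window within a single repeat
    """
    frag_len = len(fragment)
    if offset < 0 or offset + frag_len > repeat_len:
        raise ValueError("fragment must fit inside a single repeat")
    if len(torsions) % repeat_len != 0:
        raise ValueError("torsion array must hold a whole number of repeats")
    new = torsions.copy()
    for r in range(len(torsions) // repeat_len):
        start = r * repeat_len + offset
        new[start:start + frag_len] = fragment  # identical insertion in each repeat
    return new


def is_symmetric(torsions: np.ndarray, repeat_len: int) -> bool:
    """True if every repeat has the same torsions as the first one."""
    repeats = torsions.reshape(-1, repeat_len, 2)
    return bool(np.allclose(repeats, repeats[0]))


# Example: eight 34-residue repeats, all initially helical (illustrative values).
REPEAT_LEN = 34
helix = np.tile([-60.0, -45.0], (8 * REPEAT_LEN, 1))
# Hypothetical RVD-loop fragment for repeat residues 12-14 (0-based offset 11).
loop = np.array([[-120.0, 130.0], [-110.0, 120.0], [80.0, 10.0]])
model = symmetric_fragment_insertion(helix, REPEAT_LEN, loop, offset=11)
```

Because every move is applied to all repeats at once, the search only has to explore the conformational space of a single repeat, which is the motivation for modeling fully symmetrical systems.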
To construct models subject to these assumptions, we collected template information from the structural database in three forms: amino-acid–base contacts to model the RVD–DNA interaction [Fig. 2(A); represented as protein–DNA rigid-body interface transforms as described previously9], protein backbone fragments to model the internal conformation of the repeats, and duplex DNA base-step fragments to model the conformation of the DNA. To reduce the conformational space that must be explored during each simulation, we chose to model fully symmetrical repeat systems consisting of 5–10 copies of a single RVD-containing repeat sequence in complex with its cognate DNA target site (e.g., 8×NI in complex with 8×A:T, 8×HD in complex with 8×C:G, etc.). At the start of each independent simulation, a symmetrical DNA target site is built by selecting a base-step transform and its associated dihedral angles from the structural database and replicating this transform along the length of the DNA site, so that the local conformation around each base pair is identical [Fig. 2(B,C)]. DNA models that deviate greatly from ideal B-form or have internal clashes are discarded. We then construct a model for the protein–DNA complex through a Monte Carlo fragment replacement simulation using the protein backbone fragments and the protein–DNA interface transforms [Fig. 2(D,E)]. Symmetry is enforced throughout this simulation by making identical backbone fragment insertions in each of the repeat segments and identical interface fragment insertions for each of the repeat–DNA interfaces. This symmetrical fragment assembly simulation is followed by a short, nonsymmetrical all-atom refinement with DNA flexibility to relax any internal strain and assign a meaningful final energy.
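The symmetric DNA construction described above can be sketched with 4×4 homogeneous transforms: each base-pair frame is obtained from the previous one by applying the same base-step transform in the local frame, so every base pair sees an identical local neighborhood. Repeated application of a fixed rigid transform generally traces out a helix, which is why replication yields a regular duplex. This sketch assumes a step that is a pure twist about, and rise along, the helix axis, with illustrative B-like values of 36° and 3.38 Å; real base-step transforms taken from structures also include roll, tilt, and other components. The function names are hypothetical.

```python
import numpy as np


def twist_rise_step(twist_deg: float, rise: float) -> np.ndarray:
    """4x4 homogeneous base-step transform: rotate by twist about z, translate by rise along z."""
    t = np.radians(twist_deg)
    step = np.eye(4)
    step[:2, :2] = [[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]]
    step[2, 3] = rise
    return step


def replicate_base_step(step: np.ndarray, n_base_pairs: int) -> list:
    """Frames for n_base_pairs base pairs, each related to the previous one by `step`.

    The step is applied in the local frame (frame @ step), so the transform relating
    any base pair to its successor is the same everywhere along the duplex.
    """
    frames = [np.eye(4)]
    for _ in range(n_base_pairs - 1):
        frames.append(frames[-1] @ step)
    return frames


# Illustrative B-like values (assumed): about 36 degrees twist and 3.38 angstrom rise.
frames = replicate_base_step(twist_rise_step(36.0, 3.38), n_base_pairs=11)
```

With these values, ten steps complete one full turn, so the eleventh frame has the same orientation as the first and sits 33.8 Å further along the axis.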
We cluster the low-energy conformations identified in these independent simulations to look for protein–DNA complex structures that are sampled frequently by multiple different RVD systems. Additional implementation details are given in the online Supporting Information.
Figure 2.

Symmetrical modeling of protein–DNA interactions. (A) Amino-acid–base contacts taken from the structural database are used to model the RVD–DNA associations. To model HD-C:G, for example, all contacts between either His or Asp sidechains and C:G base pairs were used (a few representative contacts are shown). (B) To generate symmetrical DNA templates, a base-step transform (indicated by the yellow arrows) and associated torsion angles taken from a protein–DNA complex structure are replicated throughout the modeled duplex, so that the local neighborhood of each base pair is identical. (C) Library of symmetrical DNA templates used for the simulations. (D) TAL–DNA model consisting of three RVD–DNA repeats (colored blue, green, and yellow), shown before and after (darker and lighter colors, respectively) an interface transform sampling move. Each repeat unit is anchored to its cognate base step by a rigid-body linkage (dotted lines). (E) Three-repeat model shown before and after a protein backbone fragment insertion move (colored as in D). Note that conformational change does not propagate between repeat units: each repeat unit is built outward from the RVD anchor position, with chainbreaks inserted between repeats and a pseudoenergy term favoring closure applied during the folding simulation.
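The closure pseudoenergy mentioned in the caption can be illustrated with a simple harmonic penalty on the gap at each repeat junction. This is an assumed, simplified form for illustration only; Rosetta's chainbreak term uses its own formulation. The function name is hypothetical, and each junction is summarized by the C atom ending one repeat and the N atom starting the next, scored against an approximate ideal peptide-bond length of 1.33 Å.

```python
import numpy as np

PEPTIDE_C_N = 1.33  # approximate ideal peptide C-N bond length in angstroms


def chainbreak_penalty(c_atoms, n_atoms, weight: float = 1.0) -> float:
    """Weighted sum of squared deviations from an ideal C-N bond across repeat junctions.

    c_atoms[i]: (x, y, z) of the C atom that closes repeat i
    n_atoms[i]: (x, y, z) of the N atom that opens repeat i + 1
    """
    c = np.asarray(c_atoms, dtype=float)
    n = np.asarray(n_atoms, dtype=float)
    gaps = np.linalg.norm(n - c, axis=1)  # one C-N distance per junction
    return weight * float(np.sum((gaps - PEPTIDE_C_N) ** 2))
```

Because each repeat is built outward from its own DNA anchor, a penalty of this kind is what pulls neighboring repeats into a continuous chain; it is zero only when every junction is closed.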
Acknowledgments
The author thanks the members of the Rosetta community for their contributions to the software used in this study, as well as the attendees of the FHCRC structural journal club for feedback and stimulating discussions.
Note added in proof
Two papers reporting crystal structures of TAL-effector DNA complexes have just been published online.10, 11 These structures confirm the major findings of this study.
Supplementary material
Additional Supporting Information may be found in the online version of this article.
References
- 1. Bogdanove AJ, Schornack S, Lahaye T. TAL effectors: finding plant genes for disease and defense. Curr Opin Plant Biol. 2010;13:394–401. doi: 10.1016/j.pbi.2010.04.010.
- 2. Moscou MJ, Bogdanove AJ. A simple cipher governs DNA recognition by TAL effectors. Science. 2009;326:1501. doi: 10.1126/science.1178817.
- 3. Boch J, Scholze H, Schornack S, Landgraf A, Hahn S, Kay S, Lahaye T, Nickstadt A, Bonas U. Breaking the code of DNA binding specificity of TAL-type III effectors. Science. 2009;326:1509–1512. doi: 10.1126/science.1178811.
- 4. Scholze H, Boch J. TAL effectors are remote controls for gene activation. Curr Opin Microbiol. 2011;14:47–53. doi: 10.1016/j.mib.2010.12.001.
- 5. Miller JC, Tan S, Qiao G, Barlow KA, Wang J, Xia DF, Meng X, Paschon DE, Leung E, Hinkley SJ, Dulay GP, Hua KL, Ankoudinova I, Cost GJ, Urnov FD, Zhang HS, Holmes MC, Zhang L, Gregory PD, Rebar EJ. A TALE nuclease architecture for efficient genome editing. Nat Biotechnol. 2010;29:143–148. doi: 10.1038/nbt.1755.
- 6. Murakami MT, Sforca ML, Neves JL, Paiva JH, Domingues MN, Pereira AL, Zeri AC, Benedetti CE. The repeat domain of the type III effector protein PthA shows a TPR-like structure and undergoes conformational changes upon DNA interaction. Proteins. 2010;78:3386–3395. doi: 10.1002/prot.22846.
- 7. Leaver-Fay A, Tyka M, Lewis SM, Lange OF, Thompson J, Jacak R, Kaufman K, Renfrew PD, Smith CA, Sheffler W, Davis IW, Cooper S, Treuille A, Mandell DJ, Richter F, Ban YE, Fleishman SJ, Corn JE, Kim DE, Lyskov S, Berrondo M, Mentzer S, Popović Z, Havranek JJ, Karanicolas J, Das R, Meiler J, Kortemme T, Gray JJ, Kuhlman B, Baker D, Bradley P. ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol. 2011;487:545–574. doi: 10.1016/B978-0-12-381270-4.00019-6.
- 8. Andre I, Bradley P, Wang C, Baker D. Prediction of the structure of symmetrical protein assemblies. Proc Natl Acad Sci USA. 2007;104:17656–17661. doi: 10.1073/pnas.0702626104.
- 9. Yanover C, Bradley P. Extensive protein and DNA backbone sampling improves structure-based specificity prediction for C2H2 zinc fingers. Nucleic Acids Res. 2011;39:4564–4576. doi: 10.1093/nar/gkr048.
- 10. Mak AN, Bradley P, Cernadas RA, Bogdanove AJ, Stoddard BL. The crystal structure of TAL effector PthXo1 bound to its DNA target. Science. 2012. doi: 10.1126/science.1216211.
- 11. Deng D, Yan C, Pan X, Mahfouz M, Wang J, Zhu JK, Shi Y, Yan N. Structural basis for sequence-specific recognition of DNA by TAL effectors. Science. 2012. doi: 10.1126/science.1215670.