Nucleic Acids Research. 2016 Mar 2;44(8):3936–3945. doi: 10.1093/nar/gkw133

Structures of an all-α protein running along the DNA major groove

Li-Yan Yu 1, Wang Cheng 1, Kang Zhou 1, Wei-Fang Li 1, Hong-Mei Yu 1, Xinlei Gao 1, Xudong Shen 2, Qingfa Wu 1, Yuxing Chen 1,*, Cong-Zhao Zhou 1,*
PMCID: PMC4856987  PMID: 26939889

Abstract

Although over 3300 protein–DNA complex structures have been reported in the past decades, some recognition patterns between proteins and their target DNA remain unknown. The silkgland-specific transcription factor FMBP-1 from the silkworm Bombyx mori contains a unique DNA-binding domain of four tandem STPRs, namely the score and three amino acid peptide repeats. Here we report three structures of this STPR domain (termed BmSTPR) in complex with DNA of various lengths. In the presence of target DNA, BmSTPR adopts a zig-zag structure of three or four tandem α-helices that run along the major groove of DNA. Structural analyses combined with binding assays indicate that BmSTPR prefers AT-rich sequences, with each α-helix covering 4 bp of DNA. Successive AT-rich DNA segments adopt a wider major groove, which is complementary in shape and size to the tandem α-helices of BmSTPR. DNA sequence substitutions and affinity comparisons further indicate that BmSTPR recognizes the major groove mainly via shape readout. Multiple-sequence alignment suggests that this unique DNA-binding pattern is likely conserved among STPR domain-containing proteins, which are widespread in animals. Together, our findings provide structural insights into the specific interactions between a novel DNA-binding protein and a unique deformed B-DNA.

INTRODUCTION

Recognition of specific DNA sequences by proteins is indispensable for reading out the genetic information of all living organisms. Since the first X-ray structure of a protein–DNA complex was reported in 1987 (1), an increasing number of structures have revealed how a protein selectively binds to one or a few DNA sites out of millions along the genome. The earlier proposal of a ‘simple recognition code’ has proven inadequate to describe the specific interactions between protein and DNA (2–4). Instead, structural analyses reveal that specific recognition of DNA by proteins is accomplished by a combination of base (direct) readout and shape (indirect) readout (5–7). The former involves direct interactions, such as hydrogen bonds and/or hydrophobic contacts between amino acids and nucleotide bases (7–9), whereas the latter corresponds to the recognition of sequence-dependent DNA conformation, such as the curvature and narrow minor groove of A-tracts (10–12). To date, >3300 DNA-complexed protein structures are available in the database (http://npidb.belozersky.msu.ru/) (13,14), and they are grouped into ∼100 superfamilies according to the Structural Classification of Proteins (15). However, most protein–DNA interactions are dominated by base readout, whereas cases mediated mainly or exclusively by DNA shape readout are relatively rare.

The silkglands of the silkworm Bombyx mori are known as the most efficient factories in nature for producing silk proteins (16). In the posterior silkglands, the fibroin gene is selectively transcribed at the fifth instar larval stage (17). A series of transcription factors, originally identified from crude extracts of posterior silkglands, finely coordinate this efficient expression system by binding specifically to regulatory elements upstream of and/or within the intron of the fibroin gene (18–21). Remarkably, the fibroin modulator binding protein-1 (FMBP-1) has three binding elements, around positions −130, +220 and +290 of the fibroin gene (21), and its tissue- and stage-specific expression profile correlates closely with that of the fibroin gene (21–23). Sequence analysis reveals that the 218-residue FMBP-1 consists of two distinct domains (Figure 1A). The N-terminal domain of unknown function contains an acidic region (residues Glu55–Glu84) followed by a hyper-basic stretch (residues Pro85–Ser98), whereas the C-terminal DNA-binding domain (residues Glu99–Thr218) consists of four tandem repeats, R1–R4, each containing 23 residues and hence termed the score and three amino acid peptide repeat (STPR) (24). The four repeats of this domain (termed BmSTPR for short) are highly homologous to each other, with 60–80% sequence identity (Figure 1B), and the domain was proposed to favour DNA fragments with the consensus sequence 5′-atntwtnta-3′ (n: any nucleotide, w: a or t) through cooperative binding (24). In the absence of DNA, only the N-terminal moiety of each repeat of BmSTPR folds into a short α-helix (25), whereas the intact repeat adopts an α-helical structure upon addition of the hydrogen-bond-promoting solvent trifluoroethanol (25,26). Competitive binding assays further suggested that BmSTPR most likely binds to the major groove of DNA (27).

Bioinformatic analysis indicated that the STPR domain is widespread in diverse eukaryotic organisms, including the model organisms Caenorhabditis elegans, Drosophila, mouse and human (24). However, the DNA-binding mode of the STPR domain has remained unknown because no structure of an intact STPR domain in complex with DNA was available.

Figure 1.

Structure of BmSTPR. (A) Domain organization of FMBP-1. (B) Sequence alignment of the four repeats of BmSTPR. The highly conserved residues involved in DNA binding are labelled with blue pentangles. (C) Crystal structure of BmSTPR in complex with the 13-bp DNA. Repeats R2 to R4 are shown as cylinders and coloured cyan, yellow and purple, respectively. The coding and noncoding strands of the 13-bp DNA are shown in green and orange, respectively. The interactions that stabilize repeats R3 and R4 of BmSTPR are enlarged in the right panel, with the involved residues labelled and shown as sticks.

Here we present three structures of BmSTPR in complex with DNA of various lengths. Upon binding to either a 13-bp or a 20-bp DNA fragment derived from the +290 site of the fibroin gene, repeats R2–R4 of BmSTPR fold into three tandem α-helices that run along the major groove of DNA, whereas all or most of R1 is not visible in the electron density map. The three repeats display a relatively rigid helical structure, with an inter-helix angle of about 60°, and each repeat covers exactly 4 bp of DNA. This regular binding pattern enabled us to design a double-stranded DNA containing four tandem repetitive units of 5′-atac-3′, which induces the intact R1 to fold into a helix similar to those of R2–R4. Biochemical studies indicated that BmSTPR favours AT-rich sequences, which tend to adopt a narrower minor groove (28,29) and hence a wider major groove that accommodates the rigid α-helices of BmSTPR. Notably, the DNA bound to BmSTPR adopts a unique deformed B-DNA conformation. Moreover, DNA sequence substitutions combined with binding assays reveal that BmSTPR recognizes the DNA major groove mainly via indirect interactions. Together, our findings provide structural insights into a novel protein–DNA interaction pattern that is mediated mainly by DNA shape readout.

MATERIALS AND METHODS

Samples preparation

The coding region of BmSTPR (residues Glu99–Ser193) was cloned into the ligation-independent cloning vector 2BT with an N-terminal 6×His tag. The construct was overexpressed in Escherichia coli strain BL21 (DE3) (Novagen) at 37°C for 4 h after induction with 0.2 mM isopropyl β-D-1-thiogalactopyranoside at an OD600nm of 0.8. Cells were harvested, resuspended in lysis buffer (1 M NaCl, 20 mM potassium phosphate, pH 9.0) and disrupted by sonication. After centrifugation, the His-tagged fusion protein was isolated with a Ni-NTA affinity column (Qiagen) and further purified by gel filtration (Superdex 75, GE Healthcare) in a buffer containing 1 M NaCl and 20 mM Tris-HCl, pH 9.0. The peak fractions containing the target protein were pooled and applied to a desalting column (HiPrep 26/10, GE Healthcare) run in a buffer containing 7.5 mM MgCl2, 60 mM NaCl and 30 mM Tris-HCl, pH 7.9. The eluted protein was pooled and frozen for further study.

The selenium-methionine (SeMet)-labelled BmSTPR protein was overexpressed in E. coli strain B834 (DE3). Transformed cells were grown at 37°C in SeMet medium (M9 medium supplemented with 25 μg/ml SeMet and other amino acids at 50 μg/ml) to an OD600nm of 0.8 and then induced with 0.2 mM isopropyl β-D-1-thiogalactopyranoside for another 4 h. The BmSTPR mutants were obtained with the Mut Express™ Fast Mutagenesis Kit, using the plasmid encoding wild-type BmSTPR as the template. The SeMet-substituted and mutant BmSTPR proteins were purified using the same protocol as the native protein.

Single-stranded DNA (ssDNA) was synthesized by Sangon Biotech (Shanghai). Each ssDNA was dissolved in buffer containing 7.5 mM MgCl2, 60 mM NaCl and 30 mM Tris-HCl, pH 7.9, and mixed with an equimolar amount of the complementary strand. To prepare double-stranded DNA, the mixture was heated at 95°C for about 6 min and then annealed by slow cooling to room temperature.

Crystallization, data collection and processing

The protein–DNA complexes were prepared by incubating BmSTPR with the DNA fragments at a molar ratio of 1:1.2 for 40 min on ice. The mixture was then concentrated to ∼18 mg/ml for crystallization at 289 K. Optimized crystals of BmSTPR in complex with the 13-bp DNA (5′-tttacatagattc-3′) grew in a solution containing 20% (v/v) 2-propanol, 17% (w/v) polyethylene glycol 4000 and 0.1 M sodium citrate tribasic dihydrate, pH 5.6. Crystals in complex with the 20-bp DNA (5′-agtatttacatagattcatc-3′) were obtained from a reservoir solution of 16% (v/v) glycerol, 22% (w/v) polyethylene glycol 3350 and 0.2 M ammonium citrate tribasic, pH 7.0, whereas crystals in complex with the 18-bp DNA (5′-catacatacatacataca-3′) were obtained from a solution containing 18% (w/v) polyethylene glycol 2000 and 0.1 M sodium citrate tribasic dihydrate, pH 5.6.

The crystals were transferred to a glycerol-containing cryoprotectant and flash-cooled in liquid nitrogen. Diffraction data were collected at 100 K in a liquid nitrogen stream on beamline BL17U with a Q315r CCD detector (ADSC, MARresearch, Germany) at the Shanghai Synchrotron Radiation Facility. The data were indexed, integrated and scaled with the HKL2000 package (30).

Structure determination and refinement

Using a SeMet-substituted protein crystal, the structure of BmSTPR in complex with the 13-bp DNA was determined by single-wavelength anomalous dispersion (SAD) phasing (31) with the program phenix.solve implemented in PHENIX (32). The initial model was built automatically with AutoBuild in PHENIX, and the complete model was built manually in Coot (33). The model was then refined with Refmac5 (34) using TLS refinement (35). The other two complex structures were determined by molecular replacement, using the 13-bp DNA complex as the search model, and refined with the same procedure. The final models were evaluated with MolProbity (36) and Procheck (37). Data collection and structure refinement statistics are listed in Supplementary Table S1. All structure figures were prepared with PyMOL (38).

Logo generation for the repetitive units favoured by BmSTPR

Sequence logos were generated with the seqLogo software (39), which graphically displays the patterns in a set of aligned nucleic acid sequences. We first used a context-independent model, in which the probability of the 4-bp repetitive unit $x_1x_2x_3x_4$ is calculated as $P(x_1x_2x_3x_4) = P(x_1) \times P(x_2) \times P(x_3) \times P(x_4)$, where $x_n$ is the nucleotide at position $n$ of the 4-bp unit. Each of the 135 repetitive 4-bp units was weighted by its binding affinity relative to that of 5′-(gcca)3-3′, which has the lowest binding affinity towards BmSTPR (i.e., by its fold increase in affinity). The correlation analysis gave a coefficient of 0.658 (P-value < 2.2e-16), and the corresponding seqLogo graphic is shown in Supplementary Figure S3.

Alternatively, a context-dependent model was generated with a first-order Markov chain, in which the probability of each 4-bp unit is built up position by position as $P(x_1 \ldots x_{n+1}) = P(x_1 \ldots x_n) \times P(x_n \rightarrow x_{n+1})$, where $P(x_n \rightarrow x_{n+1})$ is the probability of a transition from base $x_n$ to base $x_{n+1}$. The transition probability matrix was generated from the Kd values of the 135 DNA sequences of three 4-bp repetitive units. The correlation analysis gave a much higher coefficient of 0.824 (P-value < 2.2e-16), and the corresponding seqLogo graphic is shown in Figure 3.
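To make the two probability models concrete, the sketch below scores a 4-bp unit under a context-independent (position-wise product) model and under a first-order Markov model. It is a hypothetical reconstruction, not the authors' code: the weighting (Kd of 5′-(gcca)3-3′ divided by the Kd of each unit), the pooling of dinucleotide transitions over all positions and the small pseudocount are our assumptions, and only the ten Kd values quoted in Table 1 and the Results are used instead of all 135.

```python
from math import prod

BASES = "acgt"

# Kd (nM) of (unit)3 12-bp DNAs from Table 1, plus the weakest binder (gcca)3.
# Illustrative subset only; the analysis described above used all 135 sequences.
KD_NM = {
    "atac": 135.0, "atat": 175.1, "aata": 189.9, "tatc": 216.6, "attt": 232.8,
    "ataa": 268.3, "atag": 335.4, "tata": 340.1, "taaa": 412.8, "gcca": 44563.3,
}


def fold_weights(kd):
    """Assumed weight: affinity relative to (gcca)3, i.e. Kd(gcca) / Kd(unit)."""
    ref = kd["gcca"]
    return {unit: ref / k for unit, k in kd.items()}


def position_probs(weights):
    """Weighted base frequencies at each of the 4 positions of the unit."""
    probs = []
    for pos in range(4):
        counts = dict.fromkeys(BASES, 0.0)
        for unit, w in weights.items():
            counts[unit[pos]] += w
        total = sum(counts.values())
        probs.append({b: c / total for b, c in counts.items()})
    return probs


def transition_matrix(weights, pseudocount=1e-6):
    """Weighted transitions x(n) -> x(n+1), pooled over positions (an assumption)."""
    counts = {a: dict.fromkeys(BASES, pseudocount) for a in BASES}
    for unit, w in weights.items():
        for a, b in zip(unit, unit[1:]):
            counts[a][b] += w
    matrix = {}
    for a, row in counts.items():
        total = sum(row.values())
        matrix[a] = {b: c / total for b, c in row.items()}
    return matrix


def independent_prob(unit, probs):
    # Context-independent: P(x1x2x3x4) = P(x1) * P(x2) * P(x3) * P(x4)
    return prod(probs[i][b] for i, b in enumerate(unit))


def markov_prob(unit, probs, trans):
    # Context-dependent: P(x1) * T(x1->x2) * T(x2->x3) * T(x3->x4)
    p = probs[0][unit[0]]
    for a, b in zip(unit, unit[1:]):
        p *= trans[a][b]
    return p


weights = fold_weights(KD_NM)
probs = position_probs(weights)
trans = transition_matrix(weights)
for unit in ("atac", "taaa", "gcca"):
    print(unit, independent_prob(unit, probs), markov_prob(unit, probs, trans))
```

Printing both scores side by side shows how the Markov model can reward favourable neighbouring bases. Whether such a sketch reproduces the reported correlation coefficients depends on estimation details that are not specified above.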

Figure 3.

The favoured 4-bp repetitive unit binding to BmSTPR. (A) A context-dependent consensus generated with the first-order Markov chain algorithm. (B) Correlation analysis of the context-dependent consensus.

Isothermal titration calorimetry (ITC)

Microcalorimetric titrations were performed at 25°C on a MicroCal iTC200 instrument (GE Healthcare). Protein and DNA samples were both dissolved in buffer containing 7.5 mM MgCl2, 60 mM NaCl and 30 mM Tris-HCl, pH 7.9, and degassed before use. The sample cell was loaded with 200 μl of 10 μM DNA, and the injection syringe with 40 μl of 280 μM BmSTPR. All measurements used the same injection scheme (0.4 μl + 19 × 2 μl), with 120 s between injections. Heats of dilution, determined by titrating the protein into buffer (7.5 mM MgCl2, 60 mM NaCl and 30 mM Tris-HCl, pH 7.9), were subtracted from the raw titration data. All data were analysed with the MicroCal Origin software supplied with the ITC instrument.
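As a rough check on this titration design, the sketch below computes the nominal protein:DNA molar ratio after each injection and the fraction of DNA bound, assuming a simple one-site 1:1 binding model and, for illustration only, a Kd of 135 nM (the Table 1 value for 5′-(atac)3-3′). Volume displacement from the overfilled cell is ignored and both components are simply diluted into the growing volume; this is a simplification of ours, not the fitting model implemented in MicroCal Origin.

```python
import math

CELL_UL = 200.0                      # sample cell volume (DNA), μl
DNA_UM = 10.0                        # DNA concentration in the cell, μM
SYRINGE_UM = 280.0                   # BmSTPR concentration in the syringe, μM
INJECTIONS_UL = [0.4] + [2.0] * 19   # 0.4 μl + 19 × 2 μl


def bound_complex(p_tot, l_tot, kd):
    """Exact 1:1 complex concentration from the quadratic binding equation."""
    s = p_tot + l_tot + kd
    return (s - math.sqrt(s * s - 4.0 * p_tot * l_tot)) / 2.0


def titration(kd_um):
    """Return (cumulative volume, molar ratio, fraction of DNA bound) per injection."""
    rows = []
    injected = 0.0
    for vol in INJECTIONS_UL:
        injected += vol
        total = CELL_UL + injected            # simplification: no overflow/displacement
        p_tot = SYRINGE_UM * injected / total
        l_tot = DNA_UM * CELL_UL / total
        frac = bound_complex(p_tot, l_tot, kd_um) / l_tot
        rows.append((injected, p_tot / l_tot, frac))
    return rows


rows = titration(kd_um=0.135)
for injected, ratio, frac in rows:
    print(f"{injected:5.1f} μl  ratio {ratio:4.2f}  DNA bound {frac:.3f}")
```

Under these assumptions the 38.4 μl delivered fits within the 40 μl syringe and ends at a nominal molar ratio of about 5.4, well past saturation of a 1:1 site.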

RESULTS AND DISCUSSION

Overall structure of BmSTPR–DNA

To obtain a suitable DNA sequence for co-crystallization with BmSTPR, we first compared the binding affinities of three previously reported 28-bp DNA sequences (21) and found that the fragment derived from the +290 site of the fibroin gene displayed the highest affinity towards BmSTPR (Supplementary Table S2). Further screening for an optimal DNA length led us to two DNA fragments, of 13 bp (5′-tttacatagattc-3′) and 20 bp (5′-agtatttacatagattcatc-3′), which were used in co-crystallization trials. Eventually, we solved the crystal structures of BmSTPR complexed with the 13-bp DNA at 1.95 Å and with the 20-bp DNA at 2.40 Å.

In the 13-bp DNA complexed structure, only the residues of repeats R2 to R4 (Arg127–Ser190) could be clearly traced in the electron density map (Figure 1C). The three tandem α-helices run along the DNA major groove, in the reverse direction of the fibroin gene coding strand (Figure 1C). Each STPR starts with a two-residue linker followed by a 21-residue helix (residues 3–23), with an inter-helix angle of ∼60° (Figure 1C). Similar to the previously reported solution structures of the four individual repeats (25), we also observed salt bridges between the side chains of the two highly conserved residues Glu1 and Arg9 in repeats R3 and R4, in addition to two hydrogen bonds between the backbone nitrogen of Gln5 and the two oxygen atoms of Thr2 (Figure 1C). These interactions have been proposed to stabilize the α-helical conformation of the N-terminal moiety of each repeat in the absence of DNA (25). Indeed, substitution of Glu1 with Gln in any repeat has been reported to decrease the DNA-binding affinity of BmSTPR (25).
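The repeat-relative numbering used here (Glu1, Thr2, Arg9 and so on) can be mapped onto FMBP-1 residue numbers. The sketch assumes, as a hypothesis consistent with the numbers in the text, that R1 starts at Glu99 and that R1–R4 are contiguous 23-residue units. Under that assumption the traced segment Arg127–Ser190 runs from Arg6 of R2 to residue 23 of R4, and residues 1–2 of each repeat form the linker while residues 3–23 form the helix. The helper names are hypothetical.

```python
DOMAIN_START = 99   # Glu99, assumed to be Glu1 of R1
REPEAT_LEN = 23
N_REPEATS = 4


def to_absolute(repeat, position):
    """(repeat 1-4, position 1-23) -> FMBP-1 residue number."""
    if not (1 <= repeat <= N_REPEATS and 1 <= position <= REPEAT_LEN):
        raise ValueError("repeat or position out of range")
    return DOMAIN_START + (repeat - 1) * REPEAT_LEN + (position - 1)


def to_relative(residue):
    """FMBP-1 residue number -> (repeat, position, segment)."""
    offset = residue - DOMAIN_START
    if not 0 <= offset < N_REPEATS * REPEAT_LEN:
        raise ValueError("residue outside R1-R4")
    repeat, pos0 = divmod(offset, REPEAT_LEN)
    position = pos0 + 1
    segment = "linker" if position <= 2 else "helix"   # 2-residue linker, 21-residue helix
    return repeat + 1, position, segment


print(to_relative(127))                          # (2, 6, 'helix'): first traced residue
print(to_relative(190))                          # (4, 23, 'helix'): last traced residue
print([to_absolute(r, 9) for r in range(1, 5)])  # [107, 130, 153, 176]: Arg9 of R1-R4
```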

Even in the presence of the longer 20-bp DNA, repeat R1 is only partially folded, into a short α-helix of six residues (Supplementary Figure S1), indicating that R1 does not interact specifically with the DNA sequence at the +290 site of the fibroin gene. As shown in Supplementary Figure S2, a truncated protein lacking R1 binds the 20-bp DNA with a Kd of 546.6 nM, within the same order of magnitude as the full-length BmSTPR (Kd of 118.8 nM; about 4.6-fold difference). This suggests that the last three repeats are sufficient for BmSTPR to specifically recognize the regulatory element at the +290 site of the fibroin gene. Notably, R2 and the partially folded R1 also form an inter-helix angle of about 60°, implying that the regular angle between two adjacent helices of BmSTPR results from induced fit upon binding along the continuous DNA major groove.

The tandem binding pattern between BmSTPR and DNA

Because of its higher resolution, the structure of BmSTPR in complex with the 13-bp DNA was used for further structural analyses. Repeats R2 to R4 run along the major groove of the 12 bp from t1:a1′ to c12:g12′, with each repeat covering 4 bp (Figure 2A). The three base pairs c4:g4′, g8:c8′ and c12:g12′, which face the sharp turns of adjacent helices, do not interact with the protein. The highly conserved residues Arg6, Arg9, Leu10, Tyr16 and Arg20 form salt bridges or hydrogen bonds with phosphate groups, whereas the side chains of the conserved residues Met13, Ser14, Ala17 and Leu21 in R2 and R3 contact DNA via hydrophobic interactions (Figures 1B and 2). Consistent with this register of 4 bp per α-helix, binding assays showed that the 12-bp DNA from t1 to c12 (Figure 2A) has a significantly higher affinity than the 12-bp DNA from t0 to t11 (Supplementary Table S2).
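The 4-bp register described above can be written out explicitly for the 13-bp DNA. In this sketch, nucleotides of 5′-tttacatagattc-3′ are numbered from t0, as in the text and Figure 2A, and the complementary (primed) bases are derived by base pairing at the same index. The code simply partitions t1–c12 into three consecutive windows assigned to R2, R3 and R4 and reports the last base pair of each window; the function and variable names are illustrative.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")
DNA_13 = "tttacatagattc"   # coding strand, nucleotides numbered 0-12


def windows(seq, start=1, size=4, repeats=("R2", "R3", "R4")):
    """Assign consecutive `size`-bp windows, beginning at `start`, to successive repeats."""
    out = {}
    for k, name in enumerate(repeats):
        lo = start + k * size
        coding = seq[lo:lo + size]
        paired = coding.translate(COMPLEMENT)   # primed bases, aligned position by position
        out[name] = (lo, coding, paired)
    return out


for name, (lo, coding, paired) in windows(DNA_13).items():
    last = lo + 3
    print(f"{name}: {coding} (bp {lo}-{last}), last bp "
          f"{DNA_13[last]}{last}:{paired[-1]}{last}'")
```

Joining the three windows gives 5′-ttacatagattc-3′, the physiological 12-bp DNA that serves as the reference in the affinity screen, and the last base pairs come out as c4:g4′, g8:c8′ and c12:g12′.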

Figure 2.

The tandem interactions between BmSTPR and the 13-bp DNA. (A) A diagram of BmSTPR interacting with DNA. Residues from R2 to R4 are coloured according to the repeat in which they are located. Water molecules are denoted as open circles labelled ‘W’. The contacted bases are shown in light orange and green. (B) Cartoon representation of the contacts between R2 to R4 and the corresponding nucleotides. The involved nucleotides and residues are labelled and shown as sticks. Water molecules are shown as red spheres.

In detail, the side chains of Arg6 and Arg9 in repeats R2, R3 and R4 form salt bridges with the phosphate groups of a2′, a6′ and a10′, respectively (Figure 2). The main-chain oxygen atom of Leu10 in R3 (or R4) forms a water-mediated hydrogen bond with the phosphate group of t7′ (or a11′), whereas Leu10 in R2 does not interact with DNA (Figure 2B). Residues Met13, Ser14 and Ala17 of R2 and R3 form a hydrophobic pocket that accommodates the methyl group of t3′ and t7′, respectively. The corresponding hydrophobic contacts are absent between R4 and a11′, consistent with a11′ being an adenine, which lacks this methyl group. Leu21 in R2 (or R3) forms a hydrophobic interaction with the methyl group of t5′ (or t9′). The side-chain hydroxyl group of Tyr16 in each of R2 to R4 forms a hydrogen bond with the phosphate group of t1, a5 or a9, respectively, in addition to hydrophobic contacts with the methyl group of t2, t6 or t10 (Figure 2B). The side chains of Arg20 in R2 and R3 form salt bridges with the phosphate groups of t2 and t6, respectively (Figure 2B). In contrast, the side chain of Arg20 in R4 points towards the DNA major groove and forms a water-mediated hydrogen bond with the base of t10 (Figure 2B). To the best of our knowledge, this kind of tandem interaction pattern has not been observed in previously reported protein–DNA structures.
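The contacts listed above shift by exactly 4 bp from one repeat to the next. The sketch below encodes them as a small table (nucleotide indices as in Figure 2; strand membership is kept only in the labels) and checks, for every contact type, that subtracting 4 bp per repeat step maps all repeats back onto a single R2 index. The table is transcribed from this paragraph; the data structure and the check are our own illustration.

```python
REPEAT_ORDER = {"R2": 0, "R3": 1, "R4": 2}

# contact -> {repeat: nucleotide index}, transcribed from the text above
CONTACTS = {
    "Arg6/Arg9 - phosphate (a2', a6', a10')": {"R2": 2, "R3": 6, "R4": 10},
    "Leu10 O - water - phosphate (t7', a11')": {"R3": 7, "R4": 11},
    "Met13/Ser14/Ala17 pocket - methyl (t3', t7')": {"R2": 3, "R3": 7},
    "Tyr16 OH - phosphate (t1, a5, a9)": {"R2": 1, "R3": 5, "R4": 9},
    "Tyr16 - methyl (t2, t6, t10)": {"R2": 2, "R3": 6, "R4": 10},
    "Arg20 - phosphate or water-mediated base (t2, t6, t10)": {"R2": 2, "R3": 6, "R4": 10},
    "Leu21 - methyl (t5', t9')": {"R2": 5, "R3": 9},
}


def offsets(contacts, period=4):
    """Per contact, the set of R2-equivalent indices after removing `period` bp per repeat."""
    result = {}
    for name, hits in contacts.items():
        result[name] = {idx - period * REPEAT_ORDER[rep] for rep, idx in hits.items()}
    return result


for name, base in offsets(CONTACTS).items():
    status = "periodic" if len(base) == 1 else "NOT periodic"
    print(f"{status:12s} {name}")
```

A single element in each set means the contact repeats with a 4-bp period; a contact that broke the register would yield two or more distinct indices.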

The favoured 4-bp DNA repetitive units recognized by BmSTPR

The tandem interaction pattern strongly suggested that the highly conserved repeats of BmSTPR could bind to tandem repeats of a 4-bp DNA unit. Accordingly, we synthesized 12-bp DNAs covering all possible sequences composed of three identical 4-bp repetitive units, 135 in total after excluding 5′-(gggg)3-3′, which could not be synthesized, and compared their binding affinities towards BmSTPR. Only nine DNAs show a lower Kd, i.e. a higher affinity, than the physiologically identified 12-bp DNA (5′-ttacatagattc-3′) (Table 1 and Supplementary Table S3). These sequences have a high A/T content: six consist of a 100% A/T repetitive unit (5′-atat-3′, 5′-aata-3′, 5′-attt-3′, 5′-ataa-3′, 5′-tata-3′ or 5′-taaa-3′) and three of a 75% A/T unit (5′-atac-3′, 5′-tatc-3′ or 5′-atag-3′). Notably, 5′-(gcca)3-3′ has the lowest affinity (Kd of 44563.3 nM), about 1% of that of the 12-bp DNA at the +290 site.
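The figure of 135 sequences is consistent with counting 4-bp units up to reverse complementation. A repeated unit and its reverse complement give the same duplex (for example, (atac)3 and (gtat)3), so the 4^4 = 256 units collapse into (256 + 16)/2 = 136 classes, the 16 being self-complementary units such as atat; removing (gggg)3, which is the same duplex as (cccc)3, leaves 135. This is our interpretation, as the text does not state how the combinations were enumerated.

```python
from itertools import product

COMP = str.maketrans("acgt", "tgca")


def revcomp(seq):
    """Reverse complement of a lowercase DNA string."""
    return seq.translate(COMP)[::-1]


def canonical(unit):
    """Represent a unit and its reverse complement by the lexicographically smaller one."""
    return min(unit, revcomp(unit))


units = {"".join(p) for p in product("acgt", repeat=4)}
classes = {canonical(u) for u in units}
palindromes = [u for u in units if u == revcomp(u)]
testable = classes - {canonical("gggg")}   # (gggg)3 could not be synthesized

print(len(units), len(palindromes), len(classes), len(testable))
# 256 16 136 135
```

The nine high-affinity units in Table 1 fall into nine distinct classes under this scheme, so none of them is the reverse complement of another.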

Table 1. The nine representative DNAs of high binding affinity towards BmSTPR.

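DNA | Sequence (5′→3′) | Kd (nM), n = 3 | Kd (nM), n = 4
--- | --- | --- | ---
No.1 | (atac)n | 135.0 ± 7.8 | 107.5 ± 4.5
No.2 | (atat)n | 175.1 ± 7.3 | 97.4 ± 5.7
No.3 | (aata)n | 189.9 ± 12.6 | 57.6 ± 3.0
No.4 | (tatc)n | 216.6 ± 6.1 | 56.1 ± 3.4
No.5 | (attt)n | 232.8 ± 8.1 | 113.8 ± 4.7
No.6 | (ataa)n | 268.3 ± 25.0 | 155.7 ± 8.5
No.7 | (atag)n | 335.4 ± 17.9 | 199.7 ± 10.3
No.8 | (tata)n | 340.1 ± 19.6 | 133.9 ± 8.6
No.9 | (taaa)n | 412.8 ± 43.8 | 82.1 ± 8.0
+290 | ttacatagattc | 422.0 ± 38.8 | –

n is the number of tandem 4-bp repetitive units (n = 3, 12-bp DNA; n = 4, 16-bp DNA). The +290 row is the physiological 12-bp DNA from the fibroin gene.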

Based on statistical analysis of these binding affinity data with a first-order Markov chain model, we generated a context-dependent consensus using the seqLogo program (39). The consensus is AT-rich, with a correlation coefficient of 0.824 (P-value < 2.2e-16) (Figure 3). A context-independent consensus is also AT-rich, but has a lower correlation coefficient of 0.658 (P-value < 2.2e-16) (Supplementary Figure S3). The higher correlation coefficient of the context-dependent logo further indicates that indirect readout of the DNA sequence context makes the major contribution to binding by BmSTPR.

Structure of the intact BmSTPR in complex with 5′-(atac)4-3′

The 20-bp DNA complexed structure suggested that a suitable target DNA sequence might induce the folding of an intact repeat R1. Using the repetitive units of the nine highest-affinity DNAs (Table 1), we synthesized nine 16-bp DNAs, each composed of four tandem repeats. As expected, binding assays revealed an increased affinity towards BmSTPR for all of these 16-bp DNAs (Table 1). We then crystallized BmSTPR in complex with an 18-bp DNA containing four repetitive units of 5′-atac-3′ plus one protecting nucleotide at each terminus, and solved its structure at 2.2 Å. As in the 13-bp DNA complexed structure, repeats R2 to R4 of BmSTPR bound to the 18-bp DNA each adopt a two-residue linker followed by a 21-residue helix (Figure 4A). Moreover, repeat R1 is indeed folded into a similar helix that lies in the DNA major groove like the other three repeats (Figure 4A). As a result, the four tandem α-helices of BmSTPR wrap the 18-bp DNA along the major groove one after another, with inter-helix angles of 54–63° (Figure 4A). This further suggests that the regular angle between two adjacent helices of BmSTPR results from binding along the continuous DNA major groove.
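One simple way to measure an inter-helix angle such as the 54–63° reported here is to approximate each helix axis by the vector joining the centroids of its first and last four Cα atoms (roughly one turn each) and take the angle between the two vectors. The text does not say how the angles were measured, so this is an illustrative approach only; the test helices are idealized α-helices (radius 2.3 Å, rise 1.5 Å and 100° per residue, 21 residues as in each STPR helix), not coordinates from the deposited structures.

```python
import math


def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def helix_axis(ca, turn=4):
    """Approximate axis: centroid of the last `turn` Cα minus that of the first `turn` Cα."""
    a, b = centroid(ca[:turn]), centroid(ca[-turn:])
    return tuple(b[i] - a[i] for i in range(3))


def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    cos = max(-1.0, min(1.0, dot / (nu * nv)))   # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos))


def ideal_helix(n=21, radius=2.3, rise=1.5, step_deg=100.0):
    """Idealized α-helix Cα trace along +z."""
    return [(radius * math.cos(math.radians(step_deg * i)),
             radius * math.sin(math.radians(step_deg * i)),
             rise * i) for i in range(n)]


def rotate_about_x(points, deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [(x, c * y - s * z, s * y + c * z) for x, y, z in points]


h1 = ideal_helix()
h2 = rotate_about_x(ideal_helix(), 60.0)   # second helix tilted by 60°
print(round(angle_deg(helix_axis(h1), helix_axis(h2)), 1))
```

The angle depends on the chosen axis direction (N→C here); reversing one axis gives 180° minus the value, so the convention should be stated when comparing numbers.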

Figure 4.

The structure of BmSTPR in complex with the 18-bp DNA containing four repeats of 5′-atac-3′. (A) Cartoon representation of BmSTPR in complex with the 18-bp DNA. The DNA strands and repeats of BmSTPR follow the colour coding of Figure 1C, with R1 coloured red. (B) Cartoon representation of the contacts between R1 and the corresponding nucleotides in the 18-bp DNA complexed structure. The involved nucleotides and residues are labelled and shown as sticks. Water molecules are shown as red spheres labelled ‘W’. (C) A diagram of the interactions between BmSTPR and the 18-bp DNA.

Structure-based analysis demonstrated that each repeat, including R1, uses an almost identical pattern to bind 4 bp of DNA, via both direct and indirect contacts (Figure 4B and C). The direct interactions include hydrophobic interactions with the methyl groups of three thymine bases, t1′, t2 and t3′, of each unit (Figure 4C). For example, t1′ and t2 are stabilized by the side chains of Leu21 and Tyr16, respectively, whereas the methyl group of t3′ is accommodated in a hydrophobic pocket formed by Met13, Ser14 and Ala17 of each repeat (Figure 4C). To test the contribution of these direct interactions, we separately substituted the two central thymidylates (t2 and t3′ of each 4-bp repetitive unit) with uridylate, which lacks the methyl group. As shown in Supplementary Figure S4, substituting t2 or t3′ with uridylate in each 4-bp repetitive unit gave a Kd of 183.3 or 635.5 nM, respectively, a modest 1.7- or 5.9-fold decrease in binding affinity compared with the original 16-bp DNA (Kd of 107.5 nM). In contrast, a single substitution of the central A/T base with G/C, which alters the major groove width, decreased the binding affinity of BmSTPR sharply, by 30- to 60-fold, as seen from the affinity comparison of three DNA sequences (No.1, No.57 and No.116, Supplementary Table S3). These results indicate that BmSTPR recognizes DNA through a combination of direct and indirect interactions, with the main contribution coming from indirect readout.
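Fold changes in Kd can be put on an energy scale with ΔΔG = RT ln(Kd,variant/Kd,reference). The sketch below applies this standard conversion to the values above at 25°C (the ITC temperature); the interpretation of the numbers is ours. Under this conversion the t2→u and t3′→u substitutions cost roughly 0.3 and 1.1 kcal/mol, whereas a 30- to 60-fold loss corresponds to about 2.0–2.4 kcal/mol.

```python
import math

R_KCAL = 8.314462618 / 4184.0   # gas constant, kcal/(mol·K)
T_K = 298.15                    # 25 °C


def ddg(kd_ref_nm, kd_var_nm, temp=T_K):
    """ΔΔG of binding (kcal/mol); positive means the variant binds more weakly."""
    return R_KCAL * temp * math.log(kd_var_nm / kd_ref_nm)


kd_ref = 107.5                  # original 16-bp (atac)4 DNA
for label, kd in (("t2 -> u", 183.3), ("t3' -> u", 635.5)):
    print(f"{label:9s} {kd / kd_ref:4.1f}-fold  ΔΔG = {ddg(kd_ref, kd):.2f} kcal/mol")
for fold in (30, 60):           # central A/T -> G/C substitutions
    print(f"{fold}-fold loss   ΔΔG = {ddg(1.0, fold):.2f} kcal/mol")
```

Only the Kd ratio enters the formula, so the fold-change case can be evaluated with an arbitrary reference of 1.0.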

The DNA geometry in the three complex structures

DNA bound in the major groove by α-helical proteins has been reported to adopt deformed B-DNA conformations, for example Beg-DNA (where eg stands for enlarged groove) (40). Using the 3DNA server (http://w3dna.rutgers.edu/) (41), we analysed the geometry of our three DNA structures in terms of nine major parameters (Table 2). Upon binding to BmSTPR via the major groove, the three DNAs have very similar parameters. However, compared with canonical B-DNA (42), their differing values of x-displacement, roll angle, inclination and groove width indicate that all three adopt a deformed B-DNA conformation induced by BmSTPR binding (Table 2). Moreover, the three DNAs differ structurally from the previously defined Beg-DNA (40), which also binds helical proteins via an enlarged major groove. Compared with the two Beg-DNA representatives, glucocorticoid-DNA (PDB code: 1R4O) and Zif268-DNA (PDB code: 1ZAA), the BmSTPR-bound DNAs have a near-zero x-displacement (0.06 to −0.41 Å, versus −1.57 Å) and a negative inclination, indicating a distinct position of the base pairs relative to the helical axis, as well as an altered relative displacement, which describes the spatial relationship between the base pairs and the phosphate backbone (Table 2). In addition, our three DNA structures display a negative average roll angle, unlike either canonical B-DNA or Beg-DNA (Table 2). Altogether, the three BmSTPR-bound DNAs adopt a unique deformed B-DNA conformation that is distinct from the previously defined Beg-DNA. Notably, compared with the 11.7-Å major groove of canonical B-DNA, the BmSTPR-bound DNAs have a wider major groove, averaging 12.8, 13.4 and 13.2 Å for the 13-, 18- and 20-bp complexes, respectively (Table 2).

AT-rich sequences usually adopt a narrower minor groove (28,29) and consequently a wider major groove, as the widths of the minor and major grooves are usually coupled (43). Moreover, comparison of the key parameters of the BmSTPR-bound DNA structures with those of free AT-rich DNA structures (Table 2 and Supplementary Table S4) revealed a significant induced fit upon binding to BmSTPR. Together, we propose that the high flexibility and intrinsically wider major groove of AT-rich DNAs contribute to their specific recognition by BmSTPR.

Table 2. DNA parameters.

DNA segment | Pitch (Å) | Rp (Å) | Rise (Å) | Twist (°) | x-Disp (Å) | Roll (°) | Incl (°) | Minor groove (Å) | Major groove (Å) | D (Å)
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
B-DNA | 34.0 | 9.4 | 3.3–3.4 | 36.0 | 0.10 | 0.6 | 2.4 | 5.7 | 11.7 | 3.43
BmSTPR-13 bp | 32.3 | 9.4±0.9 | 3.23±0.12 | 36.0±3.9 | 0.06±1.1 | −2.1±3.7 | −3.1±6.1 | 5.0±1.2 | 12.8±1.1 | 3.58
BmSTPR-18 bp | 33.7 | 9.6±1.0 | 3.32±0.20 | 35.5±4.9 | −0.34±1.2 | −0.4±3.8 | −0.6±6.1 | 5.0±0.8 | 13.4±1.1 | 3.58
BmSTPR-20 bp | 33.8 | 9.7±0.8 | 3.32±0.12 | 35.3±4.6 | −0.41±0.9 | −1.1±3.4 | −1.8±5.5 | 4.2±1.1 | 13.2±0.7 | 3.84
glucocorticoid-DNA | 36.1 | 10.1±1.4 | 3.32±0.28 | 33.1±8.8 | −1.57±1.2 | 4.5±3.1 | 8.1±5.7 | 7.7±0.2 | 12.9±1.1 | 2.05
Zif268-DNA | 36.8 | 10.0±0.9 | 3.29±0.25 | 32.2±5.4 | −1.57±1.0 | 4.5±2.7 | 8.0±4.9 | 7.6±0.2 | 11.7±1.5 | 1.66

The DNA sequences are: BmSTPR-13 bp, 5′-tttacatagattc-3′; BmSTPR-18 bp, 5′-catacatacatacataca-3′; BmSTPR-20 bp, 5′-agtatttacatagattcatc-3′; glucocorticoid-DNA, 5′-gatgttctg-3′; Zif268-DNA, 5′-gcgtgggcgt-3′. The parameters include the pitch, the radius of the best-fit cylinder through all the phosphates (Rp), the rise, the twist, the displacement (x-Disp), the roll, the inclination (Incl), the groove width (minor and major) and the relative displacement (D). D is defined as in the previous report (40).
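The pitch column of Table 2 agrees with the rise and twist columns through pitch ≈ rise × 360°/twist, i.e. the rise per base pair multiplied by the number of base pairs per turn. The sketch below recomputes it from the tabulated averages, using 3.4 Å for the 3.3–3.4 Å range given for B-DNA; agreement within 0.1 Å is a consistency check and not necessarily how 3DNA computes the values.

```python
# (rise Å, twist °, tabulated pitch Å) from Table 2
TABLE2 = {
    "B-DNA": (3.4, 36.0, 34.0),
    "BmSTPR-13 bp": (3.23, 36.0, 32.3),
    "BmSTPR-18 bp": (3.32, 35.5, 33.7),
    "BmSTPR-20 bp": (3.32, 35.3, 33.8),
    "glucocorticoid-DNA": (3.32, 33.1, 36.1),
    "Zif268-DNA": (3.29, 32.2, 36.8),
}


def pitch(rise, twist):
    """Helical pitch = rise per bp × bp per turn (360° / twist)."""
    return rise * 360.0 / twist


for name, (rise, twist, tabulated) in TABLE2.items():
    bp_per_turn = 360.0 / twist
    print(f"{name:20s} {bp_per_turn:5.2f} bp/turn  "
          f"pitch {pitch(rise, twist):5.2f} Å (table {tabulated} Å)")
```

The lower twist of the two Beg-DNAs (more base pairs per turn) accounts for their longer pitch, whereas the BmSTPR-bound DNAs stay close to canonical B-DNA in this respect.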

STPR-containing proteins are widespread in animals

A sequence homology search against the NCBI database (http://blast.ncbi.nlm.nih.gov) (44,45) identified 178 STPR-containing proteins with a sequence identity higher than 37% and an E-value <80. Like BmSTPR, most STPR domains consist of four repeats, with a few exceptions that possess three repeats or five to seven repeats. Interestingly, all STPR-containing proteins are from animals, except for one from the moss Physcomitrella patens, which possesses five repeats. We aligned the STPR domains of proteins from the model organisms human, Caenorhabditis elegans, Danio rerio and Drosophila melanogaster, in addition to P. patens. Each repeat is strictly composed of 23 residues and is rich in basic residues (Figure 5), consistent with a DNA-binding capacity. Moreover, each repeat harbours three highly conserved residues, Glu1, Arg9 and Thr/Ser2 (Figure 5), which are proposed to stabilize the α-helical conformation of the N-terminal moiety of each repeat. Thus, we propose that STPR-containing proteins from other organisms might also wrap their favoured DNA along the major groove in a similar pattern. However, these proteins are usually fused with various domains at the N- and/or C-terminus, suggesting diverse physiological functions.
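The features listed above (23-residue repeats with Glu1, Thr/Ser2 and Arg9 conserved) suggest a crude pattern for spotting tandem STPR-like repeats in a protein sequence. The regular expression below is a hypothetical heuristic of our own, not the BLAST-based search used here; it will miss any repeat that deviates at one of the three positions, and the test sequence is synthetic rather than a real STPR protein.

```python
import re

# Heuristic: position 1 = E, position 2 = T or S, position 9 = R, total length 23
STPR_LIKE = re.compile(r"E[TS].{6}R.{14}")


def tandem_runs(seq, length=23):
    """Find runs of back-to-back STPR-like 23-mers; returns lists of 0-based starts."""
    runs = []
    start = 0
    while start + length <= len(seq):
        run = []
        pos = start
        while pos + length <= len(seq) and STPR_LIKE.fullmatch(seq, pos, pos + length):
            run.append(pos)
            pos += length
        if run:
            runs.append(run)
            start = pos          # continue scanning after this run
        else:
            start += 1
    return runs


toy_repeat = "ET" + "K" * 6 + "R" + "A" * 14   # hypothetical 23-mer, not a real STPR
toy = "MGS" + toy_repeat * 4 + "GGG"
print(tandem_runs(toy))  # [[3, 26, 49, 72]]
```

Requiring back-to-back matches reflects the strictly tandem 23-residue spacing; a real search would rely on profile or BLAST-based methods, as in the text.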

Figure 5.

Multiple-sequence alignment of BmSTPR against its homologs with the programs Cobalt (46) and Espript (47). The secondary structural elements of BmSTPR are displayed at the top. The three conserved residues Glu1, Arg9 and Thr/Ser2 in each repeat are labelled with red stars. The STPR domains are from the following sequences (NCBI accession numbers in parentheses): B. mori FMBP-1 (NP_001036969.1), H. sapiens Zinc finger protein 821 isoform 2 (NP_060000.1), D. rerio predicted Zinc finger protein 821-like isoform X1 (XP_005169107.1), Drosophila-1 CG14440 isoform A (NP_572343.1), Drosophila-2 CG14442 isoform A (NP_572342.1), C. elegans protein C05D11.13 (NP_498414.1) and P. patens predicted protein (XP_001767050.1). All aligned STPR segments cover repeats R1–R4, except that of Drosophila-2, which covers repeats R3–R6.

Acknowledgments

We appreciate the help of the staff at the Shanghai Synchrotron Radiation Facility.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

The 973 project from the Ministry of Science and Technology of China [2012CB114601, 2012CB911002]; Chinese National Natural Science Foundation [31572318, 31272362]; Fundamental Research Funds for the Central Universities [WK2070000054]. Funding for open access charge: The 973 project from the Ministry of Science and Technology of China [2012CB114601, 2012CB911002]; Chinese National Natural Science Foundation [31572318, 31272362]; Fundamental Research Funds for the Central Universities [WK2070000054].

Conflict of interest statement. None declared.

REFERENCES

1. Anderson J.E., Ptashne M., Harrison S.C. Structure of the repressor-operator complex of bacteriophage 434. Nature. 1987;326:846–852. doi: 10.1038/326846a0.
2. Matthews B.W. Protein-DNA interaction. No code for recognition. Nature. 1988;335:294–295. doi: 10.1038/335294a0.
3. Pabo C.O., Aggarwal A.K., Jordan S.R., Beamer L.J., Obeysekare U.R., Harrison S.C. Conserved residues make similar contacts in two repressor-operator complexes. Science. 1990;247:1210–1213. doi: 10.1126/science.2315694.
4. Pavletich N.P., Pabo C.O. Crystal structure of a five-finger GLI-DNA complex: new perspectives on zinc fingers. Science. 1993;261:1701–1707. doi: 10.1126/science.8378770.
5. Rohs R., Jin X., West S.M., Joshi R., Honig B., Mann R.S. Origins of specificity in protein-DNA recognition. Annu. Rev. Biochem. 2010;79:233–269. doi: 10.1146/annurev-biochem-060408-091030.
6. Otwinowski Z., Schevitz R.W., Zhang R.G., Lawson C.L., Joachimiak A., Marmorstein R.Q., Luisi B.F., Sigler P.B. Crystal structure of trp repressor/operator complex at atomic resolution. Nature. 1988;335:321–329. doi: 10.1038/335321a0.
7. Seeman N.C., Rosenberg J.M., Rich A. Sequence-specific recognition of double helical nucleic acids by proteins. Proc. Natl. Acad. Sci. U.S.A. 1976;73:804–808. doi: 10.1073/pnas.73.3.804.
8. Harrison S.C., Aggarwal A.K. DNA recognition by proteins with the helix-turn-helix motif. Annu. Rev. Biochem. 1990;59:933–969. doi: 10.1146/annurev.bi.59.070190.004441.
9. Bewley C.A., Gronenborn A.M., Clore G.M. Minor groove-binding architectural proteins: structure, function, and DNA recognition. Annu. Rev. Biophys. Biomol. Struct. 1998;27:105–131. doi: 10.1146/annurev.biophys.27.1.105.
10. Nelson H.C., Finch J.T., Luisi B.F., Klug A. The structure of an oligo(dA)·oligo(dT) tract and its biological implications. Nature. 1987;330:221–226. doi: 10.1038/330221a0.
11. Hizver J., Rozenberg H., Frolow F., Rabinovich D., Shakked Z. DNA bending by an adenine-thymine tract and its role in gene regulation. Proc. Natl. Acad. Sci. U.S.A. 2001;98:8490–8495. doi: 10.1073/pnas.151247298.
12. Haran T.E., Mohanty U. The unique structure of A-tracts and intrinsic DNA bending. Q. Rev. Biophys. 2009;42:41–81. doi: 10.1017/S0033583509004752.
13. Kirsanov D.D., Zanegina O.N., Aksianov E.A., Spirin S.A., Karyagina A.S., Alexeevski A.V. NPIDB: nucleic acid-protein interaction database. Nucleic Acids Res. 2013;41:D517–D523. doi: 10.1093/nar/gks1199.
14. Zanegina O., Kirsanov D., Baulin E., Karyagina A., Alexeevski A., Spirin S. An updated version of NPIDB includes new classifications of DNA-protein complexes and their families. Nucleic Acids Res. 2016;44:D144–D153. doi: 10.1093/nar/gkv1339.
15. Murzin A.G., Brenner S.E., Hubbard T., Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 1995;247:536–540. doi: 10.1006/jmbi.1995.0159.
16. Inoue S., Tanaka K., Arisaka F., Kimura S., Ohtomo K., Mizuno S. Silk fibroin of Bombyx mori is secreted, assembling a high molecular mass elementary unit consisting of H-chain, L-chain, and P25, with a 6:6:1 molar ratio. J. Biol. Chem. 2000;275:40517–40528. doi: 10.1074/jbc.M006897200.
17. Suzuki Y., Giza P.E. Accentuated expression of silk fibroin genes in vivo and in vitro. J. Mol. Biol. 1976;107:183–206. doi: 10.1016/s0022-2836(76)80001-0.
18. Hui C.C., Matsuno K., Suzuki Y. Fibroin gene promoter contains a cluster of homeodomain binding sites that interact with three silk gland factors. J. Mol. Biol. 1990;213:651–670. doi: 10.1016/S0022-2836(05)80253-0.
19. Suzuki T., Matsuno K., Takiya S., Ohno K., Ueno K., Suzuki Y. Purification and characterization of an enhancer-binding protein of the fibroin gene. I. Complete purification of fibroin factor 1. J. Biol. Chem. 1991;266:16935–16941.
20. Suzuki T., Takiya S., Matsuno K., Ohno K., Ueno K., Suzuki Y. Purification and characterization of an enhancer-binding protein of the fibroin gene. II. Functional analyses of fibroin factor 1. J. Biol. Chem. 1991;266:16942–16947.
21. Takiya S., Kokubo H., Suzuki Y. Transcriptional regulatory elements in the upstream and intron of the fibroin gene bind three specific factors POU-M1, Bm Fkh and FMBP-1. Biochem. J. 1997;321:645–653. doi: 10.1042/bj3210645.
22. Suzuki Y., Tsuda M., Hirose S., Takiya S. Transcription signals and factors of the silk genes. Adv. Biophys. 1986;21:205–215. doi: 10.1016/0065-227x(86)90024-9.
23. Maekawa H., Suzuki Y. Repeated turn-off and turn-on of fibroin gene transcription during silk gland development of Bombyx mori. Dev. Biol. 1980;78:394–406. doi: 10.1016/0012-1606(80)90343-7.
24. Takiya S., Ishikawa T., Ohtsuka K., Nishita Y., Suzuki Y. Fibroin-modulator-binding protein-1 (FMBP-1) contains a novel DNA-binding domain, repeats of the score and three amino acid peptide (STP), conserved from Caenorhabditis elegans to humans. Nucleic Acids Res. 2005;33:786–795. doi: 10.1093/nar/gki228.
25. Saito S., Aizawa T., Kawaguchi K., Yamaki T., Matsumoto D., Kamiya M., Kumaki Y., Mizuguchi M., Takiya S., Demura M., et al. Structural approach to a novel tandem repeat DNA-binding domain, STPR, by CD and NMR. Biochemistry. 2007;46:1703–1713. doi: 10.1021/bi061780c.
26. Saito S., Yokoyama T., Aizawa T., Kawaguchi K., Yamaki T., Matsumoto D., Kamijima T., Kamiya M., Kumaki Y., Mizuguchi M., et al. Structural properties of the DNA-bound form of a novel tandem repeat DNA-binding domain, STPR. Proteins. 2008;72:414–426. doi: 10.1002/prot.21939.
27. Takiya S., Saito S., Yokoyama T., Matsumoto D., Aizawa T., Kamiya M., Demura M., Kawano K. DNA-binding property of the novel DNA-binding domain STPR in FMBP-1 of the silkworm Bombyx mori. J. Biochem. 2009;146:103–111. doi: 10.1093/jb/mvp053.
28. Aymami J., Nunn C.M., Neidle S. DNA minor groove recognition of a non-self-complementary AT-rich sequence by a tris-benzimidazole ligand. Nucleic Acids Res. 1999;27:2691–2698. doi: 10.1093/nar/27.13.2691.
29. Gordon B.R., Li Y., Cote A., Weirauch M.T., Ding P., Hughes T.R., Navarre W.W., Xia B., Liu J. Structural basis for recognition of AT-rich DNA by unrelated xenogeneic silencing proteins. Proc. Natl. Acad. Sci. U.S.A. 2011;108:10690–10695. doi: 10.1073/pnas.1102544108.
30. Otwinowski Z., Minor W. Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol. 1997;276:307–326. doi: 10.1016/S0076-6879(97)76066-X.
31. Brodersen D.E., de La Fortelle E., Vonrhein C., Bricogne G., Nyborg J., Kjeldgaard M. Applications of single-wavelength anomalous dispersion at high and atomic resolution. Acta Crystallogr. D Biol. Crystallogr. 2000;56:431–441. doi: 10.1107/s0907444900000834.
32. Adams P.D., Grosse-Kunstleve R.W., Hung L.W., Ioerger T.R., McCoy A.J., Moriarty N.W., Read R.J., Sacchettini J.C., Sauter N.K., Terwilliger T.C. PHENIX: building new software for automated crystallographic structure determination. Acta Crystallogr. D Biol. Crystallogr. 2002;58:1948–1954. doi: 10.1107/s0907444902016657.
33. Emsley P., Cowtan K. Coot: model-building tools for molecular graphics. Acta Crystallogr. D Biol. Crystallogr. 2004;60:2126–2132. doi: 10.1107/S0907444904019158.
34. Murshudov G.N., Vagin A.A., Dodson E.J. Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr. D Biol. Crystallogr. 1997;53:240–255. doi: 10.1107/S0907444996012255.
35. Painter J., Merritt E.A. Optimal description of a protein structure in terms of multiple groups undergoing TLS motion. Acta Crystallogr. D Biol. Crystallogr. 2006;62:439–450. doi: 10.1107/S0907444906005270.
36. Davis I.W., Leaver-Fay A., Chen V.B., Block J.N., Kapral G.J., Wang X., Murray L.W., Arendall W.B. III, Snoeyink J., Richardson J.S. MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Res. 2007;35:W375–W383. doi: 10.1093/nar/gkm216.
37. Laskowski R.A., MacArthur M.W., Moss D.S., Thornton J.M. PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Crystallogr. 1993;26:283–291.
38. DeLano W.L. The PyMOL Molecular Graphics System. San Carlos, CA: DeLano Scientific LLC; 2002.
39. Schneider T.D., Stephens R.M. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 1990;18:6097–6100. doi: 10.1093/nar/18.20.6097.
40. Nekludova L., Pabo C.O. Distinctive DNA conformation with enlarged major groove is found in Zn-finger-DNA and other protein-DNA complexes. Proc. Natl. Acad. Sci. U.S.A. 1994;91:6948–6952. doi: 10.1073/pnas.91.15.6948.
41. Zheng G.H., Lu X.J., Olson W.K. Web 3DNA-a web server for the analysis, reconstruction, and visualization of three-dimensional nucleic-acid structures. Nucleic Acids Res. 2009;37:W240–W246. doi: 10.1093/nar/gkp358.
42. Grasby J.A., Neidle S., Blackburn G.M., Gait M.J., Loakes D., Williams D.M., Egli M., Flavell M., Flavell A., Pyle A.M. DNA and RNA structure. In: Blackburn G.M., Gait M.J., Loakes D., Williams D.M., editors. Nucleic Acids in Chemistry and Biology. 3rd edn. Vol. 3. Cambridge: RSC Publishing; 2006. pp. 29–30.
43. Boutonnet N., Hui X.W., Zakrzewska K. Looking into the grooves of DNA. Biopolymers. 1993;33:479–490. doi: 10.1002/bip.360330314.
44. Altschul S.F., Madden T.L., Schäffer A.A., Zhang J.H., Zhang Z., Miller W., Lipman D.J. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
45. Johnson M., Zaretskaya I., Raytselis Y., Merezhuk Y., McGinnis S., Madden T.L. NCBI BLAST: a better web interface. Nucleic Acids Res. 2008;36:W5–W9. doi: 10.1093/nar/gkn201.
46. Papadopoulos J.S., Agarwala R. COBALT: constraint-based alignment tool for multiple protein sequences. Bioinformatics. 2007;23:1073–1079. doi: 10.1093/bioinformatics/btm076.
47. Robert X., Gouet P. Deciphering key features in protein structures with the new ENDscript server. Nucleic Acids Res. 2014;42:W320–W324. doi: 10.1093/nar/gku316.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
