Abstract
STAT proteins bind DNA as dimers and tetramers to control cellular development, differentiation, survival, and expansion. The tetramer binding sites are comprised of two dimer-binding sites repeated in tandem. The genome-wide distribution of the spacings between the dimer binding sites shows a distinctive, non-random pattern. Here, we report on estimating the feasibility of building possible molecular models of STAT5A tetramers bound to a DNA double helix with all possible spacings between the dimer binding sites. We found that the calculated feasibility estimates correlated well with the experimentally measured frequency of tetramer-binding sites. This suggests that the feasibility of forming the tetramer complex was a major factor in the evolution of this DNA sequence variation.
Introduction
STAT (Signal Transducer and Activator of Transcription) proteins are activated by interferons, cytokines, and growth factors as a critical cytoplasmic to nuclear signaling mechanism [1]. Classically, STAT proteins were shown to be activated by tyrosine phosphorylation, allowing their dimerization and nuclear translocation [2]. Subsequently, N-terminal domain (NTD) mediated dimerization of dimers to form tetramers was also demonstrated [3]. We previously showed that STAT5 tetramerization is critical for normal cytokine responses and immune function [4]. Moreover, STAT5 tetramers, but not dimers, were reported to be associated with leukemia development in mice [5]. Tetramerization of STAT1 is also vital for normal immune function [6]. Many specific genes have been shown to be activated by STAT protein tetramers, including those encoding IL-2 receptor α [7], interferon-γ [8], α2-macroglobulin [9, 10], and serine protease inhibitor 2.1 [11]. Together, these studies underscore the critical role of STAT tetramerization in vivo.
A STAT5A dimer binds to DNA by recognizing a γ-interferon-activated sequence (GAS) motif, with the consensus sequence TTCNNNGAA [12], through the two DNA binding domains (DBDs) of the dimer. A tetramer can form when two dimers bind to tandemly linked GAS and GAS-like motifs and their NTDs interact by dimerization [8, 13]. The spacing between the two GAS motifs varies and the genome-wide frequency of tetramer binding sites with different spacings is not random, with the gap length between the GAS motifs of ~11–12 base pairs (bps) the most frequent and secondary peaks at gap lengths of ~6–7 and 16–17 bps [4]. Since a GAS motif is 9 bps long, the center-to-center distance (CTCD) is 9 bps greater than the gap length. Accordingly, these peaks occur at multiples of ~5 bps in terms of CTCDs, approximately corresponding to the number of base pairs in a half helical turn of DNA and suggesting that the observed frequency pattern may relate to the likelihood of forming tetramers at different dimer separations. To test this hypothesis, here we report building STAT5A tetramer models on DNA, computing relative feasibility measures for forming them with all possible spacings between the two GAS motifs, and comparing the computed feasibility measures with experimentally measured frequencies of dimer spacings. We find that there is a high correlation between the two quantities.
Results
STAT proteins contain six domains [14–16] and connecting peptides (Fig 1). The “core” of the molecule is made of the four middle domains, lacking the N-terminal (NTD) and the transactivation (TAD) domains and all but three residues of the phospho-tyrosine containing segment (PTS). There is a 13-residue flexible linker between the NTD and the core. Tetramer models of the DNA-bound phosphorylated core at a particular DNA spacing were built before for STAT5A [12] and STAT1 [15]. We built tetramer models of STAT5A with all spacings between the dimer binding sites and included considerations of the N-terminal dimer formation. A flow chart of our model building process is given in Fig 2. We first built a DNA-bound, phosphorylated core dimer model using the crystal structures of STAT5A in its un-phosphorylated form (1y1u) [16] and of STAT3β in its phosphorylated dimer form interacting with DNA (1bg1) [14]. The core dimer was then duplicated and the duplicated dimer rotated and slid, one base pair at a time, along an ideal B-DNA with 10.5 bps per turn (see S1 Movie). The N-terminal domain dimer (NDD) that forms between two core dimers was built separately by homology modeling using the STAT4 NDD structure (1bgf) as a template. For each core dimer pair with particular DNA spacing, the NDD was then placed at many different locations and orientations and the feasibility was assessed as to whether the NDD could be connected to the two core dimers by means of the 13-residue linker on each side of the NDD.
We labeled the two monomers of one core dimer as A and D and the corresponding monomers in the duplicated dimer as A’ and D’. As the A’-D’ core dimer is slid along the DNA away from the A-D core dimer, it also rotates around the DNA axis and the distance between a pair of N-termini of moving and stationary dimers oscillates while it generally increases (Fig 3). Two core dimers that are 9 or less CTCDs apart had clashes between them. Also, two dimers could not be connected by an N-terminal domain dimer (NDD) when the CTCD was greater than 39 (Fig 3 and Methods). Therefore, we built models with CTCD values from 10 to 39.
There are two topologically distinct tetramers depending on which two monomers are connected by an NDD. All core tetramer models have a 2-fold symmetry axis, which runs perpendicular to the DNA axis and relates monomers A to D’ and D to A’ (Fig 4). An NDD can connect the symmetry-related monomers, which come near each other on the same side of DNA when the two core dimers are rotated approximately 180° from one another (‘staggered’ tetramer), at CTCD values of ~15 or 25 (green and blue lines in Fig 3; Fig 4A and 4B; and S2 Movie). Since an NDD is itself 2-fold symmetric, we placed these NDDs on the symmetry axis, one on each side of DNA (on-axis NDDs). Another type of tetramer results when an NDD connects monomers A and A’, in which case the symmetry-related NDD can connect D and D’. These monomers come near each other when the two core dimers are related essentially by translation along the DNA axis with no or relatively small rotation (‘eclipsed’ tetramer) at CTCD values of ~10, 20 or 30 (red line in Fig 3; Fig 4C and 4D; and S3 Movie). The NDDs in this case are away from the symmetry axis (off-axis NDDs).
For both types of tetramers, the separately built NDD was placed at many discrete points and orientations, and the probability of connecting each to a pair of core monomers was assessed (see Methods). The relative feasibility, F2, of forming a tetramer using one or two NDDs was calculated as the sum of these probabilities. We note that when only one NDD connects the two dimers, there are two other NTDs per tetramer, which are free and can be used to form a higher order oligomer; however, we do not consider higher oligomers in this paper. The relative feasibility, F4, of forming a tetramer using two NDDs, which may be required for the stable cooperative tetramer formation on DNA, is the product of the sums of the probabilities with NDDs on each side of DNA.
The computed feasibility measures are given in Table 1 and shown in Fig 5A (for F4) and Fig 5B (F2). Since NDDs at on-axis positions are constrained to lie on the symmetry axis whereas off-axis positions have no such constraint, there are many more possible positions for off-axis NDDs, which increases the feasibility values. Therefore, we scaled the on- and off-axis feasibility measures separately so that the sum best matched the experimental frequencies (see Methods). Both Fig 5A and 5B show five peaks, at CTCDs of 10–11, 15, 20–21, 25–26 and 31–32, similar to those seen in the plot of the experimentally measured frequencies (thin blue lines in Fig 5A and 5B). These peaks represent both ‘eclipsed’ (red peaks at CTCDs of around 10, 20, and 31) and ‘staggered’ (blue peaks at CTCDs of around 15 and 26) tetramers. The troughs are at or near the valleys in the experimental frequency distribution (e.g., at or near CTCD values of 13, 18, 23, and 28). The inter-dimer rotation angles in these tetramers are approximately ±90°. Note also that the computed feasibility measure is zero or close to zero for CTCD values larger than 32. This is close to the CTCD value of 31, beyond which the experimentally observed frequency of tetramer binding sites is also at the background level.
Table 1. Computed scaled feasibility measures using 10.5 bps/turn DNA.
CTCD | F2 | F4 | ||
---|---|---|---|---|
Off-axis | On-axis | Off-axis | On-axis | |
307.43* | 690.89* | 4269.3* | 89064.8* | |
10 | 34.86 | 0.01 | 13.72 | 0.00 |
11 | 38.95 | 0.00 | 17.13 | 0.00 |
12 | 24.00 | 0.06 | 6.51 | 0.00 |
13 | 0.45 | 4.18 | 0.00 | 0.30 |
14 | 0.03 | 23.90 | 0.00 | 24.58 |
15 | 0.00 | 35.13 | 0.00 | 57.38 |
16 | 0.00 | 26.88 | 0.00 | 33.67 |
17 | 0.01 | 6.95 | 0.00 | 0.01 |
18 | 2.95 | 0.62 | 0.10 | 0.00 |
19 | 35.32 | 0.25 | 14.09 | 0.00 |
20 | 98.38 | 0.09 | 109.30 | 0.00 |
21 | 98.76 | 0.01 | 110.14 | 0.00 |
22 | 6.75 | 0.52 | 0.51 | 0.00 |
23 | 5.33 | 2.53 | 0.32 | 0.03 |
24 | 0.06 | 33.20 | 0.00 | 4.75 |
25 | 0.00 | 48.52 | 0.00 | 31.52 |
26 | 0.00 | 47.83 | 0.00 | 42.72 |
27 | 0.00 | 36.12 | 0.00 | 17.15 |
28 | 0.14 | 27.24 | 0.00 | 1.20 |
29 | 1.17 | 1.04 | 0.02 | 0.00 |
30 | 13.32 | 0.06 | 2.00 | 0.00 |
31 | 21.75 | 0.00 | 5.34 | 0.00 |
32 | 19.90 | 0.00 | 4.47 | 0.00 |
33 | 1.47 | 0.00 | 0.02 | 0.00 |
34 | 0.01 | 0.15 | 0.00 | 0.00 |
35 | 0.00 | 1.08 | 0.00 | 0.00 |
36 | 0.00 | 2.47 | 0.00 | 0.00 |
37 | 0.00 | 1.98 | 0.00 | 0.00 |
38 | 0.00 | 0.44 | 0.00 | 0.00 |
39 | 0.00 | 0.01 | 0.00 | 0.00 |
* These are the scale factors used to put the computed values at the same scale as the experimental genomic frequencies.
There are some obvious differences. For example, the calculated values are zero or nearly zero at CTCD values around 13, 18, 23, and 29, where the experimental frequencies appear to be significant. Also, as noted above, there are clashes between the core dimers when the CTCD value is 9 (gap length 0), but the experimental data (Fig 4D of ref. [4]) show a small, non-zero frequency at this spacing. The main causes for these discrepancies are presumably the facts that our models are rigid and that the DNA maintains the ideal geometry. There are clashes between core dimers even at CTCDs of 10–12, but we ignored them because they are small in number (<10) and appear to be avoidable by local alterations of the conformation of flexible loops and/or by small bending and twisting of the DNA.
To see the effect of the twisting of DNA, we performed the same computations using a DNA with 10.0 bps per turn instead of 10.5 bps (Table 2). The calculated feasibility measures were similar for the two DNAs, but the peaks and troughs generally were shifted to the left with 10.0 bp DNA (Fig 6), so that the fit to the experimental frequency data was generally poorer than with 10.5 bp DNA (Fig 5). However, the position of the trough at a CTCD of 28 was nearly reproduced with 10.0 bp DNA. Thus, while the main features of the gap length dependence of the tetramer binding site frequencies can be understood without considering DNA and protein flexibility, such flexibility may have contributed to shape the finer features of the frequency distribution.
Table 2. Computed scaled feasibility measures using 10.0 bps/turn DNA.
CTCD | F2 | F4 | ||
---|---|---|---|---|
Off-axis | On-axis | Off-axis | On-axis | |
235.58* | 582.87* | 2634.2* | 59764.0* | |
10 | 25.80 | 0.00 | 7.90 | 0.00 |
11 | 27.74 | 0.00 | 9.13 | 0.00 |
12 | 3.60 | 1.27 | 0.15 | 0.07 |
13 | 0.02 | 11.31 | 0.00 | 4.99 |
14 | 0.00 | 36.85 | 0.00 | 36.79 |
15 | 0.00 | 19.08 | 0.00 | 15.25 |
16 | 0.00 | 10.93 | 0.00 | 0.83 |
17 | 1.56 | 1.02 | 0.03 | 0.00 |
18 | 20.31 | 0.10 | 4.90 | 0.00 |
19 | 78.89 | 0.01 | 73.85 | 0.00 |
20 | 86.30 | 0.02 | 88.38 | 0.00 |
21 | 23.85 | 0.31 | 6.75 | 0.00 |
22 | 0.35 | 2.67 | 0.00 | 0.04 |
23 | 0.02 | 39.92 | 0.00 | 6.49 |
24 | 0.00 | 37.47 | 0.00 | 24.30 |
25 | 0.00 | 42.83 | 0.00 | 49.93 |
26 | 0.01 | 14.97 | 0.00 | 5.04 |
27 | 0.35 | 6.63 | 0.00 | 0.19 |
28 | 5.48 | 0.47 | 0.36 | 0.00 |
29 | 22.15 | 0.03 | 5.82 | 0.00 |
30 | 25.13 | 0.00 | 7.50 | 0.00 |
31 | 12.29 | 0.00 | 1.79 | 0.00 |
32 | 0.30 | 0.13 | 0.00 | 0.00 |
33 | 0.00 | 0.98 | 0.00 | 0.00 |
34 | 0.00 | 3.21 | 0.00 | 0.00 |
35 | 0.00 | 4.05 | 0.00 | 0.01 |
36 | 0.00 | 1.46 | 0.00 | 0.00 |
37 | 0.00 | 0.16 | 0.00 | 0.00 |
38 | 0.00 | 0.00 | 0.00 | 0.00 |
39 | 0.00 | 0.00 | 0.00 | 0.00 |
* These are the scale factors used to put the computed values at the same scale as the experimental genomic frequencies.
Discussion
Figs 5 and 6 clearly show that the gap length dependence of the calculated feasibility measures shares critical features with that of the experimentally measured frequency of tetramer binding sites in the mouse genome, including the number and position of the five peaks and four valleys. Our models and calculations, therefore, suggest that the peaks at or near the CTCD values of 10, 20 and 30 correspond to eclipsed tetramers connected by one or two off-axis NDDs while the peaks at or near CTCD values of 15 and 26 correspond to staggered tetramers connected by on-axis NDD(s).
In general, the frequency with which a particular spacing occurs in a genome is an evolutionary feature that needs not be related to the ease with which a tetramer can be formed. For example, the frequency of a spacing could be random or dictated by the biological usefulness of the target genes so long as a tetramer can be formed at that spacing. However, our study indicates that the frequency is in fact dictated mainly by the feasibility of forming a tetramer.
This point is most clear from the low but non-zero frequency of the spacing at which the calculated feasibility measure is zero or nearly zero, e.g. at the CTCD value of 18 or 22. The non-zero frequency of tetramer binding sites with these spacings indicates that a tetramer complex can be formed on DNA sites with one of these spacings, presumably because of the flexibility of DNA and of surface loops of the protein. On the other hand, the fact that the frequency is low at these spacings indicates that the frequency is not random but related to the feasibility of forming the tetramer complex.
We note here that our feasibility measure is essentially a weighted number of different positions and orientations of the N-terminal domain dimer (NDD) at which it can connect a pair of protein core dimers through the two 13-mer peptide linkers. It is not a full measure of the stability of the tetramer complex because it does not include the energy of protein-DNA interaction, calculation of which is computationally demanding. We note also that some of the binding sites with significant F2 values may involve higher order oligomerization, which we did not consider in this study.
On the basis of the observation that mutations on the surface of NDD affect the stability of the tetramer complex, Staab et al.[18] recently suggested a model of the tetramer complex in which the two core dimers are arranged in an eclipsed state and connected by one on-axis NDD sitting close to the DNA. Our calculations show that a model of this type is possible (although possible positions of NDD are not in the ‘front’ but on the ‘backside’ of DNA in between the SH2 domains, see Fig 4C and 4D) but is associated with a low feasibility measure. At the CTCD of 21, for example, when the two core dimers are maximally eclipsed for the 10.5 bps/turn DNA, the F2 value (Table 1) for the on-axis NDD is non-zero but small; in most positions, the NDD either clashes with some parts of the core or is too far from the N-termini of the core coiled coil domains for the 13-mer peptide linker to connect (see S1 Tables). It is possible that this model, with NDD at a few barely possible positions, is stabilized energetically by an interaction with DNA, which our calculations do not include (see above). It is interesting that the possibility of this model is not needed in order to produce a high feasibility at the CTCD of 21 because there are many off-axis NDD positions that can connect a pair of core dimers at this separation.
The reason that the number of tetramer binding sites in a genome should correlate with the feasibility of forming a tetramer is not entirely clear. We are not aware of any previous work reporting such a correlation between a feature of DNA sequence evolution and a measure of ease with which a protein-DNA complex can be formed. It is possible that there are many more weaker tetramer binding sites on the genome and that the observed frequency represents only those sites that have appreciable binding at the physiological concentration of STAT.
By estimating a measure of the feasibility of the tetramer formation, our model can therefore predict the number of such sites in the genome relative to those with other spacings. Given the high homology of human vs. mouse STAT5A as well as the homology of STAT5A to other STAT proteins, including particularly STAT5B but also STAT3, the models we have built here are applicable to other STAT proteins as well.
Methods
STAT5A sequence and domain definitions
The Mus musculus STAT5A protein sequence (NCBI Accn# CAA88419.1, http://www.ncbi.nlm.nih.gov/) used in this study and its domain boundaries are shown in Fig 1. The N-terminal domain (NTD, residues 1–127, highlighted cyan) corresponds to the visible residues in the STAT4 NTD structure [19]. The domain boundaries for the coiled-coil domain (CCD, 138–331, yellow), DNA binding domain and linker domain (DBD+LD, 332–594, green), and SH2 domain (595–684, magenta) correspond to the four domains of STAT3β structure according to the alignment between STAT3β and STAT5A[14]. The transactivation domain (TAD, 712–793, un-highlighted) and the linker between SH2 and TAD, which we call the phospho-tyrosine containing segment (PTS, 685–711, un-highlighted), also correspond to those defined for STAT3[14]. The domain boundaries are similar according to the STAT1 and STAT5A structures [15, 16], except that the SH2 domain in the STAT5A structure was defined to include the phospho-tyrosine containing segment.
Most of the residues of the PTS are not visible in the crystal structure of STAT5A [16] and the solution conformation of the few visible residues (residues 685–690) is uncertain because the corresponding residues are missing in the structure of STAT3β [14]. Presumably, this region of the sequence assumes flexible structures. It is poorly conserved among different STAT proteins except for three residues including the phosphorylated tyrosine residue 694 [14, 17, 20]. No crystal structure is available for the TAD, although NMR structures of this domain from STAT1 and STAT2 proteins in complex with TAZ1 and TAZ2 domains of CREB-binding protein (CBP) are available [21]. The TAD sequences are not similar among different STAT proteins including STAT5A [20] and are probably intrinsically unstructured in the absence of the cognate co-factors. We therefore excluded both PTS and TAD in our models except for residues 694–696 (YVK in red font in Fig 1B), which include the phospho-tyrosine and two additional residues.
Construction of STAT5A core dimer models
Crystal structures for STAT5A in its un-phosphorylated form (1y1u) [16] and for STAT3β in its phosphorylated dimer form interacting with DNA (1bg1) [14] are available. Since STAT5A is homologous to STAT3β (32% identical by NCBI BLAST pairwise alignment), we built a model of phosphorylated, DNA bound STAT5A core dimer by using the STAT3β dimer-DNA structure (1bg1) and superimposing and separately replacing each of the core domains, CCD, DBD+LD, and SH2, by the corresponding domains in the structure of the un-phosphorylated STAT5A (1y1u). The resulting structure was not refined in any way, as we did not want to potentially deviate from the observed phosphorylated dimer-DNA complex structure. Residues 694–696 in the PTS of the STAT3β structure (1bg1) were retained in the core dimer model of STAT5A. The structure of the core dimer of STAT1 [15] is very similar to that of STAT3β, but we used STAT3β because of its greater sequence similarity to STAT5A.
Construction of STAT5A core tetramer models
The 3D-DART webserver (http://haddock.chem.uu.nl/dna/dna.php) was used to build an ideal 60-mer B-DNA with 10.5 or 10.0 bps per turn. Two copies of the DNA-bound core STAT5A dimer, built as above, were placed on this DNA by superimposing the two 18-mer DNAs of the dimers on the 60-mer DNA with a desired base pair separation between them. The 18-mer DNAs were then removed.
Construction of the N-terminal domain dimer (NDD)
The structure of the NDD is available for STAT4 (1bgf) [19]. Since the NTD of STAT5A is homologous to that of STAT4 (NCBI BLAST pairwise blast gives 34% identity), we built the NDD of STAT5A by homology modeling from the STAT4 NDD structure (1bgf) using swiss-pdb web server (http://swissmodel.expasy.org). The dimer interface used to build the dimer was the reinterpreted dimerization interface by Chen et al.[22].
An upper limit of the distance between the core dimers
An upper limit of the distance between the core dimers is reached when the two core dimers are so far apart that their N-terminal domains (NTDs) cannot come close enough to dimerize. We determined this upper limit using the condition
where Dnn is the distance between the N-terminal Cα atoms of the two CCDs that one potential NDD connects, Dcc is the distance between the Cα atoms of the two C-terminal residues (Asn 124) of the NDD, which is 55.8 Å in the model we built, and Dmax is the maximum possible end-to-end distance of the 13-mer peptide linker between NTD and CCD of a STAT5A monomer, which we took to be 40 Å (see below). Therefore, a condition for the upper limit is Dnn < 136 Å.
Distance between two N-terminal Cα atoms of two dimers of a core tetramer
There are four possible core monomer pairs that may be linked by two N-terminal domains in-between: A-A’, A-D’, D-A’, and D-D’. Two of these (A-A’ and D-D’) are equivalent by 2-fold symmetry. Assuming that the DNA does not bend and the core dimers are rigid, the distances, Dnn, between the N-terminal Cα atoms of each of these pairs can be calculated using the following formulas:
and
where letters in parenthesis after a distance indicate the core monomer pair, n is equal to the CTCD, z and α are the rise and the rotation angle per bp of the DNA double helix, and dz(AD), r, and β are geometrical properties of a core dimer as described below. For the B-DNA with 10.5 bps per turn that we used, z = 3.37 Å and α = 34.3°.
Let the cylindrical coordinates of the N-terminal Cα atoms of the monomers A and D be (z, r, φ) with subscripts A and D respectively, with the z-axis along the DNA axis. Then dz(AD) = zD–zA, r = rA = rD, and β = [180° - (φD- φA)]/2. For the core dimer model we built, dz(AD) = 13.64 Å, r = 49.79 Å, and β = 0.4°.
An upper limit of the spacing between the dimers
Fig 3 shows the calculated Dnn distances for CTCD values between 10 and 40. The A-A’ and D-D’ distances become larger than the upper limit of 136 Å, and therefore an eclipsed tetramer with an off-axis NDD will not form, when the CTCD value is larger than 34. The A-D’ distance is beyond the limit when the CTCD value is larger than 29 whereas the D-A’ distance reaches the limit at the CTCD value of 39. The experimental data show that the number of tetramer binding sites detected is very low when the CTCD value is larger than 31 (gap length of 22 in Fig 4D of ref. [4]), which suggests that tetramer formation with only one NDD connection is rare, although the little blips in frequency at CTCDs of 36 and 37 may be an indication that a tetramer with only one NDD, presumably on-axis connecting D and A’ monomers, can form.
End-to-end distance distribution of the linker between the NTD and CCD
There are ten residues (residues 128–137: SPAGVLVDAM) between the N-terminal and coiled-coil domains that are not visible in the STAT5A crystal structure [16]. We assume that this peptide fragment is flexible. The fact that the corresponding part is missing in the full STAT1 structure (1yvl) is consistent with such flexibility. The residues flanking these linker residues on the N-terminal side are the C-terminal residues of the NTD, the last three of which appear to be flexible in the NTD structure of STAT4, being outside of the last helix. We included these three residues as a part of the linker, so that the length of the linker was increased to 13 residues (residues 125 to 137 shown boxed in Fig 1B: NCSSPAGVLVDAM). We correspondingly shortened the NTD by three residues at the C-terminal end; whenever we refer to the C-terminal end of the NTD or the NDD, we refer to this shortened end.
Instead of constructing possible structures for the 13-residue linker, we computed the end-to-end distance distribution of all 13-residue peptides in a subset of all the protein structures in the Protein Data Bank (PDB), which consisted of the structures with resolution < 2.5 Å, R-factor < 1.0 that were deposited before 03/23/2012 (19,412 chains in total). Each 13-residue peptide was aligned to the 13-residue linker sequence and the Blosum62 score [23] was computed for the alignment. The histograms of the end-to-end distances of the 13-residue peptides with high Blosum62 scores are shown in Fig 7. The end-to-end distances range between 4 and 40Å, with a peak at ~19Å. The overall distribution was rather insensitive to the particular Blosum62 score cutoff value. The histogram of the peptides with Blosum62 score greater than 8 was converted to the probabilities of the end-to-end distances in 1Å bins. These probabilities were used as the probability of linking the NTD to the CCD with the given end-to-end distance.
Calculation of feasibility measures for tetramer formation with on-axis NDDs (Fig 4A and 4B)
An NDD was placed so that its 2-fold axis coincided with the 2-fold axis of the core tetramer-DNA complex. The NDD was then moved along the 2-fold axis on both sides of DNA at 10Å intervals. At each position, the NDD was flipped 180° around the vertical (perpendicular to the 2-fold) axis and both the flipped and un-flipped NDDs were rotated about the 2-fold axis at 30° intervals. At each position and orientation of an NDD, three numbers were calculated: the number of clashes between the NDD and the core tetramer and the two distances from the C-terminus of a monomer of the NDD to the N-termini of the two core monomers that are on the same side of DNA as the NDD. A “clash” is defined as an instance when a Cα to Cα non-bonded distance is less than 5Å or when a Cα atom of an NDD comes within van der Waals contact with a DNA atom, as defined by Chimera. The number of clashes and the two distances are reported in S1 Tables and S3 Tables for models with 10.5 and 10.0 bps/turn DNA, respectively. If a clash occurred, the probability p(x, ϕ) that a tetramer will form with an NDD at position x and orientation ϕ was set to zero. Otherwise, the probability was calculated as p(x, ϕ) = p12 + p22, where p1 and p2 are the probabilities that a 13-mer peptide will have the end-to-end distance equal to the two distances calculated, respectively. Note that p1 and p2 denote probabilities of different ways of linking an NTD to the core dimer and that each needs to be squared because two symmetric connections are made in each case. These probabilities are also reported in the S1 Tables and S3 Tables.
We take the sum
as a relative measure of the feasibility of forming a tetramer linked by at least one on-axis NDD, which connects two core monomers. Here, the first summation is over all positions and orientations of an NDD on one side of DNA, connecting the core monomers A and D’, and the second summation is for positions of an NDD on the other side of DNA, connecting D and A’. We take the product
as the relative feasibility measure for forming a tetramer with two connections involving both NDDs and all four core monomers. The computed values of the probability sums and of F2 and F4 are given in the “on-axis” columns of Table 3 for the 10.5 bps/turn DNA and of Table 4 for the 10.0 bps/turn DNA.
Table 3. Computed probability sums and raw feasibility measures using 10.5 bps/turn DNA.
CTCD | P*1000 | Unscaled F2*1000 | Unscaled F4*10000 | ||||
---|---|---|---|---|---|---|---|
Off-axis | On-axis | Off-axis | On-axis | Off-axis | On-axis | ||
P* | P+# | P-& | |||||
10 | 56.69 | 0.01 | 0.00 | 113.38 | 0.01 | 32.14 | 0.00 |
11 | 63.35 | 0.00 | 0.00 | 126.70 | 0.00 | 40.13 | 0.00 |
12 | 39.04 | 0.00 | 0.08 | 78.07 | 0.08 | 15.24 | 0.00 |
13 | 0.74 | 5.43 | 0.61 | 1.48 | 6.05 | 0.01 | 0.03 |
14 | 0.04 | 22.12 | 12.48 | 0.09 | 34.60 | 0.00 | 2.76 |
15 | 0.00 | 23.91 | 26.94 | 0.00 | 50.85 | 0.00 | 6.44 |
16 | 0.00 | 18.81 | 20.10 | 0.00 | 38.91 | 0.00 | 3.78 |
17 | 0.02 | 0.01 | 10.06 | 0.04 | 10.06 | 0.00 | 0.00 |
18 | 4.80 | 0.00 | 0.90 | 9.61 | 0.90 | 0.23 | 0.00 |
19 | 57.45 | 0.37 | 0.00 | 114.90 | 0.37 | 33.00 | 0.00 |
20 | 160.00 | 0.13 | 0.00 | 320.01 | 0.13 | 256.01 | 0.00 |
21 | 160.62 | 0.01 | 0.00 | 321.24 | 0.01 | 257.98 | 0.00 |
22 | 10.98 | 0.74 | 0.00 | 21.96 | 0.75 | 1.21 | 0.00 |
23 | 8.67 | 3.57 | 0.09 | 17.35 | 3.66 | 0.75 | 0.00 |
24 | 0.10 | 46.91 | 1.14 | 0.19 | 48.05 | 0.00 | 0.53 |
25 | 0.00 | 64.76 | 5.47 | 0.00 | 70.22 | 0.00 | 3.54 |
26 | 0.00 | 61.43 | 7.81 | 0.00 | 69.23 | 0.00 | 4.80 |
27 | 0.00 | 48.30 | 3.99 | 0.00 | 52.29 | 0.00 | 1.93 |
28 | 0.22 | 39.08 | 0.34 | 0.45 | 39.43 | 0.00 | 0.13 |
29 | 1.91 | 1.51 | 0.00 | 3.81 | 1.51 | 0.04 | 0.00 |
30 | 21.66 | 0.09 | 0.00 | 43.32 | 0.09 | 4.69 | 0.00 |
31 | 35.37 | 0.00 | 0.00 | 70.73 | 0.00 | 12.51 | 0.00 |
32 | 32.36 | 0.00 | 0.00 | 64.72 | 0.00 | 10.47 | 0.00 |
33 | 2.40 | 0.00 | 0.00 | 4.80 | 0.00 | 0.06 | 0.00 |
34 | 0.01 | 0.21 | 0.00 | 0.02 | 0.21 | 0.00 | 0.00 |
35 | 0.00 | 1.57 | 0.00 | 0.00 | 1.57 | 0.00 | 0.00 |
36 | 0.00 | 3.58 | 0.00 | 0.00 | 3.58 | 0.00 | 0.00 |
37 | 0.00 | 2.87 | 0.00 | 0.00 | 2.87 | 0.00 | 0.00 |
38 | 0.00 | 0.63 | 0.00 | 0.00 | 0.63 | 0.00 | 0.00 |
39 | 0.00 | 0.01 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 |
* ∑p(z,r,ϕ,θ)
# ∑x>0p(x,ϕ)
& ∑x<0p(x,ϕ)
Table 4. Computed probability sums and raw feasibility measures using 10.0 bps/turn DNA.
CTCD | P*1000 | Unscaled F2*1000 | Unscaled F4*10000 | ||||
---|---|---|---|---|---|---|---|
Off-axis | On-axis | Off-axis | On-axis | Off-axis | On-axis | ||
P* | P+# | P-& | |||||
10 | 54.75 | 0.00 | 0.00 | 109.51 | 0.00 | 29.98 | 0.00 |
11 | 58.87 | 0.00 | 0.00 | 117.74 | 0.00 | 34.66 | 0.00 |
12 | 7.63 | 1.26 | 0.91 | 15.27 | 2.17 | 0.58 | 0.01 |
13 | 0.05 | 12.95 | 6.45 | 0.10 | 19.40 | 0.00 | 0.84 |
14 | 0.00 | 51.21 | 12.02 | 0.00 | 63.23 | 0.00 | 6.16 |
15 | 0.00 | 12.82 | 19.91 | 0.00 | 32.73 | 0.00 | 2.55 |
16 | 0.00 | 0.78 | 17.98 | 0.00 | 18.76 | 0.00 | 0.14 |
17 | 3.30 | 0.00 | 1.74 | 6.61 | 1.74 | 0.11 | 0.00 |
18 | 43.11 | 0.17 | 0.00 | 86.23 | 0.17 | 18.59 | 0.00 |
19 | 167.44 | 0.02 | 0.00 | 334.87 | 0.02 | 280.35 | 0.00 |
20 | 183.17 | 0.03 | 0.00 | 366.35 | 0.03 | 335.52 | 0.00 |
21 | 50.63 | 0.50 | 0.04 | 101.26 | 0.53 | 25.63 | 0.00 |
22 | 0.75 | 4.45 | 0.13 | 1.49 | 4.58 | 0.01 | 0.01 |
23 | 0.05 | 66.86 | 1.62 | 0.10 | 68.49 | 0.00 | 1.09 |
24 | 0.00 | 57.17 | 7.11 | 0.00 | 64.28 | 0.00 | 4.07 |
25 | 0.00 | 59.41 | 14.06 | 0.00 | 73.47 | 0.00 | 8.35 |
26 | 0.01 | 21.82 | 3.86 | 0.03 | 25.68 | 0.00 | 0.84 |
27 | 0.75 | 11.09 | 0.28 | 1.49 | 11.38 | 0.01 | 0.03 |
28 | 11.64 | 0.81 | 0.00 | 23.27 | 0.81 | 1.35 | 0.00 |
29 | 47.01 | 0.05 | 0.00 | 94.02 | 0.05 | 22.10 | 0.00 |
30 | 53.35 | 0.00 | 0.00 | 106.69 | 0.00 | 28.46 | 0.00 |
31 | 26.09 | 0.00 | 0.00 | 52.17 | 0.00 | 6.81 | 0.00 |
32 | 0.64 | 0.21 | 0.00 | 1.28 | 0.21 | 0.00 | 0.00 |
33 | 0.00 | 1.69 | 0.00 | 0.00 | 1.69 | 0.00 | 0.00 |
34 | 0.00 | 5.50 | 0.01 | 0.00 | 5.51 | 0.00 | 0.00 |
35 | 0.00 | 6.94 | 0.02 | 0.00 | 6.96 | 0.00 | 0.00 |
36 | 0.00 | 2.51 | 0.00 | 0.00 | 2.51 | 0.00 | 0.00 |
37 | 0.00 | 0.27 | 0.00 | 0.00 | 0.27 | 0.00 | 0.00 |
38 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
39 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
* ∑p(z,r,ϕ,θ)
# ∑x>0p(x,ϕ)
& ∑x<0p(x,ϕ)
Calculation of feasibility measures for tetramer formation with off-axis NDDs (Fig 4C and 4D)
A lattice of grid points was set up using a cylindrical coordinate system. The axis of the cylinder was chosen as the line connecting the N-terminal ends of the monomers D and D’. The grid points had the coordinates (z, r, ϕ) where z, r, and ϕ are the distance along, radial distance from, and the azimuthal angle around the cylinder axis, respectively. The distances z and r were varied in 10 Å intervals and ϕ was varied differently depending on the value of r in such a manner that the grid points were separated roughly by 10 Å in all directions. At each grid point, the NDD was placed with the NDD line, defined as the line joining the two C-termini of the NDD monomers, parallel to the z-axis and then rotated about the NDD line at an interval of 60°. At each position and orientation of the NDD, again three numbers were calculated: (1) the number of clashes between the NDD and the core tetramer, (2) the distance from the N-terminus of monomer D to the nearer of the two C-termini of the NDD, and (3) the distance between the N-terminus of D’ to the other C-terminus of the NDD. These data are reported in S2 Tables and S4 Tables for models with 10.5 and 10.0 bps/turn DNA, respectively. If a clash occurred, the probability p(z, r, ϕ, θ) that a tetramer will form with an NDD at position (z, r, ϕ) and orientation θ was set to zero. Otherwise, the probability was calculated for the position and orientation as p(z, r, ϕ, θ) = p1. p2, where p1 and p2 are the probabilities that a 13-mer peptide will have the end-to-end distance equal to the two distances calculated, respectively. These probabilities are also reported in the S2 Tables and S4 Tables.
We then take the sum form
as the measure of the feasibility of forming a tetramer with at least one off-axis NDD, which connects two core monomers. Here, the summation is over all positions and orientations of an NDD connecting monomers D and D’. The probability is doubled because for every D-D’ connection, a symmetry-related A-A’ connection is also possible. We take the product form
as the feasibility measure for forming a tetramer with two connections involving both NDDs. The computed values of the probability sum and of F2 and F4 are given in the “off-axis” columns of Table 3 for the 10.5 bp/turn DNA and of Table 4 for the 10.0 bp/turn DNA.
Scaling on- and off-axis feasibility measures
The feasibility of forming a tetramer with a particular CTCD value is the weighted sum of those with NDD at the on- and off-axis positions. The weights were determined so as to minimize the difference between the weighted sum and the experimentally measured number of tetramer binding sites. Thus, we determined the weights wj, where j = 1 or 2 for the on- or off-axis feasibilities, respectively, which will minimize S = ΣI (ci−ni)2, where ci = Σj wjFji is the calculated feasibility measure for on- (j = 1) or off- (j = 2) axis position for the CTCD value i, and ni is the experimentally measured number of tetramer binding sites at the CTCD value i. The solution of the minimization problem is: w = a-1b, ajk = ΣiFjiFki, and bj = ΣiFjini. The computed w values are 690.9 and 307.4, respectively, for the on- and off-axis F2 and 89064.8 and 4269.3 for F4 using 10.5 bps/turn DNA. For the 10.0 bps/turn DNA, the corresponding numbers are 582.9 and 235.6 for F2 and 59764.0 and 2634.2 for F4. The residual root-mean-square deviation between the experimental frequency and the scaled F2 and F4 values were 14.1 and 16.9, respectively, for the 10.5 bps/turn and 23.8 and 27.8 for the 10.0 bps/turn DNA.
Supporting Information
Acknowledgments
We thank Dr. Hongjun Bai for computing the data that produced Fig 7. All models, except for the DNA and the NDD, were built using the Chimera software[24]. Chimera is developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco, with support from the National Institutes of Health (National Center for Research Resources grant 2P41RR001081, National Institute of General Medical Sciences grant 9P41GM103311). Chimera was also used for visualization and most of the calculations. This research was supported by the Intramural Research Program of the NIH, National Cancer Institute, Center for Cancer Research and the Division of Intramural Research, National Heart, Lung, and Blood Institute.
Data Availability
All relevant data are within the paper and its Supporting Information files.
Funding Statement
This research was supported by the Intramural Research Program of the NIH, National Cancer Institute, Center for Cancer Research and the Division of Intramural Research, National Heart, Lung, and Blood Institute. The Chimera software we used was developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco, with support from the National Institutes of Health (National Center for Research Resources grant 2P41RR001081, National Institute of General Medical Sciences grant 9P41GM103311). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1.Leonard WJ, O'Shea JJ. Jaks and STATs: biological implications. Annu Rev Immunol. 1998;16:293–322. Epub 1998/05/23. 10.1146/annurev.immunol.16.1.293. . [DOI] [PubMed] [Google Scholar]
- 2.Stark GR, Darnell JE Jr. The JAK-STAT pathway at twenty. Immunity. 2012;36(4):503–14. Epub 2012/04/24. 10.1016/j.immuni.2012.03.013. 10.1016/j.immuni.2012.03.013 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Timofeeva OA, Tarasova NI. Alternative ways of modulating JAK-STAT pathway: Looking beyond phosphorylation. JAKSTAT. 2012;1(4):274–84. Epub 2013/09/24. 10.4161/jkst.22313 2012JAKS0059R [pii]. 10.4161/jkst.22313 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Lin JX, Li P, Liu D, Jin HT, He J, Ata Ur Rasheed M, et al. Critical Role of STAT5 transcription factor tetramerization for cytokine responses and normal immune function. Immunity. 2012;36(4):586–99. 10.1016/j.immuni.2012.02.017 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Moriggl R, Sexl V, Kenner L, Duntsch C, Stangl K, Gingras S, et al. Stat5 tetramer formation is associated with leukemogenesis. Cancer cell. 2005;7(1):87–99. . [DOI] [PubMed] [Google Scholar]
- 6.Begitt A, Droescher M, Meyer T, Schmid CD, Baker M, Antunes F, et al. STAT1-cooperative DNA binding distinguishes type 1 from type 2 interferon signaling. Nat Immunol. 2014;15(2):168–76. Epub 2014/01/15. 10.1038/ni.2794. 10.1038/ni.2794 [DOI] [PubMed] [Google Scholar]
- 7.John S, Vinkemeier U, Soldaini E, Darnell JE, Jr., Leonard WJ. The significance of tetramerization in promoter recruitment by Stat5. Mol Cell Biol. 1999;19(3):1910–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Xu X, Sun YL, Hoey T. Cooperative DNA binding and sequence-selective recognition conferred by the STAT amino-terminal domain. Science. 1996;273(5276):794–7. Epub 1996/08/09. . [DOI] [PubMed] [Google Scholar]
- 9.Dajee M, Kazansky AV, Raught B, Hocke GM, Fey GH, Richards JS. Prolactin induction of the alpha 2-Macroglobulin gene in rat ovarian granulosa cells: stat 5 activation and binding to the interleukin-6 response element. Mol Endocrinol. 1996;10(2):171–84. Epub 1996/02/01. 10.1210/mend.10.2.8825557. . [DOI] [PubMed] [Google Scholar]
- 10.Zhang X, Darnell JE Jr. Functional importance of Stat3 tetramerization in activation of the alpha 2-macroglobulin gene. The Journal of biological chemistry. 2001;276(36):33576–81. Epub 2001/07/05. 10.1074/jbc.M104978200 M104978200 [pii]. . [DOI] [PubMed] [Google Scholar]
- 11.Bergad PL, Shih HM, Towle HC, Schwarzenberg SJ, Berry SA. Growth hormone induction of hepatic serine protease inhibitor 2.1 transcription is mediated by a Stat5-related factor binding synergistically to two gamma-activated sites. The Journal of biological chemistry. 1995;270(42):24903–10. Epub 1995/10/20. . [DOI] [PubMed] [Google Scholar]
- 12.Soldaini E, John S, Moro S, Bollenbacher J, Schindler U, Leonard WJ. DNA binding site selection of dimeric and tetrameric Stat5 proteins reveals a large repertoire of divergent tetrameric Stat5a binding sites. Mol Cell Biol. 2000;20(1):389–401. Epub 1999/12/14. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Vinkemeier U, Cohen SL, Moarefi I, Chait BT, Kuriyan J, Darnell JE Jr. DNA binding of in vitro activated Stat1 alpha, Stat1 beta and truncated Stat1: interaction between NH2-terminal domains stabilizes binding of two dimers to tandem DNA sites. Embo J. 1996;15(20):5616–26. Epub 1996/10/15. [PMC free article] [PubMed] [Google Scholar]
- 14.Becker S, Groner B, Muller CW. Three-dimensional structure of the Stat3beta homodimer bound to DNA. Nature. 1998;394(6689):145–51. Epub 1998/07/22. 10.1038/28101. . [DOI] [PubMed] [Google Scholar]
- 15.Chen X, Vinkemeier U, Zhao Y, Jeruzalmi D, Darnell JE Jr., Kuriyan J. Crystal structure of a tyrosine phosphorylated STAT-1 dimer bound to DNA. Cell. 1998;93(5):827–39. Epub 1998/06/18. S0092-8674(00)81443-9 [pii]. . [DOI] [PubMed] [Google Scholar]
- 16.Neculai D, Neculai AM, Verrier S, Straub K, Klumpp K, Pfitzner E, et al. Structure of the unphosphorylated STAT5a dimer. The Journal of biological chemistry. 2005;280(49):40782–7. . [DOI] [PubMed] [Google Scholar]
- 17.Andreeva A, Howorth D, Chandonia JM, Brenner SE, Hubbard TJ, Chothia C, et al. Data growth and its impact on the SCOP database: new developments. Nucleic acids research. 2008;36(Database issue):D419–25. Epub 2007/11/15. 10.1093/nar/gkm993. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Staab J, Riebeling T, Koch V, Herrmann-Lingen C, Meyer T. The two interfaces of the STAT1 N-terminus exhibit opposite functions in IFNgamma-regulated gene expression. Mol Immunol. 2015;67(2 Pt B):596–606. 10.1016/j.molimm.2015.07.015. 10.1016/j.molimm.2015.07.015 [DOI] [PubMed] [Google Scholar]
- 19.Vinkemeier U, Moarefi I, Darnell JE Jr., Kuriyan J. Structure of the amino-terminal protein interaction domain of STAT-4. Science. 1998;279(5353):1048–52. Epub 1998/03/07. . [DOI] [PubMed] [Google Scholar]
- 20.Shen Y, Darnell JE Jr. Antiviral response in cells containing Stat1 with heterologous transactivation domains. Journal of virology. 2001;75(6):2627–33. . [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Wojciak JM, Martinez-Yamout MA, Dyson HJ, Wright PE. Structural basis for recruitment of CBP/p300 coactivators by STAT1 and STAT2 transactivation domains. Embo J. 2009;28(7):948–58. 10.1038/emboj.2009.30 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Chen X, Bhandari R, Vinkemeier U, Van Den Akker F, Darnell JE Jr., Kuriyan J. A reinterpretation of the dimerization interface of the N-terminal domains of STATs. Protein Sci. 2003;12(2):361–5. Epub 2003/01/23. 10.1110/ps.0218903. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A. 1992;89(22):10915–9. . [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, et al. UCSF Chimera—a visualization system for exploratory research and analysis. J Comput Chem. 2004;25(13):1605–12. Epub 2004/07/21. 10.1002/jcc.20084. . [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
All relevant data are within the paper and its Supporting Information files.