PLOS Computational Biology. 2022 Dec 21;18(12):e1010794. doi: 10.1371/journal.pcbi.1010794

Allosteric regulation in STAT3 interdomains is mediated by a rigid core: SH2 domain regulation by CCD in D170A variant

Tingting Zhao 1, Nischal Karki 1, Brian D Zoltowski 1, Devin A Matthews 1,*
Editor: James Gallo 2
PMCID: PMC9815575  PMID: 36542668

Abstract

Signal Transducer and Activator of Transcription 3 (STAT3) plays a crucial role in cancer development and thus is a viable target for cancer treatment. STAT3 functions as a dimer mediated by phosphorylation of the SRC-homology 2 (SH2) domain, a key target for therapeutic drugs. While great efforts have been employed towards the development of compounds that directly target the SH2 domain, no compound has yet been approved by the FDA due to a lack of specificity and pharmacologic efficacy. Studies have shown that allosteric regulation of SH2 via the coiled-coil domain (CCD) is an alternative drug design strategy. Several CCD effectors have been shown to modulate SH2 binding and affinity, and at the time of writing at least one drug candidate has entered phase I clinical trials. However, the mechanism for SH2 regulation via CCD is poorly understood. Here, we investigate structural and dynamic features of STAT3 and compare the wild type to the reduced function variant D170A in order to delineate mechanistic differences and propose allosteric pathways. Molecular dynamics simulations were employed to explore the conformational space of STAT3 and the variant, followed by structural, conformational, and dynamic analyses. The trajectories explored show distinctive conformational changes in the SH2 domain for the D170A variant, indicating long range allosteric effects. Multiple analyses provide evidence for long range communication pathways between the two STAT3 domains, which seem to be mediated by a rigid core that connects the CCD and SH2 domains via the linker domain (LD) and transmits conformational changes through a network of short-range interactions. The proposed allosteric mechanism provides new insight into the understanding of intramolecular signaling in STAT3 and potential pharmaceutical control of STAT3 specificity and activity.

Author summary

In all living organisms, the proliferation and survival of cells are regulated by various proteins. The Signal Transducer and Activator of Transcription 3 (STAT3) protein is one of these important proteins. However, abnormal regulation of these proteins contributes to the proliferation of cancer. The constitutive activation of STAT3 has been linked to several types of solid tumors, leukemia, and lymphomas. Consequently, STAT3 proteins have been a key target for cancer therapy. Because the SH2 (SRC-homology 2) domain is the key interaction site, great efforts have been made to target it. However, specificity has been a major challenge in drug discovery. Research showing regulation of the SH2 domain via the CCD (coiled-coil domain) has opened a new path for drug discovery, but progress is hampered by poor understanding of the allosteric mechanism. Here, we show that the CCD regulates SH2 conformation via a rigid backbone. Perturbations in the CCD are transmitted through an α-helix to the rigid core that orchestrates the movement of the CCD and LD (linker domain), leading to structural changes in the SH2 domain. The present findings provide an allosteric mechanism with atomistic details underlying the regulation of the SH2 domain by the CCD in the STAT3 protein. A detailed allosteric pathway allows informed drug design targeting the CCD for a desired downstream effect on the SH2 domain and overall STAT3 function.

Introduction

Proteins within the Signal Transducers and Activators of Transcription (STAT) family function as both signal transducers in the cytoplasm and transcription factors upon nuclear translocation. All members of the STAT family consist of six domains (Fig 1A): amino-terminal domain (NTD), coiled-coil domain (CCD), DNA-binding domain (DBD), linker domain (LD), SRC-homology 2 domain (SH2), and transactivation domain (TAD), which is also named the C-terminal domain [1]. STAT proteins are regulated by Janus Kinases (JAKs), and they play a crucial role in immune response, cell division, and apoptosis as the gene expression regulatory arm of the JAK-STAT signaling pathway [2]. However, each member is activated by different types of cytokines and has a unique function in the pathway [3–5].

Fig 1. STAT3 structure.

1BG1 was used as the template; NTD is not shown. (A) STAT3 domain structure (Y705 is shown as spheres, D170 is shown as sticks). Vide infra for details of the initial structure. (B) Secondary structures are labeled according to the UniProt database (S1 Table) [6]: α helices are colored blue, β sheets are colored red, and unstructured regions (loops) are colored yellow (transverse view). The assigned secondary structures combine information from multiple x-ray crystal structures, so there is some mismatch with the specific structures used in this work.

Constitutive activation of STAT3 has been shown to play a crucial role in cancer progression [7, 8]. STAT3 binds, via the SH2 domain, to cell-surface receptors upon activation and recruitment of receptor-associated kinases. Upon binding, the recruited kinases activate STAT3 through phosphorylation within the TAD (at Y705), followed by dissociation from the receptor to form homodimers through reciprocal interactions between the SH2 domain and the phosphotyrosine (pY705) residue. These activated homodimers are then translocated to the nucleus where the DBD binds to target genes, and TAD activates the expression of proteins crucial for cell growth and survival. In normal cells, the signaling pathway is well-regulated. However, the abnormal activation of this signaling pathway promotes the development of cancer: misregulation of STAT3 in cancer cells promotes pro-oncogenic inflammation and suppresses anti-tumor immunity [9].

The direct therapeutic inhibition of STAT3 is highly desirable but remains challenging, as is evident from the lack of FDA-approved drugs. Specifically, significant effort has been devoted to developing molecules targeting the SH2 domain of STAT3 [10–12]. The SH2 domain is a structurally conserved protein domain, which appears in many intracellular signal transducing proteins, offering a binding site for phosphorylated tyrosine residues. The SH2 domain in STAT3 contains two regions with specialized functions: the pY pocket, into which the phosphotyrosine of the target inserts, is the binding region, while residues of the pY+3 pocket interact with the three residues C-terminal to the phosphotyrosine in the target, forming a specificity-determining region [13, 14]. Inhibitors targeting the SH2 domain, namely phosphotyrosine motifs (pY-peptides) and phosphotyrosine-based peptidomimetic inhibitors that mimic the pTyr-Xaa-Yaa-Gln motif, have been previously investigated, as have their binding modes [15–17]. In the pY pocket, R609 is the principal binding partner, along with K591, S636 and S611, which directly interact with pY705. The relative conformation and position of these residues have a direct effect on STAT3 binding activity. In the pY+3 pocket, V637 in β31 controls accessibility to this pocket, while Y657, Q644, Y640, and E638 facilitate hydrogen bond interactions with the target, and I659, W623 and F621 assist in binding of the target peptide by forming a hydrophobic environment [17]. However, most of these compounds have yet to be explored in clinical studies, or their further development was limited by concerns about their relative lack of potency and selectivity [18].

Several studies [20–22] have determined that effectors (small molecules and polypeptides) binding to the CCD interfere with SH2 domain binding or preclude STAT3 nuclear translocation, which suggests the CCD as a potential target for further drug design. Zhang et al. found that the coiled-coil domain is essential for STAT3 recruitment to the receptor: systematic deletion analysis of the N-domain and α helices of the CCD, as well as mutagenesis of conserved residues (D170A) in the CCD of STAT3, were carried out and showed diminished pY-peptide binding and tyrosine phosphorylation [19]. Furthermore, the small molecule MM-206 was identified as an inhibitor of STAT3 phosphorylation, and Minus et al. surprisingly found that the binding site was at α1 of the CCD (around F174). In addition, the compound K116, found to bind to the CCD by AlloFinder, was shown to inhibit receptor binding, as validated by mutagenesis and functional experiments [21]. Recently, a small polypeptide, MS3–6, was found to bind to the CCD, causing a significant helical tilt in the CCD compared to the apo conformation and diminishing DNA binding and nuclear translocation [22]. These observations are summarized in Table 1, and highlight the correlation of the CCD with SH2 domain binding affinity as well as specificity.

Table 1. Summary of STAT3 effectors targeting CCD and their effect on pY-peptide and DNA binding activity.

a) [19], b) [20], c) [21], d) [22].

Alteration/Effector | pY-Peptide Binding: Sequence / Activity (SPR-based assay) | Phosphorylation: Agent / Activity (western blot) | DNA Binding
D170A mutation a) | VVHSG(pY)RHQVPS / Inhibited | EGF / Inhibited | Not studied
MM-206 b) | LPVPE(pY)INQSVP / Inhibited | IL-6 / Inhibited | Inhibited
K116 c) | Ac-pYLPQTV-NH2 / Inhibited | IL-6 induced / Inhibited | Not studied
MS3–6 d) | GMPKS(pY)LPQTVR / Not inhibited | IL-22 / Inhibited; IL-6 / Not inhibited | Diminished

The discovery of several diverse inhibitory agents that bind to the CCD rather than SH2, yet regulate SH2 domain function, is a fascinating development. However, rational design of allosteric effectors requires more detailed, mechanistic knowledge of how CCD binding affects SH2 structure and activity. There are no crystal structures of MM-206 or K116 bound to STAT3. Moreover, computational modeling of large fragments and optimization of an accurate binding configuration are challenging. The D170A mutation, which has been shown to alter activity in the SH2 domain, is thus a useful avenue of investigation towards understanding the allosteric mechanism. It has the further advantage that a single point mutation can easily be performed computationally without significantly altering the structural environment. We hypothesize that point mutation and effector binding result in a similar allosteric pathway outside of the CCD (where local interactions dominate). While we do not explicitly test this hypothesis here, unraveling the D170A allosteric mechanism will at least help to guide the search for allosteric mechanisms of alternate effectors.

In this work, we study the allosteric mechanism of D170A mutation on the inhibition of STAT3 activity, as predicted by the dynamic structures of the SH2 domain, known to be essential for pY-peptide binding and ensuing Y705 phosphorylation. Specifically, we investigate the structural properties within the SH2 domain upon CCD mutation over a number of molecular dynamics simulations, as well as the dynamical correlations between SH2 and CCD and the associated networks of allosteric residues and interactions within various structural motifs.

Results

Conformational differences in the pY+3 binding pocket correlate to the decreased binding affinity from wild type to D170A variant

The SH2 domain mediates binding of kinase complexes to unphosphorylated STAT3, directing the phosphorylation of Y705 in the TAD. Furthermore, the SH2 domain also provides the interface for dimerization of the phosphorylated TAD to form a functional pSTAT3 homodimer. Thus, any change to the conformation of the specificity-determining region (pY+3 pocket, Fig 2D) or of the binding region for the phosphorylated TAD (pY pocket, Fig 2D) provides a key regulatory modification to STAT3 behavior. To describe the binding pocket conformations of the SH2 domain, the pair residue center of mass (COM) distance matrices of key residues (Fig 2D, see Methods section for details) were calculated separately for the pY and pY+3 pockets. Principal component analysis (PCA) was employed to project the high-dimensional set of pair residue COM distances onto a 2D plane. The conformational space of both pockets for the wild type and D170A variant is shown in Fig 2A (note that the principal components are determined from the combined wild type and D170A variant trajectories). The first two principal components contribute 62% of the total variation (S1(A) Fig), encapsulating the majority of the conformational space in just two dimensions.
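The projection described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' pipeline: the function names (`com_pair_distances`, `pca_project`) are hypothetical, the input is random synthetic data standing in for per-frame residue centers of mass, and PCA is done by a plain SVD of the mean-centered feature matrix.

```python
import numpy as np

def com_pair_distances(coms):
    """coms: (n_frames, n_residues, 3) array of residue centers of mass.
    Returns an (n_frames, n_pairs) array of distances over unique residue pairs."""
    i, j = np.triu_indices(coms.shape[1], k=1)
    return np.linalg.norm(coms[:, i, :] - coms[:, j, :], axis=-1)

def pca_project(features, n_components=2):
    """PCA via SVD of the mean-centered feature matrix.
    Returns the projected frames and the explained-variance ratios."""
    centered = features - features.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return centered @ vt[:n_components].T, explained[:n_components]

# Synthetic stand-ins for the two systems (random numbers, not simulation data).
rng = np.random.default_rng(0)
wild_type = rng.normal(size=(200, 6, 3))
variant = rng.normal(size=(200, 6, 3))
variant[:, 0, :] += 3.0  # displace one hypothetical residue in the variant

# Fit the components on the combined frames, as the text describes,
# then split the projection back into the two systems for plotting.
features = com_pair_distances(np.concatenate([wild_type, variant]))
projection, explained = pca_project(features)
wt_projection, variant_projection = projection[:200], projection[200:]
```

Fitting the components on the concatenated frames keeps both systems in one shared coordinate frame, which is what makes their overlap in the 2D plane comparable.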

Fig 2. Conformational analysis of SH2 domain binding pockets.

(A) PCA of both the pY and pY+3 pockets in the wild type (blue) and D170A mutant (orange). The contour lines show the density of recorded frames in each region, and the crystal structure (PDB ID: 6NUQ) is marked for reference. 6NUQ is STAT3 with a ligand bound to the SH2 domain, shown as a reference for the ligand binding conformation. (D) SH2 domain bound to SI109 (light pink) to demonstrate the binding mode of the ligand to the pY and pY+3 pockets (6NUQ [11]). The pY pocket (yellow) and pY+3 pocket (orange) are shown, with the key residues included in the COM pair distances shown as sticks. (B,C) PCAs for wild type and D170A variant, respectively, colored by the COM distance from Q644 to Y657. (E,F) PCAs for wild type and D170A variant, respectively, colored by the COM distance from Q644 to E638. The 644–657 and 638–644 distances can be used as approximate representations of PC1 and PC2 (S1(C) Fig). The distributions of these two distances in the 12 simulations are shown in S2 Fig. (G) The averaged structures for each macro-state with key residues shown as sticks. (H) pY and pY+3 pockets PCA 2D plane colored by macro-state. (I) Nested pie charts showing the degree to which each system (apo and D170A) occupies each macro-state. The outer circle is colored by macro-state as in (H), and the inner circle is colored by system.

In the combined PCA plot, the conformational space of the D170A variant partially overlaps with the wild type, while also exploring distinct novel conformations (Fig 2A). The additional conformational space explored by the D170A variant occurs predominantly along the PC1 axis. The ten largest coefficients of the first two PCs highlight the pairs of residues with the largest relative motions across all the trajectories (S1(C) Fig). The motions of Y657 and Q644 yield the largest coefficients across PC1 and PC2 (Fig 2A), respectively, underlining their conformational importance. The Q644–Y657 pair has the largest variance along PC1 (Fig 2B and 2C) while the E638–Q644 pair has the largest variance along PC2 (Fig 2E and 2F).

Comparing Fig 2B, 2C, 2E and 2F with Fig 2A, we see that the E638–Q644 pair has a slightly higher variance in the wild type whereas the Q644–Y657 motion, particularly above 15 Å, is predominantly a feature of the D170A variant. Additionally, the D170A variant explores somewhat shorter Q644–Y657 distances than the wild type. Most notably, all of the residues with the highest variance occur in the pY+3 pocket rather than the pY pocket. This result is consistent with the observation by Zhang et al., who showed that D170A reduces the binding affinity of ligands targeting the SH2 domain. The binding mode analysis performed by Dhanik et al. shows that the binding affinity of a ligand is directly correlated with additional interactions in the pY+3 pocket [15]. In biomolecular recognition, binding affinity and binding specificity are coupled to each other, i.e. strong binding affinity is indicative of high substrate specificity and vice versa. Furthermore, strong coupling between flexibility and binding affinity has been shown in different systems [23, 24]. Thus, higher flexibility would lead to exploration of an extended conformational landscape and reduced occupancy of the substrate binding conformation. The changes in flexibility of the pY+3 pocket could potentially explain the reduction of binding affinity observed by Zhang et al.

To further investigate key differences in the overall structure, a combination of Markov State Modeling and Perron Cluster Cluster Analysis was applied to cluster transient conformations into kinetically meta-stable macro-states. Clustering into five macro-states (0–4) was applied to the combined wild type and mutant conformations (Fig 2H).
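As a rough illustration of the first half of this procedure, the sketch below estimates a Markov state model transition matrix from discretized trajectories and derives implied timescales from its eigenvalues. It is a minimal sketch under stated assumptions: the micro-state trajectories are invented toy sequences, the count matrix is symmetrized as a crude way to enforce reversibility, and the function names are hypothetical. The subsequent PCCA step, which groups micro-states into macro-states using the leading eigenvectors, is not implemented here and would normally be taken from a dedicated MSM package.

```python
import numpy as np

def transition_matrix(dtrajs, n_states, lag=1):
    """Row-stochastic transition matrix from discrete trajectories at a given lag."""
    counts = np.zeros((n_states, n_states))
    for dtraj in dtrajs:
        d = np.asarray(dtraj, dtype=int)
        # Count every observed transition i -> j separated by `lag` frames.
        np.add.at(counts, (d[:-lag], d[lag:]), 1.0)
    counts = 0.5 * (counts + counts.T)   # crude reversibility assumption
    rows = counts.sum(axis=1, keepdims=True)
    rows[rows == 0] = 1.0                # leave unvisited states as zero rows
    return counts / rows

def implied_timescales(T, lag=1):
    """Relaxation timescales (in frames) from the non-stationary eigenvalues."""
    evals = np.sort(np.linalg.eigvals(T).real)[::-1]
    slow = evals[1:]                     # drop the stationary eigenvalue (1.0)
    slow = slow[(slow > 0) & (slow < 1)]
    return -lag / np.log(slow)

# Toy micro-state trajectories: states {0, 1} and {2, 3} rarely interconvert.
dtrajs = [
    [0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 2, 3, 2, 3, 3, 2, 2, 3, 2, 3],
    [2, 3, 2, 2, 3, 3, 2, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1],
]
T = transition_matrix(dtrajs, n_states=4)
timescales = implied_timescales(T)
```

A large gap between the slowest implied timescale and the rest is the usual hint for how many metastable macro-states the clustering step should look for.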

Both the D170A variant and wild type are well-represented within macro-states 2 and 4, while macro-states 0, 1 and 3 were uniquely explored by the D170A variant (Fig 2I). The average structures of each macro-state were calculated and are shown in Fig 2G. Distinct conformations of Q644 and Y657 are observed for each of the macro-states, in agreement with the PCA data. The pY+3 pocket is blocked in macro-state 2 by Y657 and Y640, both of which point towards the pocket. Conversely, in macro-states 0, 1, and 3, Y657 points away from the pY+3 pocket, leading to an open conformation. The conformational states shared between the wild type and D170A variant constitute viable functional states of STAT3; however, the D170A variant has reduced occupancy of those conformational states (Fig 2I), thus leading to a differentiated function relative to the wild type.
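The per-system occupancies summarized in Fig 2I amount to counting, for each system, the fraction of frames assigned to each macro-state. A minimal sketch follows; the frame labels are invented for illustration and are not the simulation results, and the `occupancy` helper is hypothetical.

```python
import numpy as np

def occupancy(state_labels, system_labels, n_states):
    """Fraction of each system's frames assigned to each macro-state."""
    result = {}
    for system in np.unique(system_labels):
        mask = system_labels == system
        counts = np.bincount(state_labels[mask], minlength=n_states)
        result[str(system)] = counts / counts.sum()
    return result

# Invented per-frame assignments: macro-state index and originating system.
states = np.array([2, 4, 4, 2, 4, 0, 1, 3, 4, 2])
systems = np.array(["WT"] * 5 + ["D170A"] * 5)
occ = occupancy(states, systems, n_states=5)
```

In this toy input the "WT" frames only ever fall in states 2 and 4, so its occupancy vector is zero elsewhere, which is the pattern the text describes qualitatively.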

Allosteric regulation of SH2 pY+3 occurs via translation of motion through a rigid core

To explore the correlation between SH2 domain conformational changes and CCD conformations, CCD was first characterized via PCA analysis of all pair Cα distances. However, as the CCD domain consists of rigid helices, pair Cα distances were not able to characterize the differences between the determined macro-states, as indicated by the low variance contribution of each pair Cα distance (S3(A), S3(C) and S3(D) Fig). Only macro-state 3 showed significant differences, which originate from kinked conformations of α1 (S3(B) Fig).

The rigidity of the α helices diminishes the utility of PCA within the CCD alone for studying intradomain conformational differences; however, communication between the CCD and SH2 domains has been observed in the literature. To investigate this correlation, we hypothesize that the long helical arms of the CCD may act as rigid levers, where any perturbation of the helical arm causes significant changes in inter-domain interaction sites. A role for the helical arm in affecting the SH2 domain was in part motivated by prior studies demonstrating that monobodies targeting STAT3 interact at the helical arm and impact function through subtle bending and/or rotation of helices in the CCD [22]. The argument for the subtle bending and/or rotation of the helical arm impacting signal transduction is based on crystal structures where there are packing interactions that impact the helices. Thus, whether the movements of the helical arm are a result of the monobodies or of crystal packing interactions remains an open question, and MD simulation is of great value as a follow-up to the crystallography studies. The hypothesis was tested by comparing the corresponding helical tilt in the CCD across the macro-states identified for the pY and pY+3 pockets in section ‘Conformational differences in the pY+3 binding pocket correlate to the decreased binding affinity from wild type to D170A variant’. Differences in global helical tilt between the macro-states of the SH2 domain are observed in α3 (Fig 3F). Although the rotation is subtle, it is consistent with the helical movements found in the CCD due to MS3–6 binding that impact biological function. Further, given that the α3 helix interfaces with most of the other domains through its C-terminal helical turn, it lies in a central position that may be able to transmit slight movements within the helical arm through subtle motions of interacting residues between α3 and nearby domains. Thus, although the helical tilt is small, and may not be a primary signaling mechanism, it may reflect movements of key residues at the interface between α3 and nearby domains that propagate conformational information from the CCD to the SH2 domain.
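One simple way to quantify a helical tilt of this kind is to take the first principal axis of a helix's Cα coordinates as the helix axis and measure its angle to a fixed reference direction. The sketch below does this for an idealized helix. It is an assumption-laden illustration: the source does not state that tilt was computed this way, the reference axis (z) and the idealized helix geometry are arbitrary choices, and the function names are hypothetical.

```python
import numpy as np

def helix_axis(ca_coords):
    """Unit vector along the first principal axis of the Cα coordinates,
    oriented from the first residue towards the last."""
    centered = ca_coords - ca_coords.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[0]
    if np.dot(axis, ca_coords[-1] - ca_coords[0]) < 0:
        axis = -axis
    return axis

def tilt_angle(ca_coords, reference=(0.0, 0.0, 1.0)):
    """Angle in degrees between the helix axis and a reference direction."""
    ref = np.asarray(reference, dtype=float)
    ref = ref / np.linalg.norm(ref)
    cos_angle = np.clip(np.dot(helix_axis(ca_coords), ref), -1.0, 1.0)
    return np.degrees(np.arccos(cos_angle))

def ideal_helix(n_res=40, rise=1.5, radius=2.3, turn_deg=100.0):
    """Idealized α-helix Cα trace running along +z (approximate geometry)."""
    k = np.arange(n_res)
    phi = np.radians(turn_deg) * k
    return np.column_stack([radius * np.cos(phi), radius * np.sin(phi), rise * k])

straight = ideal_helix()
theta = np.radians(10.0)
rot_x = np.array([[1.0, 0.0, 0.0],
                  [0.0, np.cos(theta), -np.sin(theta)],
                  [0.0, np.sin(theta), np.cos(theta)]])
tilted = straight @ rot_x.T          # rotate the helix 10 degrees about x

tilt_straight = tilt_angle(straight)
tilt_tilted = tilt_angle(tilted)
```

For a finite helix the principal axis deviates slightly from the true helical axis, so the recovered angles are approximate; per-frame values collected this way could then be compared between macro-states as distributions.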

Fig 3. Rigid core analysis showing interactions which are conserved across all macro-states.

The analysis includes all trajectories spanning macro-states M0–M4. (A) Distance distributions of the atom pairs that form hydrogen bonds show similar distribution profiles. (B) Right: Inter-domain pair Cα distances (dashed lines) show the intensity of coupling in the Cα pair motions between different domains. The pair Cα distances are colored blue, green, yellow, or orange based on their deviation throughout the trajectory (map in subplot). Left: A close-up of the multi-domain interaction site shows the 200 least-deviating Cα pairs, demonstrating strongly correlated movements among the CCD, LD and SH2 domains. (C) Conserved hydrogen bond network between CCD, LD, and DBD. (D) Conserved π-π interaction network between LD and DBD. (E) Conserved hydrogen bond network as well as hydrophobic interactions between LD and SH2. (F) Distribution of global helical tilt of α3 in CCD for each macro-state (M0–M4, see also supplemental S6 Fig). Statistical tests (T-test and Kolmogorov-Smirnov test) are shown in Table A in S1 Text.

Conserved Cα pair distances

Rigidity transmission, where a perturbation of rigidity at one binding site is allosterically transmitted to a second distant site, has been observed in other proteins [25, 26]. Thus we further hypothesized that the motion of α3 is transmitted allosterically to SH2 via a “rigid core”, that is, an interlocking sequence of conserved interactions which function as a sort of molecular machine. The existence of such an interaction network is demonstrated in Fig 3B, where the inter-domain pair Cα distances (see Methods) are plotted and colored according to standard deviation values computed across all trajectories. There is a rigid backbone through the protein from CCD through LD and DBD and finally to SH2 (Fig 3B), which is highly conserved during dynamical motion of the protein before and after mutation at D170. Helices α3, α20, and α21 compose the first section of the rigid core between CCD, DBD, and LD (Fig 3C), which could convey the dynamics of CCD into this highly rigid region. Upon close inspection, the three helices α3, α20, and α21 are locked via a hydrogen bond network between I252, Q511 and W474 (Fig 3A and 3C), such that any rotation of these residues results in a corresponding reorientation of the helices to preserve the hydrogen bond network. We additionally find strongly conserved inter-domain interactions between the DBD and LD (Fig 3D), as well as between the LD and SH2 (Fig 3E), which complete the rigid core. PCA of the rigid core residues shows minimal variance within the rigid core (S4(A) and S4(B) Fig). The PCA 2D plot shows two distinct conformational states; however, this distinction is primarily attributed to the pair Cα distances between residue 562 and residues of the α3 helix (S4(C) Fig). Excluding residue 562 from the PCA of the rigid core yields an indistinguishable conformational landscape (S5(A) and S5(B) Fig). These factors allow subtle changes in the CCD configurations to convey movement from α3 through α21 (in LD) and α20 (in DBD) to α26 and α24 (in LD), finally leading to allosteric modification of SH2 via β29.
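The rigid-core criterion used here, a low standard deviation of inter-domain Cα pair distances across frames, can be sketched as follows. This is a toy illustration on synthetic coordinates rather than the authors' analysis: the residue indices, the domain assignments, and both function names are hypothetical, and the default `k=200` simply mirrors the number of least-deviating pairs shown in Fig 3B.

```python
import numpy as np

def interdomain_distance_std(ca, domain_a, domain_b):
    """ca: (n_frames, n_residues, 3) Cα coordinates.
    Returns a (len(domain_a), len(domain_b)) array holding the standard
    deviation over frames of each inter-domain Cα pair distance."""
    a = ca[:, domain_a][:, :, None, :]
    b = ca[:, domain_b][:, None, :, :]
    distances = np.linalg.norm(a - b, axis=-1)
    return distances.std(axis=0)

def most_rigid_pairs(std, domain_a, domain_b, k=200):
    """The k residue pairs whose distance deviates least, most rigid first."""
    order = np.argsort(std, axis=None)[:k]
    ia, ib = np.unravel_index(order, std.shape)
    return [(int(domain_a[i]), int(domain_b[j]), float(std[i, j]))
            for i, j in zip(ia, ib)]

# Synthetic system: residues 0-3 move as one rigid body, residues 4-5 fluctuate.
rng = np.random.default_rng(1)
base = rng.normal(scale=5.0, size=(6, 3))
frames = np.repeat(base[None], 100, axis=0)
frames = frames + rng.normal(size=(100, 1, 3))      # whole-body translation per frame
frames[:, 4:, :] += rng.normal(size=(100, 2, 3))    # extra noise on residues 4 and 5

domain_a = np.array([0, 1])
domain_b = np.array([2, 3, 4, 5])
std = interdomain_distance_std(frames, domain_a, domain_b)
rigid = most_rigid_pairs(std, domain_a, domain_b, k=4)
```

Because pair distances are invariant to overall translation and rotation, this measure needs no prior structural alignment, which is one reason it suits a comparison across many trajectories.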

The conformational analysis presented above demonstrates that the D170A mutation leads to changes in the orientation of the CCD α helices, which could then lead to allosteric regulation of SH2 domain conformations through a network of conserved interactions. Rigid body analysis shows that these interactions consist of hydrogen bond, π-π, and hydrophobic networks that strongly correlate the motions of different domains. However, this analysis neither highlights the allosteric path that differs between the wild type and D170A variant nor provides evidence of a dynamical correlation between CCD and SH2 conformation through this pathway. To further elucidate the allosteric pathway and show dynamical correlation, we employ both a REDAN analysis and an analysis of differences in the global hydrogen bond network between kinetic macro-states.

REDAN

While a rigid backbone provides a potential pathway for signal propagation, the cumulative long-range allosteric effect is realized through short range interactions and subtle allosteric changes which must occur in concert. To identify a detailed sequence of short range interactions, REDAN analysis was employed as a means to identify residue pairs that are responsive to allosteric perturbation, followed by shortest path analysis using Dijkstra’s algorithm. Using REDAN, subtle yet highly correlated differences in the allosteric network between D170A and the wild type can be resolved allowing us to propose a concrete signal transduction network from CCD to SH2.
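The shortest-path step can be illustrated with a compact implementation of Dijkstra's algorithm on a weighted residue graph. The graph below is purely illustrative: the node labels and edge weights are invented placeholders, and how REDAN actually assigns edge weights to residue pairs is not reproduced here.

```python
import heapq

def dijkstra(graph, source, target):
    """Return (cost, path) of the lowest-cost route in a weighted graph
    given as {node: {neighbor: non-negative weight}}."""
    best = {source: 0.0}
    previous = {}
    visited = set()
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if target not in best:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[target], path[::-1]

def undirected(edges):
    """Build a symmetric adjacency dict from (u, v, weight) tuples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph

# Hypothetical residue graph; labels and weights are placeholders only.
graph = undirected([
    ("start", "a", 1.0), ("start", "b", 4.0), ("a", "b", 1.5),
    ("a", "c", 5.0), ("b", "c", 1.0), ("c", "end", 2.0), ("b", "end", 6.0),
])
cost, path = dijkstra(graph, "start", "end")
# cost is 5.5 and path is ['start', 'a', 'b', 'c', 'end']
```

With the effector residue as the source and the regulatory residue as the target, the returned path is the chain of short-range contacts with the lowest cumulative weight.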

From the conformational analysis, Y657 was identified as the residue with the highest average difference between the macro-states explored by the SH2 domain. Thus, D/A170 was selected as the starting point and Y657 was selected as the end point for the analysis. The most structurally relevant pathway from the effector residue (D/A170) to the regulatory site (Y657) was identified by REDAN and is shown in Fig 4A and 4B. The pathway originates from the CCD and passes through the LD to the SH2 domain, bypassing the DBD (although it passes near α20, which was identified as a component of the rigid core). The residue pairs that connect domains are of most interest, and their distance distributions are shown in S7(C)–S7(E) Fig. The average structures of each macro-state were calculated to structurally verify the correlated motions in secondary structure identified by REDAN (Fig 4C).

Fig 4. Proposed allosteric path from D/A170 to Y657 obtained from REDAN analysis and its structural details.

(A) Residues involved in the allosteric path from CCD to LD and to SH2 domain. The raw path can be found in (S7 Fig). (B) A close-up view of signal transduction from CCD domain to SH2 domain. The residues identified by REDAN show a path through α3 in CCD to β22 and α26 in the LD domain and finally to α32 and α33 in the SH2 domain, bypassing DBD. (C) Average structures from all the macro-states show significant reorganization in this interface (colors as in Fig 2H). β sheet configuration (adjusted in PyMol) of macro-states 2, 3, and 4 are shown to highlight rearrangement of β22 based on Ramachandran Dihedral for β-sheets of residues 514 to 517.

The secondary structure designated β22 (Fig 1B) can be observed to dynamically shift between a β sheet and α helix in macro-states 2, 3, and 4, while in macro-state 0 and 1 the α helix is stabilized (S7(F) Fig). β22 extends the α21 helix and reorganizes the loop between β22-α23. It is worth noting that residues 514 to 517 are annotated as a β sheet in the UniProt database (S1 Table), while these residues form an α helix in the reference structure used in this study (PDB ID: 6TLC). This result is not necessarily incongruous with the database designation as multiple configurations of the secondary structure are observed experimentally. Stabilization of β22 as an extension of the α21 helix in macro-states 0 and 1 stabilizes extension and a shift of the α26 helix. The REDAN analysis suggests that further interaction between α26 and α32 then positions the α32-α33 loop such that α33 adopts an extended conformation without close contact to the α32-α33 loop. On the other hand, the structural conformation analysis above suggests that α26 may more indirectly affect α32 via hydrogen bonding and hydrophobic interaction with β29.

Hydrogen bond network

The formation and breaking of hydrogen bonds play an important role in the stability of secondary structures and the conformational variability of tertiary structures of a protein. To complement the structural insights and REDAN analysis, the differential rate (preponderance) of hydrogen bond occurrence between different macro-states was compared (Fig 5). Hydrogen bonds were identified using Baker-Hubbard hydrogen bonding analysis, and then the differential hydrogen bond rate was calculated between each of macro-states 0, 1, 2, and 3 and the wild type-dominant macro-state 4. Macro-state 4 was considered the primary active state, as the wild type and D170A share high occupancy rates in this state. Note that the Baker-Hubbard hydrogen bonding analysis algorithm also classifies salt-bridge formation between an amine and a carboxylic acid as a hydrogen bond, so the formation and breaking of salt bridges were also considered in the analysis.
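Given per-frame hydrogen bond occupancies for a macro-state and for the reference state, the differential rate reduces to a difference of frame fractions. The sketch below assumes the bonds have already been detected by some hydrogen bond criterion and encoded as boolean arrays; the toy arrays are invented, the function names are hypothetical, and reading the 50% cutoff of Fig 5 as an absolute difference in occupancy is an interpretation rather than something the text states.

```python
import numpy as np

def hbond_frequency(present):
    """present: (n_frames, n_pairs) boolean array, True where a bond exists.
    Returns the fraction of frames in which each bond is present."""
    return np.asarray(present, dtype=float).mean(axis=0)

def differential_rate(state_present, reference_present, threshold=0.5):
    """Occupancy difference (state minus reference) and the indices of bonds
    gained or lost by at least `threshold` (assumed absolute difference)."""
    diff = hbond_frequency(state_present) - hbond_frequency(reference_present)
    gained = np.flatnonzero(diff >= threshold)
    lost = np.flatnonzero(diff <= -threshold)
    return diff, gained, lost

# Toy occupancies for three hypothetical residue pairs over four frames each.
reference = np.array([[1, 0, 1],
                      [1, 0, 0],
                      [1, 0, 1],
                      [1, 0, 0]], dtype=bool)
state = np.array([[0, 1, 1],
                  [1, 1, 0],
                  [0, 1, 0],
                  [0, 1, 1]], dtype=bool)
diff, gained, lost = differential_rate(state, reference)
# diff is [-0.75, 1.0, 0.0]; pair 1 is gained and pair 0 is lost
```

Pairs in `gained` and `lost` correspond to the two connection colors described for Fig 5, while pairs with small differences are left out of the picture.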

Fig 5. Hydrogen bond frequency analysis showing key differences between macro-states.

The difference in the frequency of pair residue hydrogen bonds for macro-states M0–M3 compared to macro-state M4. Blue connections indicate that a hydrogen bond is present at a rate at least 50% higher, while red connections indicate a rate at least 50% lower, with respect to macro-state M4. Terminal residues are colored in yellow. While significant changes in hydrogen bonding are evident across the protein, macro-states M0, M1, and M3 clearly indicate large changes in the pY+3 pocket, as well as consistent hydrogen-bonding changes linking the SH2 and LD domains.

Consistent with the observation from REDAN analysis, β22 shows large changes in hydrogen bonding between the native (macro-state 4) and allosteric (macro-states 0 and 1) configurations. In macro-states 0 and 1, hydrogen bond formation in β22 stabilizes the α helix, while the other macro-states alternate between an α helix and a β sheet. Moreover, in macro-state 3, rearrangement causes a new hydrogen bond to form between β22 and α3. These changes highlight key differences between the wild type and D170A variant, as only the D170A variant occupies macro-states 0, 1 and 3 (Fig 5). These macro-states also show decoupling of LD from DBD along with alteration of interactions between LD and SH2 (Fig 6). First, a salt bridge between D566 (LD) and R335 (DBD) is lost in macro-states 0 and 1 (Fig 6A and 6B). The loss of this interaction allows for the shift of α26 observed in these macro-states (Fig 4C). Second, a new hydrogen bond is formed between I576 (LD, α26–α27 loop) and N646 (SH2, α32) in macro-state 0 (Fig 6A) and a salt bridge is formed between D570 (LD, α26) and K642 (SH2, α32) in macro-state 3 (Fig 6C). Third, loss of the hydrogen bonds of E652 and/or I653 (SH2, α33) with S649 (SH2, α32–α33 loop) in macro-states 0, 1, and 3 (Fig 6A–6C) contributes to increased flexibility of the α33 helix, as also observed in Fig 4C.

Fig 6. A close-up view of the hydrogen bond network changes between macro-states M0–M3 and macro-state M4 (see Fig 5 for details).

(A–C) Differential rate of pair residue hydrogen bonds in the LD and SH2 domains. (D–F) Differential rate of pair residue hydrogen bonds in the CCD and LD domains.

Focusing on the upper part of CCD (Fig 6D–6F), it is clear that α1 interacts directly with α2 and α4 through a large interaction surface. α2 and α3 form a contiguous helix interrupted by a kink at residue 278, so perturbation of α2 is structurally coupled with α3. A salt bridge between E229 (α2) and R306 (α5) has a high appearance rate in macro-states 0, 1, and 3 compared to macro-state 4. This salt bridge provides a means to strongly correlate the motion of α2 with that of α5 and mitigate perturbations by α1. These interactions perturb α3, and the perturbations are translated to the rigid core interface. The residues of the α3 helix can thus be used to influence the rigid core to elicit an interdomain response (S4 and S5 Figs). Moreover, I252, which resides at the terminal end of α3, forms a strong hydrogen bond with Q511 (LD), as identified in the rigid core analysis (Fig 3C). Q511 also forms a strong hydrogen bond with W474 (DBD). This network of hydrogen bonds allows strongly correlated motion between α3 in CCD and α20/α21 in the LD domain. I252 is also adjacent to E253, which was identified by REDAN analysis as part of the allosteric pathway, thus providing a chemical basis for signal transduction from CCD to SH2 via the LD domain.

Discussion

Significant differences in the conformational space of the SH2 domain binding pocket (pY+3) between the wild type and D170A mutant were observed. The D170A variant explores and extends the conformational space of the SH2 domain, specifically with significant changes in the opening of the pY+3 pocket. The six independent trajectories obtained for each variant explore fairly distinct areas of conformational space (i.e. different macro-states as identified in the clustering analysis, see Fig 2H). The long transition timescale between these states validates our kinetic clustering model, but also necessitates a “global” view of the trajectories in order to explain the full conformational dynamics of the protein. Based on the coverage of the 2D PCA space of the SH2 domain (Fig 2), we find that the six trajectories obtained are sufficient to sample the dynamics relevant to ligand binding affinity. On the other hand, the long transition timescale necessarily leads to high variability between trajectories for certain computed properties such as RMSF (S8 Fig). While not an indication of insufficiency in the global conformational view obtained from combining all trajectories, this variability instead serves as a valuable additional measure of conformational differences between kinetic macro-states. For transparency, plots of the per-replica analysis are available in the supplement (S2 and S6 Figs). Since the present MD simulations cannot guarantee sampling of the full conformational space of the protein, other possible allosteric pathways cannot be excluded. The mechanism proposed here is therefore drawn with caution, although the results agree with experimental observations. Thus, this work presents a theoretical foundation upon which to draw inspiration for drug design as well as mechanistic studies on the protein.

The hydrophobic environment formed by I659, W623 and F621 in pY+3 was previously shown to assist in binding of the target peptide. Changes in the hydrophobic environment that increase either hydrophobicity or aromaticity lead to hyper-activation, while introduction of polarity and reduction of hydrophobicity and aromaticity diminish STAT3 function [27]. Furthermore, studies suggest that the Y657 side-chain interaction is important for stabilizing the ligand-protein complex [15–17]. Thus, the diminished binding affinity can be attributed to increased motions in the structures surrounding the pY+3 pocket in D170A. While in principle, translation of motion through the rigid core could affect the conformation of the primary pY binding pocket, only the pY+3 pocket shows differential motions in D170A compared to the wild type. Thus, our simulations support the conclusion that mutation of the D170 residue effects an inactivation of the protein via an allosteric mechanism resulting in conformational changes primarily in the specificity-determining pY+3 pocket.

The observed differences in the SH2 conformational space allow us to further characterize the mechanism of this allosteric effect. The α3 helix of the CCD domain correlates functionally with SH2 conformations, as evidenced by strong correlated motions within the rigid core. Given its positioning, it likely communicates changes in the CCD to the LD. Using rigid body analysis, a potential pathway from CCD through LD, and finally to SH2 was identified, wherein conserved hydrogen bond networks and other strong interactions firmly link a series of secondary structures (primarily α3, α20, α21, and α24). While further analysis supports this proposed mechanism (vide infra), controlled mutagenesis of these key residues could also provide experimental evidence for the importance of these interactions to the D170A allosteric pathway. The lack of a significant increase (or decrease) in dynamic motion and overall flexibility in D170A compared to the wild type (as seen in the RMSF analysis) also supports a sequence of interactions between rigid bodies as the main allosteric mechanism.

The specific allosteric pathway was further elucidated via REDAN and differential hydrogen bonding analysis. These analyses both point to a very specific mechanism (structural details shown in S9 Fig): 1) stronger interaction between α5 and α2 causes a tilt in the α2/α3 helix; 2) the α3 tilt interferes with the interaction between β22/α23 and α26/α27. In macro-states 0 and 1, the interference breaks the salt bridge between D566 and R335 and causes α26 to shift away from β22, while in macro-state 3, α26 shifts toward β22, which causes a steric clash between K573 and β22; 3) in turn, the movement of α26 and α26/α27 causes breakage of the hydrogen bonds between α33 and the α32–α33 loop; 4) α33 extends significantly and alters the conformation of the pY+3 pocket. We also identified conserved interactions between α26 and β29 in SH2 which may provide further coupling.

This study independently confirms that alterations of the alpha-helical rigid core impact the SH2 domain in the absence of crystal packing interactions. This supports prior crystallographic observations indicating that small molecules, mutations, or monobodies can drive long-range allosteric changes to the SH2 domain through subtle alteration of the rigid core. We have specifically avoided an elucidation of the allosteric pathway within the CCD domain (i.e. in the immediate vicinity of the mutation site). Although fully exploring comparisons to the allosteric pathways of other effectors (e.g. MS3–6, K116, MM-206) is beyond the scope of the present work, it seems highly likely that the observed structure of the interdomain interactions and the rigid core mechanisms should result in highly similar allosteric mechanisms within the LD, DBD, and SH2 domains. Conversely, the diverse nature of these effectors likely rules out any significant commonality in the initial few steps of the allosteric pathway. Additionally, we have observed that even fairly significant alterations of the CCD structure, such as the kinked conformation of α1 explored by the D170A variant, do not correlate with changes to SH2 structure except via the rigid core. Thus, we consider this feature of the CCD structure as the “trigger” for the overall allosteric pathway.

Previously, the role of α26 in allosteric communication has been identified: nuclear magnetic resonance (NMR) studies showed that the I568F mutation was able to induce a chemical shift perturbation in SH2, DBD, and CCD [28]. In addition, the D566A, D570A and D570K mutants showed profound negative effects on transcription and also, unexpectedly, tyrosine phosphorylation even before interleukin (IL) 6 induction [29, 30]. Additionally, a previous study showed similar allosteric pathways in STAT5; however, the rigid core was not explored [31]. The STAT family of proteins are highly similar in primary, secondary and tertiary structure, and the similarity in the allosteric pathways of STAT3 and STAT5 can be used to posit that STAT family proteins have a rigid core that couples allostery between their domains.

These results present us with novel ways of regulating the CCD domain whereby ligand or peptide interaction with CCD can significantly alter the helical tilt of α3, which is transmitted to the SH2 domain via the identified allosteric pathway. It is unlikely that different effectors will have identical mechanisms within CCD, as the local perturbation caused by point mutation, small molecule binding, peptide binding, etc. is radically different. However, our analysis supports the conclusion that any perturbation resulting in a change of α3 tilt should result in a similar outcome due to the strong and highly concerted motions along the proposed allosteric pathway. Potentially, alteration of helical tilt could also result in tighter binding of target peptides to the SH2 domain while, as observed here, helix-helix interactions in CCD can also promote structural changes in the SH2 domain leading to reduced affinity of SH2 binding. In short, although the mechanism for signal transduction has been identified through analysis of dynamic correlated motions between the CCD and SH2 domains, the exact outcome of this allosteric effect is not clear. Two potential mechanisms need to be distinguished: changes in substrate specificity to form the heterodimer required for phosphorylation of Y705, and reduction in homodimerization due to reduced affinity towards pSTAT3. Drugs designed to specifically alter the motions of the CCD helices would yield valuable insight towards validation of the proposed mechanism. Since the pY+3 pocket regulates the affinity of peptide binding, rather than being the catalytic active site, assays developed to probe allosteric regulation of SH2 via CCD should consider the identity of peptides used to target the SH2 domain in addition to overall activity.

Finally, the methods used here to identify the rigid core mechanism, specifically a combination of structural and conformation analysis (dimensionality reduction, functional clustering, and conserved Cα pair distances) with dynamical and correlative analyses (REDAN and differential hydrogen bond analysis) should allow for identification of other potential effectors targeting CCD, and more widely, in identifying similar allosteric pathways in a number of (semi-)rigid proteins.

Methods

Initial structure

The structure of monomeric STAT3 in complex with the peptide MS3–6 (PDB ID: 6TLC) was used as the template for this study. In chain A, residues 372–381 and 418–428 in the DBD were missing; these residues were modeled using Chimera [32]. The apo structure was created by directly deleting the MS3–6 peptide, and the apo structure was then subjected to mutation using the PyMol Mutagenesis Wizard [33] to generate the D170A variant. To show DNA binding (Fig 1), the 1BG1 structure was used, which contains a single STAT3 bound to DNA. However, STAT3 binds to DNA in its dimeric state; the dimeric structure was therefore generated in PyMol to show DNA binding of the STAT3 dimer. Hydrogen atoms were added to the crystal structures using PyMol. The protonation states for the histidine residues were assigned using the H++ program [34].

Molecular dynamics simulation

For each system, a rectangular periodic water box with 84736 TIP3P waters was used, with a minimum distance of 10 Å between the box boundary and the protein to avoid image interactions (S12 Fig). To balance charge and provide realistic salinity, 0.15 M sodium and chloride ions were added. NAMD 2.13 [35] with the CHARMM 36 force field [36] was used for energy minimization and molecular dynamics (MD) simulations. Initially, the simulation systems were subjected to 5000 steps of energy minimization to remove bad contacts and clashes. Then, the systems were heated from 0 K to 300 K in increments of 50 K every 200 ps, and then from 300 K to 310 K in 200 ps, followed by a short 10 ns equilibration in the isothermal-isobaric (NPT) ensemble. Subsequently, six replicas of 600 ns canonical ensemble (NVT) MD simulations at 310 K were conducted. The first 100 ns of each simulation were discarded as equilibration, and the following 500 ns of each replica, 3 μs in total, were used for further analysis. The SHAKE algorithm was applied to all bonds containing hydrogen atoms. Electrostatic interactions were evaluated by the particle-mesh Ewald method, and Lennard-Jones interactions were evaluated using a 10 Å cutoff. The NPT simulations were performed using Nosé-Hoover Langevin piston pressure control. The NVT simulations were performed using the Langevin integrator, with a friction coefficient of 1 ps⁻¹. A step size of 2 fs was used.
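The schedule above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the helper names (`steps_for`, `heating_schedule`) are hypothetical, and the reading of the heating protocol as six 50 K stages of 200 ps each followed by one 200 ps stage to 310 K is an assumption based on the description in the text.

```python
# Back-of-the-envelope check of the simulation schedule described in the text.
TIMESTEP_FS = 2.0  # integration step size stated in the text

def steps_for(duration_ns, timestep_fs=TIMESTEP_FS):
    """Number of integration steps needed to cover duration_ns."""
    return round(duration_ns * 1e6 / timestep_fs)  # 1 ns = 1e6 fs

def heating_schedule():
    """(target temperature in K, stage length in ps) for each heating stage.

    Assumed reading of the protocol: 0 -> 300 K in 50 K increments,
    200 ps per increment, then 300 -> 310 K in a final 200 ps stage.
    """
    stages = [(temp, 200) for temp in range(50, 301, 50)]
    stages.append((310, 200))
    return stages

schedule = heating_schedule()
heating_ps = sum(length for _, length in schedule)   # total heating time
production_steps = steps_for(600)                    # steps per 600 ns replica
analyzed_us = 6 * (600 - 100) / 1000                 # six replicas, 100 ns discarded each
```

Under these assumptions the heating phase lasts 1.4 ns, each production replica corresponds to 3 × 10⁸ integration steps, and the analyzed data add up to the 3 μs quoted in the text.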

NVT was used for the main simulation due to its propensity for greater stability over longer timescales. To reconcile the change from NPT to NVT, the total energy, kinetic energy, potential energy, temperature and pressure of the system were examined for consistency and smoothness. Both systems were well equilibrated (S10 Fig). Sufficient sampling of the MD simulations for both the wild type and the D170A variant was evaluated by the pairwise Root Mean Square Deviation (RMSD) (S11 Fig), which shows that the simulations have either reached a stationary shape or transition between different stationary shapes. Detailed RMSD and RMSF analysis can be found in the Supplemental Information (S1 Text).

Feature characterization

Feature characterization for the trajectory analysis was conducted using two open-source packages: MDTraj 1.9.3.40 [37] and MDAnalysis [38, 39].

Pair residues center of mass distance

The center of mass distance between pairs of residues was used to characterize the SH2 domain pY pocket and pY+3 pocket. First, the SH2 domain was extracted using the atom_slice function from MDTraj; then the center of mass of each residue was calculated using the center_of_mass function from MDAnalysis; finally, the Euclidean distance between the centers of mass of each residue pair was calculated.
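The last two steps of this calculation can be illustrated for a single frame without any trajectory library. This is a minimal sketch, not the MDTraj/MDAnalysis workflow used in the study; the function names and the input layout (a residue given as a list of atom coordinates plus a list of atom masses) are assumptions made for the example.

```python
import math

def center_of_mass(coords, masses):
    """Mass-weighted mean position of one residue.

    coords: list of (x, y, z) atom positions; masses: matching atom masses.
    """
    total = sum(masses)
    return tuple(
        sum(m * c[k] for m, c in zip(masses, coords)) / total
        for k in range(3)
    )

def com_distance(res_a, res_b):
    """Euclidean distance between the centers of mass of two residues.

    Each residue is a (coords, masses) pair taken from the same frame.
    """
    return math.dist(center_of_mass(*res_a), center_of_mass(*res_b))
```

Applying `com_distance` to every frame of a trajectory would give the distance time series for one residue pair, for example a pair lining the pY+3 pocket.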

Inter-domain pair Cα distances

The inter-domain pair Cα distances were used to carry out the rigid core analysis. Taking the CCD-DBD domain pair Cα distances as an example: first, the neighboring Cα atoms from DBD that are within 1 nm of CCD were found using the compute_neighbors function from MDTraj, which yields the residue pairs between CCD and DBD that are within interaction range. Then, the residue pairs common to the wild type and the D170A variant were kept, and the pair Cα distances were calculated using compute_distances from MDTraj.
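The pair selection logic can be sketched for one representative frame per system as follows. This is a simplified stand-in for the MDTraj calls named in the text, which operate on whole trajectories; the function names, the dictionary input (residue number mapped to Cα coordinates in nm) and the single-frame treatment are assumptions for illustration only.

```python
import math
from itertools import product

def contact_pairs(ca, domain_a, domain_b, cutoff_nm=1.0):
    """Residue pairs (a, b) whose C-alpha atoms lie within cutoff_nm.

    ca maps residue number -> (x, y, z) in nm for a single frame.
    """
    return {
        (a, b)
        for a, b in product(domain_a, domain_b)
        if math.dist(ca[a], ca[b]) < cutoff_nm
    }

def common_pair_distances(ca_wt, ca_mut, domain_a, domain_b, cutoff_nm=1.0):
    """Keep only pairs in contact in both systems and return their distances.

    Returns {pair: (distance in wild type, distance in variant)}.
    """
    common = (contact_pairs(ca_wt, domain_a, domain_b, cutoff_nm)
              & contact_pairs(ca_mut, domain_a, domain_b, cutoff_nm))
    return {
        pair: (math.dist(ca_wt[pair[0]], ca_wt[pair[1]]),
               math.dist(ca_mut[pair[0]], ca_mut[pair[1]]))
        for pair in sorted(common)
    }
```

Restricting the comparison to the common pairs ensures that the wild type and the variant are described by the same set of distance features.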

Alpha helix global tilt

The α helix global tilt angle was used to characterize the geometry of helices according to the procedure of Sugeta and Miyazawa [40]. After alignment of the trajectories to the crystal structure, the helanal.helanal_trajectory function from the MDAnalysis software was used to characterize the alpha helices. [0, 0, 1] was used as the reference axis [41].
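Once a global helix axis is available, the tilt is simply the angle between that axis and the reference axis. The sketch below shows only this final step; it assumes the axis vector has already been obtained (the helix-fitting procedure itself is not reproduced), and the function name `tilt_angle_deg` is hypothetical.

```python
import math

def tilt_angle_deg(axis, reference=(0.0, 0.0, 1.0)):
    """Angle in degrees between a helix axis vector and a reference axis."""
    dot = sum(a * r for a, r in zip(axis, reference))
    norm = math.hypot(*axis) * math.hypot(*reference)
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))
```

For example, an axis along (1, 0, 1) is tilted by 45 degrees relative to the [0, 0, 1] reference, while an axis parallel to the reference has zero tilt.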

RMSD analysis

Root-mean-square deviation (RMSD) analysis shows the conformational dynamics over the trajectory and provides an insight into the variation within the conformational space of a reference structure. After alignment to the reference structure, the RMSD values were measured using MDAnalysis.rms function.
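For reference, the quantity being measured can be written in a few lines. This sketch assumes the frame has already been superposed onto the reference structure (the alignment step is not shown) and uses an unweighted average over atoms; it is not the MDAnalysis implementation.

```python
import math

def rmsd(coords, reference):
    """Unweighted RMSD between two pre-aligned coordinate sets.

    coords and reference are equally long lists of (x, y, z) positions.
    """
    squared = sum(
        (a - b) ** 2
        for p, q in zip(coords, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(coords))
```

Evaluating `rmsd` for every frame against one reference gives the RMSD time series; evaluating it for every pair of frames gives a pairwise RMSD matrix.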

RMSF analysis

Root-mean-square fluctuation (RMSF) allows us to probe average positional changes of each residue. RMSF measures the average deviation of a particle (here, an individual residue) over time from a reference position (typically the time-averaged position of the particle). The trajectories were first superposed to the backbone of the first frame of each trajectory, then the RMSF values for each trajectory were measured using mdtraj.rmsf from MDTraj. Finally, the mean RMSF values and the standard error of each residue over the six replicas were calculated and plotted.
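The two stages of this analysis (per-trajectory RMSF, then mean and standard error across replicas) can be sketched in plain Python. The sketch assumes frames are already superposed, uses the time-averaged position as the reference, and takes the standard error as the sample standard deviation divided by the square root of the number of replicas; all names are hypothetical.

```python
import math
from statistics import mean, stdev

def rmsf(frames):
    """Per-atom RMSF about the time-averaged position.

    frames: list of frames, each a list of (x, y, z) positions per atom.
    """
    values = []
    for i in range(len(frames[0])):
        avg = [mean(frame[i][k] for frame in frames) for k in range(3)]
        msd = mean(
            sum((frame[i][k] - avg[k]) ** 2 for k in range(3))
            for frame in frames
        )
        values.append(math.sqrt(msd))
    return values

def mean_and_sem(per_replica):
    """Mean and standard error per residue across replicas.

    per_replica: list of RMSF lists, one list per replica.
    """
    return [
        (mean(vals), stdev(vals) / math.sqrt(len(vals)))
        for vals in zip(*per_replica)
    ]
```

With six replicas, `mean_and_sem` would return one (mean, standard error) pair per residue, which is the kind of summary plotted with error bars.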

Principal components analysis

Linear Principal Components Analysis (PCA [42]) is used to transform high dimensional and often linearly-dependent data points into a low-dimensional space spanned by uncorrelated principal components. The first two principal components (PCs) were used in this work, yielding a two-dimensional reduction of the original data. Given high dimensional data represented as an n (sample size) by m (variable size) matrix, the covariance of any two variables was calculated by,

$$\mathrm{cov}(X_{i\alpha}, X_{j\beta}) = \frac{N}{N-1}\left\langle \left(X_{i\alpha} - \langle X_{i\alpha} \rangle\right) \cdot \left(X_{j\beta} - \langle X_{j\beta} \rangle\right) \right\rangle,$$

where $X_{i\alpha}$ is the α Cartesian component of the coordinate vector for atom i, N is the sample size (number of frames), and angle brackets denote an average over the samples. The covariance matrix C is constructed from the pairwise covariances between all variables. The eigenvectors of C are the components of PCA, while the eigenvalues measure the contribution of each PC to the variance of the dataset. The eigenvectors also provide a mapping from the high-dimensional dataset to the low-dimensional PC space: the (PC1,PC2) coordinates for each input frame are given by multiplication of the original data set by the two eigenvectors with the largest eigenvalues. For a given PC, the importance of a feature (variable) is reflected by the absolute magnitude of the corresponding entry in the eigenvector. The PCA analysis was performed with Scikit-learn [43] in Python.
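The construction described here (covariance matrix, eigendecomposition, projection onto the leading eigenvectors) can be written directly with NumPy. This is a sketch of the textbook procedure rather than the Scikit-learn code used in the study; the function name `pca_project` is hypothetical, and the data are mean-centered before projection, which is an assumption about how the projection is carried out.

```python
import numpy as np

def pca_project(data, n_components=2):
    """Project data (n_frames x n_features) onto its leading PCs.

    Returns (projection, eigenvalues, eigenvectors), with components
    ordered by decreasing eigenvalue.
    """
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)       # sample covariance, 1/(N-1) normalization
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    return centered @ components, eigvals[order], components
```

The absolute values of the entries in each returned eigenvector column indicate which input features contribute most to that PC.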

Markov state modeling and Perron cluster cluster analysis

Markov state models (MSMs [44, 45]) have shown great utility in modeling the transitions among functional states. They were used in this study to cluster conformations into kinetically meaningful macro-states. The conformational space was first discretized into n micro-states. Here, agglomerative clustering [46] was applied to divide the sampled conformations into 300 micro-states in the two-dimensional PCA coordinate system. Then, Cij(τ), the number of observed transitions from micro-state i to micro-state j at a lag time τ, was calculated. The transition probability, Pij, from micro-state i to micro-state j can then be estimated as Pij ≈ (Cij + Cji)/∑k(Cik + Cki). According to the estimated relaxation timescales (S1(B) Fig), the count matrix and MSM transition probabilities converge beyond τ = 5 ns, which was chosen as the lag time.
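The count matrix and the symmetrized transition probability estimate given above can be illustrated on a toy discrete trajectory. The sketch below is not the PyEmma implementation; the function names are hypothetical, the lag is expressed in frames rather than ns, and a single trajectory is assumed (with several replicas, counts would be accumulated per trajectory so that no transition spans two replicas).

```python
def count_matrix(dtraj, n_states, lag):
    """C_ij(lag): number of transitions i -> j observed at the given lag.

    dtraj: list of micro-state indices, one per frame; lag in frames (>= 1).
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i][j] += 1
    return counts

def symmetrized_transition_matrix(counts):
    """P_ij ~ (C_ij + C_ji) / sum_k (C_ik + C_ki)."""
    n = len(counts)
    sym = [[counts[i][j] + counts[j][i] for j in range(n)] for i in range(n)]
    matrix = []
    for row in sym:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

For the toy trajectory `[0, 0, 1, 1, 0, 1]` at a lag of one frame, the counts are `[[1, 2], [1, 1]]` and the symmetrized estimate is `[[0.4, 0.6], [0.6, 0.4]]`.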

Finally, Perron Cluster Cluster analysis (PCCA), implemented in the PyEmma package [47], was used to coarse grain micro-states into “kinetically relevant” macro-states based on the well-sampled micro-state transition matrix. Structures that interconvert frequently were assumed to belong to the same functional metastable state (macro-state) [48]. Five macro-states were determined based on the band gap in the estimated relaxation timescale plot.
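One common way to read a gap from a relaxation timescale plot is to convert transition matrix eigenvalues into implied timescales, t_i = -τ / ln|λ_i|, and look for the largest ratio between successive timescales. The sketch below illustrates that heuristic only; whether the number of macro-states in this study was chosen by such an automated rule or by visual inspection is not stated, and the function names are hypothetical.

```python
import math

def implied_timescales(eigenvalues, lag):
    """t_i = -lag / ln|lambda_i| for the non-stationary eigenvalues.

    The largest eigenvalue (lambda = 1, the stationary process) is dropped.
    """
    lams = sorted((abs(l) for l in eigenvalues), reverse=True)[1:]
    return [-lag / math.log(l) for l in lams if 0.0 < l < 1.0]

def n_macrostates_from_gap(timescales):
    """Pick the number of macro-states from the largest timescale ratio.

    If the largest gap follows the k-th slowest process (0-based index k),
    there are k + 1 slow processes and hence k + 2 metastable states.
    """
    ratios = [timescales[i] / timescales[i + 1] for i in range(len(timescales) - 1)]
    k = max(range(len(ratios)), key=ratios.__getitem__)
    return k + 2
```

With toy eigenvalues `[1.0, 0.99, 0.98, 0.5, 0.4]`, the two slowest processes are well separated from the rest, so the heuristic suggests three macro-states.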

Relative entropy-based dynamical allosteric network

The relative entropy-based dynamical allosteric network (REDAN) model [49] was used to quantitatively characterize protein allosteric effects upon mutation. The difference between the distributions of the pair alpha carbon (Cα) distances of two residues upon perturbation is quantified by the perturbation relative entropy (PRE), which is the average relative entropy,

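$$\mathrm{PRE}(P\,\|\,Q) = \frac{D_{\mathrm{KL}}(P\,\|\,Q) + D_{\mathrm{KL}}(Q\,\|\,P)}{2}, \qquad D_{\mathrm{KL}}(P\,\|\,Q) = \int_{0}^{\infty} p(x)\,\ln\frac{p(x)}{q(x)}\,dx,$$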

where p(x) is the distribution density for system P (before perturbation), and q(x) is the distribution density for system Q (after perturbation). A high PRE value means that the distance distribution of the residue pair differs significantly between the two systems, implying a substantial allosteric effect.
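In practice the two densities have to be estimated from sampled distances. The following sketch uses shared histogram bins and a small pseudocount to avoid division by zero; the binning scheme, the pseudocount and the function name are assumptions made for illustration and are not taken from the REDAN reference implementation.

```python
import numpy as np

def perturbation_relative_entropy(d_ref, d_pert, bins=50, eps=1e-10):
    """Symmetrized KL divergence between two sampled distance distributions.

    d_ref, d_pert: 1D arrays of one residue pair's distances in the
    reference and perturbed systems. Both are binned on a common grid.
    """
    lo = min(d_ref.min(), d_pert.min())
    hi = max(d_ref.max(), d_pert.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(d_ref, bins=edges)
    q, _ = np.histogram(d_pert, bins=edges)
    p = (p + eps) / (p + eps).sum()   # normalized bin probabilities
    q = (q + eps) / (q + eps).sum()
    kl_pq = np.sum(p * np.log(p / q))
    kl_qp = np.sum(q * np.log(q / p))
    return 0.5 * (kl_pq + kl_qp)
```

Two identical samples give a value of zero, and the measure is symmetric in its two arguments, which is what allows it to be used as an undirected edge property.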

Then, a weighted graph can be built based on the PRE matrix. Each node is represented by a Cα atom, and two nodes will be connected by an edge if the longest possible distance between them is less than 10 Å. Each edge is weighted as 1/PRE. Since high PRE values indicate importance in the propagation of the structural changes in the protein, the pathway with the smallest overall weight implies the most structurally relevant route and hence a possible allosteric pathway. Dijkstra’s algorithm was used to identify the shortest pathway.
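The graph construction and path search can be sketched with the standard library alone. In this illustration the "longest possible distance" is taken to be the maximum Cα distance observed for a pair, which is an assumption; the function names and dictionary inputs are likewise hypothetical, and only Dijkstra's algorithm itself is taken from the text.

```python
import heapq

def build_graph(pre, max_dist, cutoff=10.0):
    """Undirected graph with edge weight 1/PRE for pairs closer than cutoff.

    pre[(i, j)]: PRE value of the pair; max_dist[(i, j)]: largest observed
    C-alpha distance in Angstrom (assumed meaning of the distance criterion).
    """
    graph = {}
    for (i, j), value in pre.items():
        if max_dist[(i, j)] < cutoff and value > 0:
            weight = 1.0 / value
            graph.setdefault(i, {})[j] = weight
            graph.setdefault(j, {})[i] = weight
    return graph

def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (path, total weight) or (None, inf)."""
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(nbr, float("inf")):
                best[nbr] = new_cost
                prev[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    if target not in best:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], best[target]
```

Because the weights are reciprocals of PRE, a route through a few strongly perturbed pairs is preferred over a direct edge with a small PRE value.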

Supporting information

S1 Text. RMSD and RMSF analysis.

D170A mutation induces large structural but minor dynamical changes in STAT3. Table A. Statistical differences for the α3 global tilt among different macrostates (Fig 3F). The Kolmogorov-Smirnov test and t-test were performed using the scipy.stats.ks_2samp and scipy.stats.ttest_ind functions, respectively.

(DOCX)

S1 Table. Secondary structure assigned by the UniProt database by compiling structure information from multiple x-ray crystal structures.

(TIF)

S1 Fig. Additional information of PCA and MSM analysis.

(A) PCA scree plot: dots show the cumulative explained variance of the principal components; the bar chart represents the explained variance per component. (B) Relaxation timescales of the MSM for the SH2 domain conformational space at different lag times. (C) The ten features that contribute most to the first two PCs. The absolute values of PC1 and PC2, and the square root of the sum of squared PC1 and PC2 values, are shown here.

(TIF)

S2 Fig. Pair Cα distance distribution between Q644 and Y657, Q644 and E638 of 6 replicas for wild type and D170A variant.

(TIF)

S3 Fig. Characterization of CCD using pair Cα distances.

(A) PCA 2D plane of CCD pair Cα distances colored by macro-state from SH2 domain results. (B) Representative structure of CCD corresponding to panel A. (C) The coefficients of the 100 features that contribute most to the first two PCs.

(TIF)

S4 Fig. The PCA analysis of pair distances of rigid core (residues:240–252, 474, 479, 511, 546, 549, 550, 562,564, 568, 610, 611).

(A) PCA 2D plot colored by different macrostates; (B) PCA 2D plot colored by systems; (C) The ten features that contribute most to the first two PCs. The absolute values of PC1 and PC2, and the square root of the sum of squared PC1 and PC2 values, are shown here.

(TIF)

S5 Fig. The PCA analysis of pair distances of rigid core without residue 562 (residues:240–252, 474, 479, 511, 546, 549, 550, 564, 568, 610, 611).

(A) PCA 2D plot colored by different macrostates; (B) PCA 2D plot colored by systems; (C) The ten features that contribute most to the first two PCs. The absolute values of PC1 and PC2, and the square root of the sum of squared PC1 and PC2 values, are shown here.

(TIF)

S6 Fig. CCD α3 global tilt angle distribution.

(A,B) CCD α3 global tilt angle distribution of different macrostates, plotted separately for the wild type and D170A variant. Average helix tilt angle within each macro-state is illustrated by a vertical dashed line. (C,D) CCD α3 global tilt angle distribution of different replicas for wild type and D170A variant.

(TIF)

S7 Fig. Additional REDAN analysis results.

(A) Proposed pathway from 170 to 640 shown in the protein structure; (B) Proposed pathway from 170 to 644; (C,D,E) Key pair residue distances; (F) Ramachandran dihedrals for residues 513, 514, 515 and 517; (G) Summary of proposed pathways from source residue 170 to target residues 640, 644 and 657.

(TIF)

S8 Fig. The mean RMSF of six replicates was plotted, with the RMSF values for each residue separated by domain.

The wild type is plotted in blue and the D170A variant in orange, with error bars indicating the standard error among the six replicates. Structures of each domain colored by wild type RMSF values are shown (low RMSF values in white to high RMSF values in red).

(TIF)

S9 Fig. Proposed specific mechanism.

(A) Representative structure of macro-state 0 (light cyan) compared with macro-state 4 (salmon); (B) Representative structure of macro-state 3 (green) compared with macro-state 4 (salmon); (C-G) Key residue pair Cα distances or contact distances (closest heavy atom distance). The ILE-252 to LYS-573 and PHE-512 to LYS-573 distance distributions show that α26 shifts away from β22 in macro-states 0 and 1, and toward it in macro-state 3; the PHE-512 to LEU-577 and PHE-512 to SER-649 distance distributions show that the α26/α27 and α32/α33 loops move away from β22 in macro-states 0, 1 and 3, while in macro-state 3, α32 moves closer to α26 as indicated by the ASP-570 to LYS-642 distance distribution.

(TIF)

S10 Fig. System parameters (pressure, temperature) and system energies (potential energy, kinetic energy and total energy) of the wild type system and the D170A variant system. Each system has six replicas: cp1, cp2, cp3, cp4, cp5 and cp6.

(TIF)

S11 Fig

First row: Root Mean Square Deviation (RMSD) of the wild type system and the D170A variant system; each system has six replicas: cp1, cp2, cp3, cp4, cp5 and cp6. Note: a rolling average over every 100 frames is plotted for better visualization. Second row: The pairwise RMSD (frames every 1 ns were extracted and used for the pairwise RMSD calculation) of both systems for each replica.

(TIF)

S12 Fig. Figure depicting the simulation system, generated by VMD.

Water is shown as lines, ions are shown as vdw, protein is shown as cartoon.

(TIF)

S13 Fig. Root Mean Square Deviation (RMSD) analysis.

(A, B) Cross-correlation (Pearson correlation) of RMSD values of each domain for the wild type and D170A variant. Here RMSD values were calculated with the first frame as reference, since the correlation of dynamic changes of each domain is of interest. The cross-correlation was done by merging the RMSD of the six replica trajectories together and then calculating the Pearson correlation among the different domains. It is worth noting that the correlation value does not indicate functional correlation between domains: since RMSD is an overall measure of conformational change with respect to the reference structure, distinct conformations may have the same RMSD value. (C) Violin plot of RMSD values for the whole protein (core full length protein, not including NTD) and each domain. Here the crystal structure was used as reference, since the difference in conformational changes between the wild type and D170A was of interest. (D) The RMSD distribution of the SH2 domain in the six replicas for the D170A variant.

(TIF)

Acknowledgments

TZ was supported by a fellowship from the Department of Chemistry of Southern Methodist University. All calculations were performed on the ManeFrame II supercomputing system at Southern Methodist University.

Data Availability

All relevant data are within the manuscript and its Supporting information files. Starting structures used for simulations, the raw trajectory files and PyMol files are available at https://osf.io/dvzq7/.

Funding Statement

The authors acknowledge funding sources, including NSF research grant No. (1753167) https://www.nsf.gov/. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Lim CP, Cao X. Structure, function, and regulation of STAT proteins. Molecular BioSystems. 2006;2(11):536. doi: 10.1039/b606246f
  • 2. Aaronson DS, Horvath CM. A Road Map for Those Who Don't Know JAK-STAT. Science. 2002;296(5573):1653–1655. doi: 10.1126/science.1071545
  • 3. Avalle L, Pensa S, Regis G, Novelli F, Poli V. STAT1 and STAT3 in tumorigenesis. JAK-STAT. 2012;1(2):65–72. doi: 10.4161/jkst.20045
  • 4. Rani A, Murphy JJ. STAT5 in Cancer and Immunity. Journal of Interferon & Cytokine Research. 2016;36(4):226–237. doi: 10.1089/jir.2015.0054
  • 5. Stritesky GL, Kaplan MH. Changing the STATus quo in T helper cells. Transcription. 2011;2(4):179–182. doi: 10.4161/trns.2.4.16614
  • 6. Bateman A, Martin MJ, Orchard S, Magrane M, Agivetova R, Ahmad S, et al. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Research. 2020;49(D1):D480–D489.
  • 7. Sgrignani J, Garofalo M, Matkovic M, Merulla J, Catapano CV, Cavalli A. Structural Biology of STAT3 and Its Implications for Anticancer Therapies Development. International Journal of Molecular Sciences. 2018;19(6):1591. doi: 10.3390/ijms19061591
  • 8. Yu H, Jove R. The STATs of cancer – new molecular targets come of age. Nature Reviews Cancer. 2004;4(2):97–105. doi: 10.1038/nrc1275
  • 9. Yu H, Pardoll D, Jove R. STATs in cancer inflammation and immunity: a leading role for STAT3. Nature Reviews Cancer. 2009;9(11):798–809. doi: 10.1038/nrc2734
  • 10. Johnson DE, O'Keefe RA, Grandis JR. Targeting the IL-6/JAK/STAT3 signalling axis in cancer. Nature Reviews Clinical Oncology. 2018;15(4):234–248. doi: 10.1038/nrclinonc.2018.8
  • 11. Bai L, Zhou H, Xu R, Zhao Y, Chinnaswamy K, McEachern D, et al. A Potent and Selective Small-Molecule Degrader of STAT3 Achieves Complete Tumor Regression In Vivo. Cancer Cell. 2019;36(5):498–511.e17. doi: 10.1016/j.ccell.2019.10.002
  • 12. Arshad S, Naveed M, Ullia M, Javed K, Butt A, Khawar M, et al. Targeting STAT-3 signaling pathway in cancer for development of novel drugs: Advancements and challenges. Genetics and Molecular Biology. 2020;43(1). doi: 10.1590/1678-4685-GMB-2018-0160
  • 13. Bradshaw JM, Mitaxov V, Waksman G. Mutational investigation of the specificity determining region of the src SH2 domain. Journal of Molecular Biology. 2000;299(2):523–537. doi: 10.1006/jmbi.2000.3765
  • 14. Haan S, Hemmann U, Hassiepen U, Schaper F, Schneider-Mergener J, Wollmer A, et al. Characterization and Binding Specificity of the Monomeric STAT3-SH2 Domain. Journal of Biological Chemistry. 1999;274(3):1342–1348. doi: 10.1074/jbc.274.3.1342
  • 15. Dhanik A, McMurray JS, Kavraki LE. Binding Modes of Peptidomimetics Designed to Inhibit STAT3. PLoS ONE. 2012;7(12):e51603. doi: 10.1371/journal.pone.0051603
  • 16. Mandal PK, Limbrick D, Coleman DR, Dyer GA, Ren Z, Birtwistle JS, et al. Conformationally Constrained Peptidomimetic Inhibitors of Signal Transducer and Activator of Transcription 3: Evaluation and Molecular Modeling. Journal of Medicinal Chemistry. 2009;52(8):2429–2442. doi: 10.1021/jm801491w
  • 17. McMurray JS. Structural basis for the binding of high affinity phosphopeptides to Stat3. Biopolymers. 2007;90(1):69–79. doi: 10.1002/bip.20901
  • 18. Gelain A, Mori M, Meneghetti F, Villa S. Signal Transducer and Activator of Transcription Protein 3 (STAT3): An Update on its Direct Inhibitors as Promising Anticancer Agents. Current Medicinal Chemistry. 2019;26(27):5165–5206. doi: 10.2174/0929867325666180719122729
  • 19. Zhang T, Kee WH, Seow KT, Fung W, Cao X. The Coiled-Coil Domain of Stat3 Is Essential for Its SH2 Domain-Mediated Receptor Binding and Subsequent Activation Induced by Epidermal Growth Factor and Interleukin-6. Molecular and Cellular Biology. 2000;20(19):7132–7139. doi: 10.1128/mcb.20.19.7132-7139.2000
  • 20. Minus MB, Liu W, Vohidov F, Kasembeli MM, Long X, Krueger MJ, et al. Rhodium(II) Proximity-Labeling Identifies a Novel Target Site on STAT3 for Inhibitors with Potent Anti-Leukemia Activity. Angewandte Chemie International Edition. 2015;54(44):13085–13089. doi: 10.1002/anie.201506889
  • 21. Huang M, Song K, Liu X, Lu S, Shen Q, Wang R, et al. AlloFinder: a strategy for allosteric modulator discovery and allosterome analyses. Nucleic Acids Research. 2018;46(W1):W451–W458. doi: 10.1093/nar/gky374
  • 22. Sala GL, Michiels C, Kükenshöner T, Brandstoetter T, Maurer B, Koide A, et al. Selective inhibition of STAT3 signaling using monobodies targeting the coiled-coil and N-terminal domains. Nature Communications. 2020;11(1). doi: 10.1038/s41467-020-17920-z
  • 23. Chu X, Wang J. Specificity and Affinity Quantification of Flexible Recognition from Underlying Energy Landscape Topography. PLoS Computational Biology. 2014;10(8):e1003782. doi: 10.1371/journal.pcbi.1003782
  • 24. Thorpe IF, Brooks CL. Molecular evolution of affinity and flexibility in the immune system. Proceedings of the National Academy of Sciences. 2007;104(21):8821–8826. doi: 10.1073/pnas.0610064104
  • 25. Sljoka A. Probing Allosteric Mechanism with Long-Range Rigidity Transmission Across Protein Networks. In: Methods in Molecular Biology. Springer US; 2020. p. 61–75. Available from: 10.1007/978-1-0716-1154-8_5.
  • 26. Ye L, Neale C, Sljoka A, Lyda B, Pichugin D, Tsuchimura N, et al. Mechanistic insights into allosteric regulation of the A2A adenosine G protein-coupled receptor by physiological cations. Nature Communications. 2018;9(1). doi: 10.1038/s41467-018-03314-9
  • 27. de Araujo ED, Orlova A, Neubauer HA, Bajusz D, Seo HS, Dhe-Paganon S, et al. Structural Implications of STAT3 and STAT5 SH2 Domain Mutations. Cancers. 2019;11(11):1757. doi: 10.3390/cancers11111757
  • 28. Namanja AT, Wang J, Buettner R, Colson L, Chen Y. Allosteric Communication across STAT3 Domains Associated with STAT3 Function and Disease-Causing Mutation. Journal of Molecular Biology. 2016;428(3):579–589. doi: 10.1016/j.jmb.2016.01.003
  • 29. Mertens C, Haripal B, Klinge S, Darnell JE. Mutations in the linker domain affect phospho-STAT3 function and suggest targets for interrupting STAT3 activity. Proceedings of the National Academy of Sciences. 2015;112(48):14811–14816. doi: 10.1073/pnas.1515876112
  • 30. Yang E, Henriksen MA, Schaefer O, Zakharova N, Darnell JE. Dissociation Time from DNA Determines Transcriptional Function in a STAT1 Linker Mutant. Journal of Biological Chemistry. 2002;277(16):13455–13462. doi: 10.1074/jbc.M112038200
  • 31. Langenfeld F, Guarracino Y, Arock M, Trouvé A, Tchertanov L. How Intrinsic Molecular Dynamics Control Intramolecular Communication in Signal Transducers and Activators of Transcription Factor STAT5. PLOS ONE. 2015;10(12):e0145142. doi: 10.1371/journal.pone.0145142
  • 32. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, et al. UCSF Chimera: A visualization system for exploratory research and analysis. Journal of Computational Chemistry. 2004;25(13):1605–1612. doi: 10.1002/jcc.20084
  • 33. Schrödinger, LLC. The PyMOL Molecular Graphics System, Version 1.8; 2015.
  • 34. Anandakrishnan R, Aguilar B, Onufriev AV. H++ 3.0: automating pK prediction and the preparation of biomolecular structures for atomistic molecular modeling and simulations. Nucleic Acids Research. 2012;40(W1):W537–W541. doi: 10.1093/nar/gks375
  • 35. Phillips JC, Hardy DJ, Maia JDC, Stone JE, Ribeiro JV, Bernardi RC, et al. Scalable molecular dynamics on CPU and GPU architectures with NAMD. The Journal of Chemical Physics. 2020;153(4):044130. doi: 10.1063/5.0014475
  • 36. Best RB, Zhu X, Shim J, Lopes PEM, Mittal J, Feig M, et al. Optimization of the Additive CHARMM All-Atom Protein Force Field Targeting Improved Sampling of the Backbone ϕ, ψ and Side-Chain χ1 and χ2 Dihedral Angles. Journal of Chemical Theory and Computation. 2012;8(9):3257–3273. doi: 10.1021/ct300400x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. McGibbon RT, Beauchamp KA, Harrigan MP, Klein C, Swails JM, Hernández CX, et al. MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories. Biophysical Journal. 2015;109(8):1528–1532. doi: 10.1016/j.bpj.2015.08.015 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Michaud-Agrawal N, Denning EJ, Woolf TB, Beckstein O. MDAnalysis: A toolkit for the analysis of molecular dynamics simulations. Journal of Computational Chemistry. 2011;32(10):2319–2327. doi: 10.1002/jcc.21787 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Gowers R, Linke M, Barnoud J, Reddy T, Melo M, Seyler S, et al. MDAnalysis: A Python Package for the Rapid Analysis of Molecular Dynamics Simulations. In: Proceedings of the Python in Science Conference. SciPy; 2016. Available from: 10.25080/majora-629e541a-00e. [DOI]
  • 40. Sugeta H, Miyazawa T. General method for calculating helical parameters of polymer chains from bond lengths, bond angles, and internal-rotation angles. Biopolymers. 1967;5(7):673–679. doi: 10.1002/bip.1967.360050708 [DOI] [Google Scholar]
  • 41. Bansal M, Kumar S, Velavan R. HELANAL: A Program to Characterize Helix Geometry in Proteins. Journal of Biomolecular Structure and Dynamics. 2000;17(5):811–819. doi: 10.1080/07391102.2000.10506570
  • 42. Wold S, Esbensen K, Geladi P. Principal component analysis. Chemometrics and Intelligent Laboratory Systems. 1987;2(1-3):37–52. doi: 10.1016/0169-7439(87)80084-9
  • 43. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 2011;12:2825–2830.
  • 44. Husic BE, Pande VS. Markov State Models: From an Art to a Science. Journal of the American Chemical Society. 2018;140(7):2386–2396. doi: 10.1021/jacs.7b12191
  • 45. Bowman GR, Bolin ER, Hart KM, Maguire BC, Marqusee S. Discovery of multiple hidden allosteric sites by combining Markov state models and experiments. Proceedings of the National Academy of Sciences. 2015;112(9):2734–2739. doi: 10.1073/pnas.1417811112
  • 46. Zepeda-Mendoza ML, Resendis-Antonio O. Hierarchical Agglomerative Clustering. In: Encyclopedia of Systems Biology. Springer New York; 2013. p. 886–887. Available from: 10.1007/978-1-4419-9863-7_1371.
  • 47. Scherer MK, Trendelkamp-Schroer B, Paul F, Pérez-Hernández G, Hoffmann M, Plattner N, et al. PyEMMA 2: A Software Package for Estimation, Validation, and Analysis of Markov Models. Journal of Chemical Theory and Computation. 2015;11(11):5525–5542. doi: 10.1021/acs.jctc.5b00743
  • 48. Pande VS, Beauchamp K, Bowman GR. Everything you wanted to know about Markov State Models but were afraid to ask. Methods. 2010;52(1):99–105. doi: 10.1016/j.ymeth.2010.06.002
  • 49. Zhou H, Tao P. REDAN: relative entropy-based dynamical allosteric network model. Molecular Physics. 2018;117(9-12):1334–1343. doi: 10.1080/00268976.2018.1543904
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010794.r001

Decision Letter 0

Nir Ben-Tal, James Gallo

13 Jul 2022

Dear Ms. Zhao,

Thank you very much for submitting your manuscript "Allosteric Regulation in STAT3 Interdomains is Mediated by a Rigid Core: SH2 Domain Regulation by CCD in D170A Variant" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

James Gallo

Associate Editor

PLOS Computational Biology

Nir Ben-Tal

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Overall:

While I do not feel qualified to comment on the importance of understanding STAT3, or the current knowledge regarding this protein, I am qualified to evaluate the molecular dynamics simulations and analysis. I found this manuscript to be engaging, and I believe that this study has high scientific rigor. The authors were very diligent and careful in their simulations and analyses.

My primary criticisms, as you will find detailed below, primarily deal with the clarity of the take home message. I think there are things that should be removed from the paper because they confound the story. Also, some effort should be put into clarifying figures and text.

Significant Suggestions:

While this is a significant modification, it will be an easy suggestion to implement. In the Methods section, there is no need to give equations or elaborate on details that are taken care of by the software. For example, for RMSD, just tell the reader what the RMSD measures in a sentence or two, and then tell them the software (MDTraj) and the command(s) you ran (presumably mdtraj.rmsd). If there are cases where you actually generated values and then plugged them into formulas, then tell us, but if it’s baked into the software then why bother?

An overall suggestion that encompasses several points below is to ask the following question for each figure/subsection: Does this support and clarify the take home message by adding a new point, or is it unnecessary? If it is unnecessary, then don’t include it in the manuscript. I think in some cases (e.g., perhaps RMSD, RMSF), these analyses led you to the more advanced ones (e.g., MSM). However, if they don’t add anything to the story, then it would be better to include them in a supplement than in the main text. Otherwise, they confound the main story and readers are left trying to understand what they are supposed to learn from a figure. A specific example of this is the RMSD analysis in the paper. I certainly do understand why you would have done this analysis as part of the study, but it doesn’t appear to add any value to the story that isn’t more clearly illustrated later, especially given the point you make yourself in the caption, that different conformations can have the same RMSD. I would suggest removing this, or moving it to a supplement.

In fig 3 you chose to use the deepness of the color to represent standard deviation. I would suggest two modifications to this choice. (1) Use error bars rather than color depth. This allows the reader to easily assess the significance of the data points where there is a difference between the mutant and wt. (2) Show standard error rather than standard deviation (i.e., stddev/sqrt(6)). Again, this better allows us to evaluate the significance of the differences. The standard error quantifies the precision of your RMSF values rather than the spread of the RMSF values in individual runs. Also, some more minor points regarding fig 3. Please show the location of the mutation site. The whole point of the figure is comparing wt and mutant, yet I can’t tell where the mutation occurs from your figure. In the caption I assume “stand deviation” is meant to read “standard deviation”. Also, some font sizes are too small. Increase font sizes for axis labels, domain/site labels and the key in the first panel. Finally, I find the domain labeling to be confusing. What does α1-α2 mean? Why not just put the α1 label within the dashed lines that represent the boundaries for that sub-domain?
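The standard-error suggestion in point (2) can be illustrated with a minimal sketch. It assumes one RMSF value per residue from each of the six independent replicas; the function name `mean_and_sem` and the numbers below are hypothetical and are not taken from the manuscript.

```python
import math
import statistics


def mean_and_sem(per_replica_values):
    """Return (mean, standard error of the mean) across independent replicas."""
    n = len(per_replica_values)
    if n < 2:
        raise ValueError("need at least two replicas to estimate a spread")
    mean = statistics.fmean(per_replica_values)
    # Sample standard deviation (n - 1 in the denominator)
    sd = statistics.stdev(per_replica_values)
    # Standard error: stddev / sqrt(number of replicas)
    return mean, sd / math.sqrt(n)


# Hypothetical RMSF values (in angstroms) for one residue, one per replica
rmsf_replicas = [1.0, 1.2, 0.9, 1.1, 1.3, 1.1]
mean, sem = mean_and_sem(rmsf_replicas)
print(f"RMSF = {mean:.2f} +/- {sem:.2f}")
```

Plotting `mean` with `sem` as the error bar for wt and mutant side by side would let a reader judge whether a difference at a given residue exceeds the replica-to-replica uncertainty.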

In fig 4 there are some hard to read fonts. In particular, the labels in panel D, G are hard to read. The yellow and orange pY and pY+3 labels could perhaps be outlined in black to enhance readability. All labels in these panels should use larger (and perhaps bold?) fonts. Also, it is not clear why you compared to 6nuq here when you were using a different PDB structure up to this point.

I found fig 5 to be very confusing. On page 13 you state “The existence of such an interaction network is demonstrated in Figure 5B, where the inter-domain pair Cα distances are plotted and colored according to standard deviation values computed across all trajectories.” I think you mean 5A here, but it still doesn’t make sense to me. For each plot you show 5 curves (M0-M4) and they appear to be color coded by this, not the standard deviation. M0-M4 is not defined anywhere. Reading ahead it looks like you might be referring to the macrostate number here? Anyway, the figure caption and manuscript text need significant clarification. A couple more minor points, I would suggest changing your language regarding “hydrogen bonds distance”. I think what you mean to say is distances between atoms that form hydrogen bonds. The reason for this change is that your distributions reach well past what most people would consider to still be a hydrogen bond, i.e., the bonds appear to be breaking sometimes. Finally, there are again some very small fonts. In panel B, there is a sub-plot that is quite literally unreadable at any zoom level. As before, consider both font sizes and colors.

The analyses presented in fig 5 and fig 7 seem to have a lot of overlap. While fig 5 includes some rigid core analysis, the hydrogen bonding part has a similar take home message (but is less clear to me) as fig 7. It is not clear why both are needed. I’m not saying you should eliminate fig 5, but perhaps consider removing the parts/panels that are redundant.

It is mentioned that MSM analysis was performed. Fig 4 shows some clustering results, but there is no mention of the transition probabilities in the main text. Does this analysis not add anything to the story?

Minor points:

*Page 2: “amino-terminal domain ()”, should there be something in the parentheses?

*Fig 1 caption: NTD not defined, perhaps this ties into the first minor point?

*Fig 1: Consider showing the location of the mutation site D170 that you are studying. I ended up figuring it out from going to the PDB because I wanted to know when looking at this figure.

*Page 6: “RMSD stabilized around 4”. Without looking at the figure I would assume angstroms, but you need units here.

*Fig 6: Residue labels in panel A are hard to read. Making them bold might help, but part of the issue is that some are very crowded and some are obscured by the dark blue line.

Reviewer #2: In the paper “Allosteric Regulation in STAT3 Interdomains is Mediated by a Rigid Core: SH2 Domain Regulation by CCD in D170A Variant”, Devin Matthews and co-workers analyze, by means of molecular dynamics simulations, the effect of the D170A substitution on the dynamics of the STAT3 protein. In the paper, they show that the two variants of the protein populate different conformational spaces. In particular, a different conformational state of the helices in the coiled-coil domain populated in the D170A variant affects the dynamics of the pY+3 binding site in the SH2 domain. Overall, the reported data seem convincing; however, in my opinion, in some cases the statistical relevance of the reported data needs to be improved to prove that the discussed picture is not a consequence of stochastic fluctuations of the simulated systems. The authors acquire six independent trajectories for each variant; inter-trajectory variability could be reported to make the discussion more convincing. Another weakness of the paper that could be addressed is the lack of a discussion of the effect of the mutation on the interactions with its surrounding residues. One of the conclusions of the authors is that, more than through a difference in induced flexibility, the change at residue 170 acts through a “rigid core mechanism”. This mechanism is described by proposing a clear sequence of interactions between the coiled-coil and the SH2 domains. The table in figure S4G clearly indicates this sequence of interactions, but, if this is the case, a clearly different picture of the nearest-neighbor interactions of residue 170 in the wild-type protein and in its D170A variant should be observed. Quite surprisingly, the authors do not search for these differences; I think that these data could contribute to making the hypothesis of the authors more convincing.
Finally, my last major concern regards the link to the experimentally observed effect of the D170A substitution. What is experimentally observed is a lowering of the affinity of mutants with respect to the wild type. What the authors observe in their dynamics is higher flexibility in the pY+3 binding pocket in the D170A variant. The authors should discuss this aspect better, for example by showing that in similar systems a correlation between these very different observables has been demonstrated.

Here below, I report minor concerns/comments:

- The paper needs editorial revision. For example: i) at page 3, in the first lines of the introduction section, the acronym for amino-terminal domain is missing; ii) in the caption of table S1, uniport is reported instead of uniprot; iii) at page 12, figure 5H does not exist, probably the authors refer to figure 5F; iv) in the methods section (in the first paragraph) the reported pdb code is wrong (6tle instead of 6tlc); v) at page 24, “is calculated” is repeated twice

- In figure 1B, residues clearly folded in helical structures are colored in yellow, a color that indicates unstructured regions (loops) according to the caption. Is the structure wrong, or is the problem in the assignment?

- At page 4, in the description of the SH2 domain family, the authors mention the pY+3 interactions as a specificity region. This is probably the case for the SH2 domain in STAT3, but this statement is far from general, and wider regions (from -2 to +5) can be determinants of selectivity (see for example the case of the N-SH2 domain in the SHP2 phosphatase). More generally, the discussion at page 8 seems to develop on two different planes, that of the SH2 domain in STAT3 and that of SH2 domains in general. The authors could be clearer by specifying each time which SH2 domain they refer to.

- At page 4: the STAT3 regulation mechanism mainly discussed in the paper is that the phosphorylation of Y705 promotes STAT3 dimerization, by binding of the pY705 in the SH2 domain of another STAT3 protein. The general motif for the SH2 domain in STAT3 is described as pTyr-Xaa-Yaa-Gln. Quite strangely, the sequence surrounding Y705 (SAAPpYLKTKF) does not contain this motif. This point relates to my previous comment on the effect of the mutation on the binding affinity. A change in the pY+3 binding pocket could reflect changes in selectivity more than in affinity. Commenting on this aspect could help the discussion.

- a brief discussion on the biological interest (if any) for the D170A variant could be of interest (e.g. is this variant causative of pathologies?).

- The authors use the RMSD from a single structure to prove that the system is well-equilibrated. This approach could be misleading (see for example: J. Chem. Phys. 1998, 109:10115 or Comput Math Methods Med. 2012; 2012:173521). Flat (sometimes high) values of the RMSD reflect only that the sampled conformations are all equally distant from the reference but little can be said about the distance between them. RMSD computed by comparing all the frames to each other (with results reported as a graphical matrix) could be more reliable. In this context, quite surprisingly, the RMSD values seem to be higher for the SH2 domain in the wild-type simulations than in those of the mutant. How do the authors comment on this evidence?

- Some regions of the protein have been modeled because absent in the pdb file. In the scheme below the figure 1A (in which the different domains are indicated) the modeled regions could be somehow put in evidence. This information is practically absent in the paper, but it can help to interpret the RMSF data.

- At page 7, the statement " D170A variant has multiple quasi-stable conformations seen as multiple peaks in the core full length protein violin plot" in the comment to figure 2c needs further comments. The time evolution of RMSD (with respect to a different structure) reported in figure S1 seems to suggest that this behavior is not homogeneous on the six replicas. Probably the RMSD distribution in the six replicas could be helpful to understand if only a few simulations are responsible for these differences or if they are observed in all the simulations

- in a comment on the RMSD data the authors write "Overall, the deviation of the SH2 domain from the crystal structure is significantly increased and several quasi-stable conformations appear, while the CCD domain becomes somewhat more rigid." RMSD takes into account differences between structures, and says much less about flexibility. From this point of view, RMSF is definitely better. By the way, the RMSF values seem to suggest the opposite behavior.

- data in figure 2 are acquired from six different simulations. How much is the standard deviation between the different simulations? These data should be added to figures 2A and 2B. For example, the reliability of the statement "This correlation is enhanced at the expense of correlation of different domains to the linker domain, with only CCD retaining its correlations with linker in D170A variant." depends on the values of the standard deviations.

- by looking at figure S2, the distance between 644 - 657 and 638-644 can be used as an approximate representation of PC1 and PC2 (as also shown in figures 4 B, C, E, and F). The plot of these two distances in the 12 simulations could help to understand how the PC1-PC2 space is populated in the different simulations

- the sentence " The motion of Y657 and Q644 yields the largest coefficients across both PC1 and PC2, underlining their conformational importance" at page 9 is misleading. The authors refer to the addition of the two contributions to PC1 and PC2 but the distance contributes only marginally to PC2

- How much of the explained variance is described by PC1 and PC2 vectors in the PCA analysis reported in figure S3?

- At page 12 "While the rigidity of the α helices diminished the utility of PCA analysis, it also demonstrates that the dynamics of the helices are highly correlated (Figure 3)." How does figure 3 demonstrate the correlations between the dynamics of the helices?

- It is not clear if the data reported in figure 5 refer to WT, to the D170A variant, or both of them.

- figure 5B is hard to read. A table reporting the percentage of existence for all the considered interactions would be more useful

- I'm not sure that the data reported in figure 5F are statistically significant. What about these data in the single variants? Are these data similar in the six replicas for each system?

- Are REDAN analyses performed on both the proteins (wild type and D170A variant)?

- the color code used in figure S4F is not clear

- in the discussion, the authors write that " The D170A variant explores and extends conformational space of SH2 domain, specifically with significant changes in opening of the pY+3 pocket.". SASA data could be useful to strengthen this statement.

- in the methods section or in the figure 1 caption an explanation of the method used to obtain the structure in figure 1 should be inserted
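Reviewer #2's suggestion above, to compute the RMSD by comparing all frames with each other rather than against a single reference, can be sketched as follows. This is a minimal illustration only: it assumes the frames have already been superposed on a common set of atoms (no fitting is done here), and the function names and toy coordinates are hypothetical.

```python
import math


def rmsd(frame_a, frame_b):
    """RMSD between two frames given as equal-length lists of (x, y, z) tuples.

    Assumes both frames are already superposed; no alignment is performed.
    """
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must contain the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(frame_a, frame_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(frame_a))


def pairwise_rmsd(frames):
    """Symmetric all-against-all RMSD matrix as a list of lists."""
    n = len(frames)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = rmsd(frames[i], frames[j])
    return matrix


# Toy example: frame 0 is the reference; in frames 1 and 2 the third atom
# moves by 1 angstrom in opposite directions.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, -1.0, 0.0)],
]
matrix = pairwise_rmsd(frames)
```

In this toy case frames 1 and 2 are equally distant from frame 0, so an RMSD trace against frame 0 would look flat, while the off-diagonal entry `matrix[1][2]` shows that the two frames differ from each other by twice that amount.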

Reviewer #3: In this manuscript, authors investigated the structural and dynamic features of wild type and mutated STAT3 protein using molecular dynamics simulations. The comparison between the wild and mutated simulations provided not only the structural and dynamical difference but also the information on allosteric mechanism. This work is interesting to the readers of PLOS computational biology, it would strengthen the paper if the authors could address the following points.

Major points:

1) In this paper, the result and discussion parts are separated. The results include inferences based on other research; I think it is okay to include such descriptions. However, some of the descriptions seem to hinder readability. For example, P13 line 4-5 “the rigidity transmission was observed in other proteins (Sljoka 23, Ye et al. 24). Thus we further hypothesized…”. I found some of the explanations in the result part difficult to follow; as I suspect a reader less familiar with this protein might have even greater difficulties, authors should make the manuscript and the figures easier to understand for readers.

2) The authors should correct the title of result section more appropriately.

3) P6-, the authors also analyzed the whole structural change. I suggest that a domain analysis (for example, dyndom, etc) using those representative structures may support your results.

4) P9, line 4-7, the sentence “From these qualitative observations, it can be inferred that the increased flexibility in CCD due to loss of a negatively charged residue D170, leads to an allosteric increase in flexibility of the DBD and a decrease in flexibility of the LD and SH2 domains.” is difficult to understand. They may add a more detailed explanation based on the results of RMSD and RMSF.

5) P11 line 2 and Figure S2C, I understand that “The ten largest coefficients of the first two PCs” means the eigenvectors of PCA. Is this okay? If so, is it correct that the values in Figure S2C are all positive (absolute values?)

6) P11 line 18, in this sentence “Besides, Zhang et. al. observed that the truncation of STAT3 to exclude α1 helix is fully capable of DNA binding upon tyrosine phosphorylation by Src kinase in vitro, suggesting the preservation of a functional conformation of the pY pocket. This agrees with our observation of a stable pY pocket in the D170A variant”, can the author explain the relationship between pY+3 pocket structure in simulations and this experimental data?

7) P12 last part -P13, I think that the sentence “Significant differences in global helical tilt for each of the macro-states of the SH2 domain is observed in α3 showing functional correlative differences among macro-states (Figure 5H). The α3 helix also interfaces with most of the other domains through its C-terminal helical turn. Residues at this interface could transmit motion through interacting residues from other domains.” is very important point. Does this data (Figure 5H->F?) show statistically reliable differences? Authors should add a more detailed explanation on this.

8) P13 line 7-9, in the sentence “The existence of such an interaction network is demonstrated in Figure 5B, where the inter-domain pair Cα distances are plotted and colored according to standard deviation values computed across all trajectories”, I suggest that authors may add a more detailed description on analysis of inter-domain pair Ca distances.

9) In methods (P23~), the authors should add a more detailed description (number of water molecules, etc) and figure on the simulation system. In addition, they should also add the figure to explain locations of the D170A, pY pocket, and pY+3 pocket

10) P23, why did the authors perform NVT simulations as production runs?

11) P24, the authors should add a more detailed description of cross-correlation and RMSF (how to fit, etc). The information is very important to understand the results.

Minor points

I think that there are several mistakes in writing in this manuscript (see below). Please check out the manuscript.

1) P23, Molecular dynamics simulation, “3ms” → “3 μs”? It causes misunderstanding.

2) P3, amino-terminal domain()

3) P23, initial structure, “6TLE” → “6TLC”?

4) authors should define abbreviation of interleukin (IL)

5) P12, Figure 5H → 5F.

6) P14, authors should add the descriptions on M0-M6 in caption of Figure 5.

7) P14, Figure 5B, the figure and letters are too small to understand.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No: The authors could share the coordinates of the starting structures used for simulations and figures as also the raw data of the trajectories file. This is not a usual procedure, but this depends on the policy of the journal

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010794.r003

Decision Letter 1

Nir Ben-Tal, James Gallo

3 Sep 2022

Dear Ms. Zhao,

Thank you very much for submitting your manuscript "Allosteric Regulation in STAT3 Interdomains is Mediated by a Rigid Core: SH2 Domain Regulation by CCD in D170A Variant" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers found the revised manuscript much improved; however, there are some points raised that require your attention (outlined in their comments). We are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

James Gallo

Academic Editor

PLOS Computational Biology

Nir Ben-Tal

Section Editor

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Thank you for carefully addressing the concerns raised by myself and the other reviewers. I have no further concerns.

Reviewer #2: The general quality of the revised version of the paper “Allosteric Regulation in STAT3 Interdomains is Mediated by a Rigid Core: SH2 Domain Regulation by CCD in D170A Variant” from Devin Matthews and co-workers has certainly improved with respect to the previous version, and many of the concerns raised by the reviewers, mine included, have been addressed and resolved. However, the new data leave two central questions unanswered. First, are the six simulations truly representative of the protein dynamics? Second, is the mechanism connecting the tilt of the α3 helix in the CCD and the dynamics of the pY+3 pocket in the SH2 domain rigorously demonstrated?

Let me start with the second question. I think that figure S7, mainly panel B (the upper right one; letters are given in the caption but not in the figure), clearly shows that this correlation does not exist. In all the states the tilt of the helix populates the same distribution (with a slight exception for the M0 state), and I really don’t understand how the p-values reported in table S2 can reach such low numbers (e.g. 2.03E-306 between M2 and M3). How many degrees of freedom have been considered? As a minor point, I definitely prefer to distinguish between p < 0.05 for statistically significant correlations and p < 0.001 for highly significant ones (what is the real difference between 2.03E-306 for M2/M3 and 0 for M1/M2?). In addition, figure 4 also suggests that the conformation of the α3 helix is the same in the five states. I think that the authors need to find more robust evidence of the correlation between the α3 helix and SH2 dynamics; otherwise, different explanations have to be proposed.

The sentence “The rigidity of the α helices diminished the utility of PCA analysis” is surely true for inter-domain PCA, but I think that PCA calculated on a sub-space (e.g. the α3 helix in the CCD and the SH2 domain) could be helpful and could reinforce the proposed picture.

Beyond this aspect, in my opinion, concerns remain about the efficacy of the sampling. The authors write (page 9 line 276): “The long transition timescale between these states validates our kinetic clustering model, but also necessitates a “global” view of the trajectories in order to explain the full conformational dynamics of the protein”. If long transition timescales characterize the system, the importance of each state is truly difficult to predict, and it cannot be ruled out that other states (not sampled in the six simulations) play a role. Furthermore, if stochastic transitions characterize the dynamics in the simulations, how can the authors exclude the possibility that the wild type will populate the M0, M1, and M3 states in further simulations? For such a complex system, enhanced sampling techniques would probably be preferable.

A few minor points remain:

1) In the answer to my question 20, in which I asked for SASA analysis, the authors answer that “we don’t have the facility to do it”, but SASA can be calculated with different free software without particular difficulties. I invite again the authors to calculate these data.

2) In the caption of figure 1 “there us” instead of “there is”

3) The figures in the SI are cited in a seemingly random order.

4) Page 6 line 154 “The hypothesis was tested by comparing corresponding helical tilt at the CCD of different macro-states identified in Conformational differences in the pY+3 binding pocket correlate to the decreased binding affinity from wild type to D170A variant for the pY and pY+3 pockets.” The sentence is not clear to me.

5) Caption figure 4 indicates α21, whilst in the figure α27 is highlighted

Reviewer #3: The manuscript has been much improved.

This paper is an important contribution and I recommend that it be accepted for publication.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

References:

Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.

If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010794.r005

Decision Letter 2

Nir Ben-Tal, James Gallo

28 Sep 2022

Dear Ms. Zhao,

Thank you very much for submitting your manuscript "Allosteric Regulation in STAT3 Interdomains is Mediated by a Rigid Core: SH2 Domain Regulation by CCD in D170A Variant" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers.

Your second revision has improved the paper, yet there is still an issue raised by a reviewer concerning a correlation between the α-helical tilt and the dynamics of the binding site. Please address this issue in a revised manuscript. 

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

James Gallo

Academic Editor

PLOS Computational Biology

Nir Ben-Tal

Section Editor

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #2: In the previous revision, I had two main concerns. Concerning the first one, the authors have added a sentence of caution regarding the efficacy of the sampling; considering the complexity of the system, this can be considered sufficient.

On the contrary, practically nothing has been done to demonstrate a correlation between the movement of the α3 helix in the CCD and the binding site on the SH2 domain. In my opinion, the data reported in figure 3F and in figure S4 show that a correlation is absent. The authors wrote that “as example, small rotation of Ile252 (located at α3) could shift Gln511 (located at α21) because of the conserved the hydrogen network, which end up pushing away ILE569(located at α26) from β22.”. This suggests that a correlation probably exists, but that it does not necessarily involve the movements of the α3 helix (a global variable that is probably less affected by small variations). To support their statement, the authors refer to the data in table S2, which now reports the degrees of freedom. These numbers could overestimate the real degrees of freedom; for example, it is unclear to me whether the conformations considered were sampled at a time interval greater than the decay time of the autocorrelation of the helical tilt. If they were not, the conformations would not be completely independent (being part of one trajectory), the real degrees of freedom could be significantly lower and, in turn, the p-values considerably higher than those reported. In summary, I remain of the opinion that a correlation between the α-helical tilt and the dynamics of the binding site has not been demonstrated. As stated above, a correlation between some structural features in the CCD and the dynamics of the binding site in the SH2 domain could exist, but it probably involves features more localized than the helical tilt.
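The reviewer's point about autocorrelation can be illustrated with a minimal sketch. It assumes a synthetic AR(1) series as a stand-in for a slowly varying helical tilt; the function names (`integrated_autocorr_time`, `effective_sample_size`, `ar1_series`) and the truncation rule are illustrative choices, not the authors' analysis.

```python
import numpy as np

def integrated_autocorr_time(x):
    """Integrated autocorrelation time (in frames) of a 1D time series.

    The sum over lags is truncated at the first non-positive
    autocorrelation value, a simple and common heuristic.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    dx = x - x.mean()
    var = np.dot(dx, dx) / n
    if var == 0.0:
        return 1.0
    tau = 1.0
    for lag in range(1, n // 2):
        rho = np.dot(dx[:-lag], dx[lag:]) / ((n - lag) * var)
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return tau

def effective_sample_size(x):
    """Approximate number of statistically independent samples in x."""
    return len(x) / integrated_autocorr_time(x)

def ar1_series(n, phi, rng):
    """Synthetic correlated series: x[t] = phi * x[t-1] + noise[t]."""
    x = np.zeros(n)
    noise = rng.normal(size=n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + noise[t]
    return x

rng = np.random.default_rng(0)
n_frames = 5000
correlated = ar1_series(n_frames, phi=0.95, rng=rng)  # hypothetical tilt-like signal
independent = rng.normal(size=n_frames)               # uncorrelated reference

n_eff_correlated = effective_sample_size(correlated)
n_eff_independent = effective_sample_size(independent)
print(f"correlated series:  {n_frames} frames, ~{n_eff_correlated:.0f} effective samples")
print(f"independent series: {n_frames} frames, ~{n_eff_independent:.0f} effective samples")
```

If the effective sample size is much smaller than the number of frames, degrees of freedom based on the raw frame count would be overestimated, which is the concern raised above.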

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010794.r007

Decision Letter 3

Nir Ben-Tal, James Gallo

5 Dec 2022

Dear Ms. Zhao,

We are pleased to inform you that your manuscript 'Allosteric Regulation in STAT3 Interdomains is Mediated by a Rigid Core: SH2 Domain Regulation by CCD in D170A Variant' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

James Gallo

Academic Editor

PLOS Computational Biology

Nir Ben-Tal

Section Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #2: The modified paper and the comments of the authors answer my questions, and the paper is, in my opinion, ready for publication.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: None

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010794.r008

Acceptance letter

Nir Ben-Tal, James Gallo

15 Dec 2022

PCOMPBIOL-D-22-00920R3

Allosteric Regulation in STAT3 Interdomains is Mediated by a Rigid Core: SH2 Domain Regulation by CCD in D170A Variant

Dear Dr Zhao,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zsofi Zombor

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Text. RMSD and RMSF analysis.

    D170A mutation induces large structural but minor dynamical changes in STAT3. Table A. Statistical differences for the α3 global tilt among different macrostates (Fig 3F). The Kolmogorov-Smirnov test and t-test were done using the scipy.stats.ks_2samp and scipy.stats.ttest_ind functions, respectively.

    (DOCX)
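As a minimal sketch of how the two SciPy functions named above are typically called, the example below compares two synthetic samples. The data, the sample sizes, the variable names, and the choice of Welch's variant (`equal_var=False`) are assumptions for illustration; the source does not state which options were used.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic stand-ins for per-frame tilt angles (degrees) in two macrostates.
tilt_a = rng.normal(loc=20.0, scale=3.0, size=2000)
tilt_b = rng.normal(loc=21.0, scale=3.0, size=2000)

# Two-sample Kolmogorov-Smirnov test: compares the full distributions.
ks = stats.ks_2samp(tilt_a, tilt_b)

# Two-sample t-test on the means (Welch's variant, an assumption here).
tt = stats.ttest_ind(tilt_a, tilt_b, equal_var=False)

# Effect size (Cohen's d) to put the p-values in context.
pooled_sd = np.sqrt((tilt_a.var(ddof=1) + tilt_b.var(ddof=1)) / 2.0)
cohens_d = (tilt_b.mean() - tilt_a.mean()) / pooled_sd

print(f"KS statistic = {ks.statistic:.3f}, p = {ks.pvalue:.3g}")
print(f"t statistic  = {tt.statistic:.3f}, p = {tt.pvalue:.3g}")
print(f"Cohen's d    = {cohens_d:.3f}")
```

With thousands of frames per group, even a modest shift in the mean can give a very small p-value, so reporting an effect size alongside the p-value can help judge whether a difference is practically meaningful.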

    S1 Table. Secondary structure assigned by the UniProt database by compiling structure information from multiple x-ray crystal structures.

    (TIF)

    S1 Fig. Additional information of PCA and MSM analysis.

    (A) PCA scree plot: dots show the cumulative explained variance of the principal components; the bar chart shows the explained variance per component. (B) Relaxation timescales of the MSM for the SH2 domain conformational space at different lag times. (C) The ten features that contribute most to the first two PCs. The absolute values of PC1 and PC2, and the square root of the sum of squared PC1 and PC2 values, are shown.

    (TIF)

    S2 Fig. Pair Cα distance distribution between Q644 and Y657, Q644 and E638 of 6 replicas for wild type and D170A variant.

    (TIF)

    S3 Fig. Characterization of CCD using pair Cα distances.

    (A) PCA 2D plane of CCD pair Cα distances, colored by macro-state from the SH2 domain results. (B) Representative structures of the CCD corresponding to panel A. (C) Coefficients of the 100 features that contribute most to the first two PCs.

    (TIF)

    S4 Fig. PCA of pair distances of the rigid core (residues: 240–252, 474, 479, 511, 546, 549, 550, 562, 564, 568, 610, 611).

    (A) PCA 2D plot colored by macrostate; (B) PCA 2D plot colored by system; (C) The ten features that contribute most to the first two PCs. The absolute values of PC1 and PC2, and the square root of the sum of squared PC1 and PC2 values, are shown.

    (TIF)
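A PCA on pair distances of a residue subset, with features ranked as described in panel C, could be sketched as follows. This is only an illustration on random synthetic coordinates: the helper names `pair_distance_features` and `pca`, the array shapes, and the SVD-based implementation are assumptions, not the authors' pipeline.

```python
import numpy as np

def pair_distance_features(coords):
    """Convert coordinates (n_frames, n_atoms, 3) to pair distances (n_frames, n_pairs)."""
    i, j = np.triu_indices(coords.shape[1], k=1)
    return np.linalg.norm(coords[:, i] - coords[:, j], axis=-1)

def pca(features, n_components=2):
    """PCA via SVD of the mean-centered feature matrix."""
    centered = features - features.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    components = vt[:n_components]          # one row of feature coefficients per PC
    scores = centered @ components.T        # projection of each frame onto the PCs
    return scores, components, explained[:n_components]

rng = np.random.default_rng(3)
# Synthetic stand-in: 200 frames of 6 selected atoms (e.g. Cα atoms of chosen residues).
coords = rng.normal(size=(200, 6, 3))

features = pair_distance_features(coords)
scores, components, explained = pca(features, n_components=2)

# Rank features by sqrt(PC1^2 + PC2^2) of their coefficients and keep the top ten.
importance = np.hypot(components[0], components[1])
top10 = np.argsort(importance)[::-1][:10]
print("explained variance ratio:", np.round(explained, 3))
print("top feature indices:", top10)
```

The `scores` array gives the 2D projection that would be plotted and colored by macrostate or system, while `importance` corresponds to the combined magnitude used to rank features.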

    S5 Fig. PCA of pair distances of the rigid core without residue 562 (residues: 240–252, 474, 479, 511, 546, 549, 550, 564, 568, 610, 611).

    (A) PCA 2D plot colored by macrostate; (B) PCA 2D plot colored by system; (C) The ten features that contribute most to the first two PCs. The absolute values of PC1 and PC2, and the square root of the sum of squared PC1 and PC2 values, are shown.

    (TIF)

    S6 Fig. CCD α3 global tilt angle distribution.

    (A,B) CCD α3 global tilt angle distribution of different macrostates, plotted separately for the wild type and the D170A variant. The average helix tilt angle within each macro-state is shown by a vertical dashed line. (C,D) CCD α3 global tilt angle distribution of different replicas for the wild type and the D170A variant.

    (TIF)

    S7 Fig. Additional REDAN analysis results.

    (A) Proposed pathway from 170 to 640 shown in the protein structure; (B) Proposed pathway from 170 to 644; (C,D,E) Key pair residue distances; (F) Ramachandran dihedrals for residues 513, 514, 515 and 517; (G) Summary of proposed pathways from source residue 170 to target residues 640, 644 and 657.

    (TIF)

    S8 Fig. The mean RMSF of six replicates was plotted, with the RMSF values for each residue separated by domain.

    The wild type is plotted in blue and the D170A variant in orange, with error bars indicating the standard error among the six replicates. Structures of each domain colored by wild type RMSF values are shown (low RMSF values in white to high RMSF values in red).

    (TIF)

    S9 Fig. Proposed specific mechanism.

    (A) Representative structure of macro-state 0 (light cyan) compared with macro-state 4 (salmon); (B) Representative structure of macro-state 3 (green) compared with macro-state 4 (salmon); (C-G) Key residue pair CA distances or contact distances (closest heavy atom distance). The ILE-252 to LYS-573 and PHE-512 to LYS-573 distance distributions show that α26 shifts away from β22 in macro-states 0 and 1, but toward it in macro-state 3; the PHE-512 to LEU-577 and PHE-512 to SER-649 distance distributions show that the α26/α27 and α32/α33 loops move away from β22 in macro-states 0, 1 and 3, while in macro-state 3, α32 moves close to α26, as indicated by the ASP-570 to LYS-642 distance distribution.

    (TIF)

    S10 Fig. System parameters (pressure, temperature) and system energies (potential, kinetic and total energy) of the wild type and D170A variant systems; each system has six replicas: cp1, cp2, cp3, cp4, cp5 and cp6.

    (TIF)

    S11 Fig

    First row: Root Mean Square Deviation (RMSD) of the wild type and D170A variant systems; each system has six replicas: cp1, cp2, cp3, cp4, cp5 and cp6. Note: a rolling average over 100 frames is plotted for better visualization. Second row: The pair RMSD of both systems for each replica (frames were extracted every 1 ns and used for the pair RMSD calculation).

    (TIF)

    S12 Fig. Figure depicting the simulation system, generated by VMD.

    Water is shown as lines, ions are shown as vdw, protein is shown as cartoon.

    (TIF)

    S13 Fig. Root Mean Square Deviation (RMSD) analysis.

    (A, B) Cross-correlation (Pearson correlation) of the RMSD values of each domain for the wild type and the D170A variant. Here RMSD values were calculated with the first frame as reference, since the correlation of dynamic changes between domains is of interest. The RMSD values of the six replica trajectories were merged, and the Pearson correlation among the different domains was then calculated. Note that the correlation value does not imply a functional correlation between domains: RMSD is an overall measure of conformational change with respect to the reference structure, so distinct conformations may have the same RMSD value. (C) Violin plot of RMSD values for the whole protein (core full-length protein, not including the NTD) and each domain. Here the crystal structure was used as reference, since the difference in conformational change between the wild type and D170A is of interest. (D) RMSD distribution of the SH2 domain in the six replicas of the D170A variant.

    (TIF)
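The merge-then-correlate procedure described for panels A and B can be sketched as below. The RMSD values are synthetic, and the domain labels, array shapes and frame counts are hypothetical placeholders rather than the actual simulation data.

```python
import numpy as np

rng = np.random.default_rng(1)
domains = ["CCD", "DBD", "LD", "SH2"]  # hypothetical domain labels
n_replicas, n_frames = 6, 500

# Synthetic per-replica RMSD series, shape (n_replicas, n_domains, n_frames).
# A shared component is added so that the domains are partly correlated.
shared = rng.normal(size=(n_replicas, 1, n_frames))
own = rng.normal(size=(n_replicas, len(domains), n_frames))
rmsd = 2.0 + 0.3 * shared + 0.3 * own

# Merge the replicas along the time axis: one long series per domain.
merged = np.concatenate(list(rmsd), axis=1)  # shape (n_domains, n_replicas * n_frames)

# Pearson correlation matrix between the per-domain RMSD series.
corr = np.corrcoef(merged)

for name, row in zip(domains, corr):
    print(name, np.round(row, 2))
```

As the caption cautions, a high value in `corr` only indicates that the RMSD magnitudes rise and fall together; it does not by itself identify which conformations are involved.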

    Attachment

    Submitted filename: response.docx

    Attachment

    Submitted filename: response.docx

    Attachment

    Submitted filename: response.docx

    Data Availability Statement

    All relevant data are within the manuscript and its Supporting information files. Starting structures used for simulations, the raw trajectory files and PyMOL files are available at https://osf.io/dvzq7/.


    Articles from PLOS Computational Biology are provided here courtesy of PLOS
