Keywords: Structure-mechanics statistical learning, Network theory, Rigidity graph, Protein dynamics, All-atom molecular dynamics simulation, Scale-free, Serine protease, PDZ3
Highlights
• Statistical learning from protein dynamics unravels rigidities in interaction network.
• Backbone and side-chain mechanical couplings exhibit scale-free network properties.
• Graphical depiction of network rigidities captures sequence co-evolution patterns.
• Functional sites at secondary structure peripheries are mechanical hotspots.
• Our rigidity scores are compelling metrics for residue biological significance.
Abstract
A backbone-side-chain elastic network model (bsENM) is devised in this contribution to decipher the network of molecular interactions during protein dynamics. The chemical details in 5 μs all-atom molecular dynamics (MD) simulation are mapped onto the bsENM spring constants by self-consistent iterations. The elastic parameters obtained by this structure-mechanics statistical learning are then used to construct inter-residue rigidity graphs for the chemical components in protein amino acids. A key discovery is that the mechanical coupling strengths of both backbone and side chains exhibit heavy-tailed distributions and scale-free network properties. In both rat trypsin and PDZ3 proteins, the statistically prominent modes of rigidity graphs uncover the sequence-specific coupling patterns and mechanical hotspots. Based on the contributions to graphical modes, our residue rigidity scores in backbone and side chains are found to be very useful metrics for the biological significance. Most functional sites have high residue rigidity scores in side chains while the biologically important glycines are generally next to mechanical hotspots. Furthermore, prominent modes in the rigidity graphs involving side chains oftentimes coincide with the co-evolution patterns due to evolutionary restraints. The bsENM specifically devised to resolve the protein chemical character thus provides useful means for extracting functional information from all-atom MD.
1. Introduction
Proteins exhibit remarkable properties such as thermal stability, specific molecular binding, and catalytic activities. These functionally important features are sensitive to mutation and can trace their origin to both the polypeptide backbone that frames the structure and the side chains that define the chemical specificity [1], [2], [3]. Deciphering the contributions of these two components to functional properties, though, remains a fundamental challenge [4], [5], [6]. The folded topology was recognized to host a variety of sequences in structural families [7], [8] and has inspired artificial protein engineering and design [9], [10], [11]. Constructing a potential energy function from the structural network, as in the elastic network model (ENM) [12], [13], [14] and the Gō model [15], [16], [17], is very useful in studying functional motions [12], [13], [14], protein folding [15], [16], [17], and allosteric wiring [18], [19], [20]. In addition to the Hamiltonian-based methods, graphical analysis [21], [22], [23] is frequently applied to analyze the protein structural network. If the distance between a residue pair is within a cutoff, their edge in the adjacency matrix is typically set to one, and the diagonal degree matrix records the residue contact numbers. This topology-based approach corresponds to using a universal spring constant in ENM. The Laplacian matrix (the degree matrix minus the adjacency matrix) [24], [25] was found to offer a good approximation for low-frequency motions [13], [14], and the structural network is often used to study the collective vibrations that are not very sensitive to the sequence specificity due to side chains [26], [27], [28], [29].
Yet, beyond the backbone-framed protein structure, several questions remain open. How can the networks of molecular interactions be delineated, how do they differ from the structural network, and how are the side-chain sequence and dynamical motions manifested in them? In particular, if the interaction network of side chains could be studied, its properties are expected to differ from those of the backbone. To address these key issues of the sequence-structure-function relationship, a graph-theoretic methodology is devised here to compute the mechanical interactions mediated by side chains and backbone from all-atom molecular dynamics (MD) simulations in an explicit solvent.
The guiding principle is that given the unique structural position and chemical environment of a protein residue, its couplings with surrounding atoms would assume specific strengths during dynamical motions. A mechanical-coupling dynamics perspective is thus taken to unravel the network behaviors of physical interactions. Specifically, the elastic parameters in a model of backbone and side-chain nodes connected by harmonic springs, i.e., the backbone-side-chain elastic network model (bsENM) proposed in this work and illustrated in Fig. 1A, are used to represent the effective interaction strengths. Our design of bsENM is to explicitly represent the protein chemical components for resolving the backbone and side-chain contributions in the mechanical coupling network. The scope of bsENM spring constants is also expanded to an unexplored context: to statistically learn the mechanical coupling strengths of backbone and side chains from all-atom MD simulations. The effective elasticities calculated by self-consistent iterations [30], [31], [32] are then used to construct inter-residue graphs. As will be shown later, this bsENM-graph approach can be used to map the protein dynamics into distinct rigidity groupings, namely the backbone-backbone (BB), the backbone-side-chain (BS), and the side-chain-side-chain (SS). Investigating them using spectral analysis allows us to uncover their unique mechanical topologies for comparing with experimental observables such as residue conservation and co-evolution in multiple sequence alignment (MSA), mutation sensitivity, residue flexibility profile, and signals reflecting residue micro-environments. Under this structure-mechanics statistical learning framework, in contrast to the aforementioned topology-only approach, even interaction pairs of similar distance separation can have very different coupling strengths.
Using the elastic properties statistically learned from an all-atom MD trajectory as the edge weights in the adjacency and degree matrices thus offers a new perspective: the rigidity graphs of protein dynamics. To reveal the impact of chemical details, the bsENM harmonic potentials are divided into (a) skeleton springs, those linking the nearest and second nearest residues, and (b) non-skeleton springs, the rest. With the Laplacian (degree matrix minus adjacency matrix) and the signless Laplacian (degree matrix plus adjacency matrix), we first establish that the low-frequency modes of the skeleton Laplacian are exceedingly insensitive to the variation in strength during protein dynamics. Analysis of mechanical couplings is hence focused on non-skeleton springs. A key finding is that the non-skeleton signless Laplacian reveals the specific patterns very clearly. The statistically prominent features of this graph can thus be extracted from the all-atom MD trajectory to reveal the mechanical topologies of different interactions. To illustrate this approach, we choose as case studies a serine protease family member, rat trypsin (RT) [33], [34], and PDZ3 (the third PDZ signaling domain) [35], [36], [37] because both are β-strand rich and have comprehensive data from site-specific mutagenesis and functional activity experiments. In addition, the wealth of sequence information allows the calculation of inter-residue co-evolution from MSA by statistical coupling analysis (SCA) [38], [39], [40] or direct coupling analysis (DCA) [41], [42], which provides complementary insights. The BB (backbone-backbone), SS (side chain-side chain), and BS (backbone-side chain) rigidity graphs can thus be compared with these observables for linking physical interactions with biological functions and evolutionary pressure.
Fig. 1.
Rigidity graphs of the protein mechanical coupling network statistically learned from all-atom MD; see Materials and Methods for details. (A) A sampled atomic structure is mapped onto the coordinates of backbone and side-chain sites in the coarse-grained (CG) representation of bsENM. Left: a ribbon representation of the atomic structure of RT bound with BPTI. The catalytic triad is highlighted, and a zoomed-in view showcases the atomic-to-CG mapping in Table S1. Right: the mapped CG configuration. (B) Division of the 5 μs all-atom MD trajectory into consecutive 10 ns windows, and the workflow for computing the bsENM parameters of each segment by structure-mechanics statistical learning.
2. Materials and methods
The elastic parameters in bsENM (the spring constants) are calculated from an all-atom MD trajectory by matching the fluctuations of inter-site distances. Effectively, this structure-mechanics statistical learning integrates out the other degrees of freedom by self-consistent iteration with normal mode analysis (NMA) [30], [31], [32]. Moreover, mechanical coupling strengths represented by the spring constants are used to construct BB, BS, and SS rigidity graphs. This computational framework is detailed in the following using RT as the example.
2.1. All-atom MD simulation
The X-ray structure of BPTI bound RT (PDB ID: 3TGI) is used to construct its all-atom model [33] whereas for PDZ3, the apo X-ray structure (PDB ID: 1BFE) [37] is used. All systems are solvated in orthorhombic dodecahedron TIP3P water boxes and neutralized with NaCl ions at 0.15 M. The CHARMM36 all-atom force field [43] is used to compute the potential energy and the GROMACS software [44] is used for MD runs. The production run for both the RT and PDZ3 systems is at 300 K and 1 atm for 5 μs, during which a snapshot is saved every 1 ps for analysis. The other details are reported in SI.
2.2. Structure-mechanics statistical learning of bsENM parameters
With the raw data of protein dynamics generated with full atomic details, the CG sites in bsENM serve to read out the statistics of inter-site elasticity. The goal is to capture the significant mechanical couplings that can survive thermal noise, and the CG sites are thus located where specific molecular interactions are typically observed in the trajectories. In particular, the backbone is represented by two coarse-grained (CG) sites at the amide nitrogen and carbonyl oxygen positions as they are the loci of hydrogen bonding, Fig. 1A. For side chains, a CG site is placed at the position of a representative atom at which specific interactions are formed. For example, the alanine side-chain site is at Cβ and that of lysine is at Nζ. For hydrophobic side chains, the center of mass of heavy atoms is used. Table S1 lists the details of this atomic-to-CG mapping, which is used to convert each frame in the MD trajectory to a bsENM configuration.
Given a set of bsENM configurations, the spring length between sites i and j is their averaged distance. To parametrize the spring constant between the two sites, the variance of their distance fluctuation in the all-atom MD data is the target value. At each iteration step, NMA of the bsENM gives the predicted distance fluctuation, and the spring constant is adjusted to match the target value [31], [32]. Since the bsENM springs are connected in the structure, their coupled fluctuations are handled by self-consistent iterations in this fluctuation matching. Other details are reported in SI. To construct the starting model (initial guess) for statistical learning, a cutoff distance is used to include a harmonic potential in the bsENM for every inter-site pair whose averaged distance is within the cutoff. The cutoff is thus an adjustable parameter and is determined by scanning its value and comparing the resulting residue root-mean-square fluctuation (RMSF) and the low-frequency vibrational modes of the statistically learned bsENM with those calculated from all-atom MD. As discussed in Fig. S1, consistent behaviors are observed over a wide range of cutoff values since the bsENM springs are trained to match the all-atom MD target data. The default cutoff is set to 7.8 Å as shown in Fig. S1. After convergence of the first round, connectivity trimming followed by another round of fluctuation matching is conducted to prevent having excessive springs in the network; other details are reported in SI. An approximation of bsENM is to use a universal spring constant for all springs within the cutoff. This zeroth-order construction, which relies solely on the structural network, is denoted bsENM0. To highlight the effects of chemical details within the native topology, the bsENM statistically learned from all-atom MD is compared with the bsENM0 of the same equilibrium structure and cutoff.
2.3. Construction of inter-residue rigidity graphs from bsENM
In our graphical representation of inter-residue interaction networks, the edge weight between two residue nodes in the adjacency matrix is the sum over the bsENM spring constants linking their CG sites. For bsENM0, the spring constant is set to 1 for the springs within the cutoff. In this graphical theory, the degree matrix components record the total coupling strength of each residue or, in the case of bsENM0, the residue contact number of CG sites within the cutoff. Since bsENM springs can be categorized by the types (backbone or side chain) of their CG sites, the graphs of different rigidity groups can be constructed accordingly. For example, the BB, BS, and SS rigidity graphs are the signless Laplacians of the non-skeleton springs in the backbone-backbone, backbone-side-chain, and side-chain-side-chain groups, respectively. For the SS graph, disulfide bond springs are exceedingly strong and are skipped to focus on non-covalent mechanical couplings.
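The construction described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `rigidity_graph`, the tuple layout of a spring, and the toy spring constants are all assumptions made here, and the input list is assumed to contain only non-skeleton springs with disulfide springs already removed.

```python
import numpy as np

def rigidity_graph(n_res, springs, group=None):
    """Build adjacency A, degree D, Laplacian L, and signless Laplacian Q.

    springs: iterable of (res_i, type_i, res_j, type_j, k), where type is
             'B' (backbone site) or 'S' (side-chain site) and k > 0.
    group:   None for all springs, or one of 'BB', 'BS', 'SS'.
    """
    A = np.zeros((n_res, n_res))
    for res_i, type_i, res_j, type_j, k in springs:
        if res_i == res_j:
            continue  # assumption: intra-residue springs add no inter-residue edge
        if group is not None and "".join(sorted((type_i, type_j))) != group:
            continue  # keep only the requested rigidity group
        # Edge weight = sum of spring constants linking the two residues
        A[res_i, res_j] += k
        A[res_j, res_i] += k
    D = np.diag(A.sum(axis=1))  # total coupling strength of each residue
    return A, D, D - A, D + A

# Toy data (hypothetical values, not taken from the paper)
springs = [
    (0, "B", 2, "B", 3.0),
    (0, "B", 2, "S", 1.5),
    (1, "S", 3, "S", 0.5),
]
A, D, L, Q = rigidity_graph(4, springs)
A_bb, _, _, Q_bb = rigidity_graph(4, springs, group="BB")
```

Setting every `k` to 1 in the same function gives the contact-number graph of a bsENM0-like model.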
2.4. Comparison of rigidity graphs
A protein of N residues can thus have different rigidity graphs that are symmetric and positive-semidefinite or positive-definite. The eigenvectors, which also form an orthonormal basis set, are ordered by their eigenvalues in descending order and labeled by a mode index. The eigenvectors of a rigidity graph are the specific patterns of mechanical couplings between protein residues. To quantify whether different rigidity graphs have similar behaviors, the following mode-based procedure is developed.
The similarity of a graph with respect to a reference graph along a mode of the reference is defined as the largest magnitude of the dot product between that reference mode and any eigenvector of the compared graph. For example, if the similarity is 1 for the bsENM Laplacian with respect to the bsENM0 Laplacian along a given mode, then the bsENM Laplacian has an identical counterpart of that mode in the bsENM0 Laplacian. The mode of the compared graph that delivers the largest overlap may not have the same index as the mode in the reference graph, since the respective mode rankings may differ.
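As a minimal sketch of this mode-based comparison, the snippet below takes two symmetric matrices, sorts their eigenvectors by descending eigenvalue, and returns the largest absolute overlap for each reference mode. The function names are illustrative assumptions, and degenerate eigenvalues (where eigenvectors are not unique) are not treated specially.

```python
import numpy as np

def sorted_modes(M):
    """Eigenvalues and eigenvectors of a symmetric matrix, descending order."""
    w, V = np.linalg.eigh(M)
    order = np.argsort(w)[::-1]
    return w[order], V[:, order]

def mode_similarity(G, G_ref):
    """Similarity of graph G with respect to each mode of the reference G_ref.

    Returns (similarity, best_match): for reference mode m, similarity[m] is
    the largest |dot product| with any eigenvector of G, and best_match[m]
    is the index of that eigenvector in G.
    """
    _, V = sorted_modes(G)
    _, V_ref = sorted_modes(G_ref)
    overlap = np.abs(V_ref.T @ V)  # rows: reference modes, columns: modes of G
    return overlap.max(axis=1), overlap.argmax(axis=1)

# Same eigenvectors, reversed ranking: similarity is 1 but indices differ
sim, match = mode_similarity(np.diag([1.0, 2.0, 3.0]), np.diag([3.0, 2.0, 1.0]))
```

The toy call shows why the best-matching index is returned separately: the patterns agree exactly, yet the mode rankings of the two graphs are reversed.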
2.5. Statistical analysis of protein rigidity graphs
With the 5 μs production runs of RT and PDZ3, a single bsENM using the entire trajectory only provides an equilibrium harmonic approximation for structural fluctuations, omitting the potentially interesting and informative fluctuations on the interaction-network level. This limitation is overcome by dividing the trajectory into consecutive 10-ns windows to compute a series of rigidity graphs (see Fig. 1B for an illustration of windowed trajectory segmentation for statistical analysis). The 10-ns window appears to be a reasonably robust window size in terms of extracting mechanical coupling parameters. This is illustrated in Fig. S2, which shows that the distribution of the inter-site interaction strengths appears unchanged for 5-ns, 10-ns, or 20-ns window sizes. Therefore, the coupling network (i.e., the rigidity graph) extracted from a 10-ns window trajectory is used as an element in further statistical analyses.
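The windowing step can be illustrated with a short sketch that computes, per window, the mean and variance of each inter-site distance. These are the spring length and the fluctuation-matching target described in Section 2.2. The array layout and the function name are assumptions for illustration; with a snapshot every 1 ps, a 10-ns window holds 10,000 frames and a 5 μs run gives 500 windows.

```python
import numpy as np

def window_distance_stats(coords, pairs, frames_per_window):
    """Per-window mean and variance of inter-site distances.

    coords: array (n_frames, n_sites, 3) of CG site coordinates.
    pairs:  sequence of (i, j) site index pairs.
    Returns two arrays of shape (n_windows, n_pairs); trailing frames that
    do not fill a complete window are dropped.
    """
    pairs = np.asarray(pairs)
    diff = coords[:, pairs[:, 0]] - coords[:, pairs[:, 1]]
    dist = np.linalg.norm(diff, axis=-1)          # (n_frames, n_pairs)
    n_win = dist.shape[0] // frames_per_window
    dist = dist[: n_win * frames_per_window]
    dist = dist.reshape(n_win, frames_per_window, -1)
    return dist.mean(axis=1), dist.var(axis=1)

# Toy trajectory: one pair whose distance alternates between 1 and 3
coords = np.zeros((8, 2, 3))
coords[:, 1, 0] = [1, 3, 1, 3, 1, 3, 1, 3]
mean_d, var_d = window_distance_stats(coords, [(0, 1)], frames_per_window=4)
```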
2.5.1. Mean-modes of fluctuating rigidity graphs
The rigidity graph of a protein fluctuates and evolves over time as the protein experiences thermal fluctuations or interacts with the surrounding molecules. To capture and quantify these fluctuations on the network level using the rigidity graph, we begin by defining “mean-modes”, which are to be understood as the average eigenmodes of an otherwise fluctuating rigidity graph. They are computed as follows. From the bsENM of each trajectory window, the non-skeleton springs are used to calculate the off-diagonal and the diagonal terms of the inter-residue rigidity graph of that window. Averaging the graphs over the temporal segments gives the mean rigidity graph. From the mean rigidity graph, one finds the eigenvectors, each of which is a mean-mode with a corresponding eigenvalue, the coupling strength of the mean-mode. Note that hereafter a dedicated superscript is used for variables and functions derived from the mean rigidity graph.
2.5.2. Content of a mean-mode in an analysis time window
Each mean-mode is a unit vector over the N residues of the rigidity graph and can be understood as an N-component mechanical coupling pattern. In this light, in addition to the coupling strength, another important property is how much the pattern of a particular mean-mode is retained in the mechanical coupling network of each analysis window. This property is determined by first calculating the mean-mode content in each trajectory window following the description in Section 2.4, i.e., as the largest magnitude of the dot product between the mean-mode and any eigenvector of the rigidity graph of that window. A high mean-mode content indicates that the pattern of the mean-mode stays the same in the trajectory window. Averaging the mean-mode content over all trajectory windows then gives the averaged mean-mode content. It measures the extent to which the eigenvector pattern is retained throughout the entire all-atom MD trajectory.
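The two steps above (mean-modes, then their content per window) can be sketched as follows. The function names and the toy matrices are assumptions for illustration; the input is taken to be a stack of per-window rigidity graphs that have already been built.

```python
import numpy as np

def descending_eigh(M):
    """Eigen-decomposition of a symmetric matrix, descending eigenvalues."""
    w, V = np.linalg.eigh(M)
    order = np.argsort(w)[::-1]
    return w[order], V[:, order]

def mean_modes_and_content(window_graphs):
    """Mean-modes of the averaged graph and their averaged content.

    window_graphs: array (n_windows, N, N) of per-window rigidity graphs.
    Returns (eigenvalues, mean_modes, averaged_content), with modes ordered
    by descending eigenvalue of the mean rigidity graph.
    """
    graphs = np.asarray(window_graphs, dtype=float)
    lam, V_mean = descending_eigh(graphs.mean(axis=0))
    content = np.empty((graphs.shape[0], V_mean.shape[1]))
    for w, G in enumerate(graphs):
        _, V_w = descending_eigh(G)
        # Content of each mean-mode in window w: largest |overlap|
        content[w] = np.abs(V_mean.T @ V_w).max(axis=1)
    return lam, V_mean, content.mean(axis=0)

# Two identical windows: every mean-mode is fully retained (content = 1)
G = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]])
lam, V_mean, avg_content = mean_modes_and_content([G, G])
```

When the window graphs differ, the averaged content drops below 1 for modes whose pattern is not retained across windows.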
2.5.3. Prominent modes of a rigidity graph
With the coupling strength and the average content defined for a given mechanical-coupling pattern (designated by a mean-mode), we next ask which mean-mode features are most prominent as the protein undergoes dynamical structural fluctuations. For the present work, we define a “prominent mode” as a mean-mode that exhibits both (a) a high strength in the mechanical coupling and (b) a high averaged content during the protein dynamics. More quantitatively, for (a), mean-modes exhibiting strong couplings are defined as those lying above the upper fence of an empirically defined quantity within its distribution (see Fig. S3). For (b), mean-modes having high contents during the protein dynamics are selected based on the cumulative distribution function (CDF) of the averaged mean-mode content. The cutoff for high-content designation is assigned empirically depending on the type of mechanical coupling. For two of the three rigidity graphs, only the modes in the top 25% of averaged content are considered candidates for prominent modes, while the empirical percentile cutoff is the top 32% for the third (see Fig. S3 for summary plots). As shown in Figs. S4–S6, the prominent modes of rigidity graphs exhibit pointed patterns in residue weights, and mechanical hotspots are identified as the residues having significant weights.
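A rough sketch of this two-criterion selection is given below. Two assumptions are made here and are not confirmed by the text: the upper fence is taken as the common Tukey definition (third quartile plus 1.5 times the interquartile range), and it is applied to the eigenvalues directly because the empirically defined quantity used in the paper is not recoverable from this section. The function names are hypothetical.

```python
import numpy as np

def upper_fence(x):
    """Tukey upper fence: Q3 + 1.5 * IQR (assumed definition)."""
    q1, q3 = np.percentile(x, [25, 75])
    return q3 + 1.5 * (q3 - q1)

def prominent_modes(lam, avg_content, top_fraction=0.25):
    """Indices of modes that are both strong and high in averaged content.

    lam:          mean-mode eigenvalues (coupling strengths).
    avg_content:  averaged mean-mode contents, same length as lam.
    top_fraction: fraction of modes counted as high-content (e.g. 0.25).
    """
    lam = np.asarray(lam, dtype=float)
    avg_content = np.asarray(avg_content, dtype=float)
    strong = lam > upper_fence(lam)  # assumption: fence applied to eigenvalues
    high = avg_content >= np.quantile(avg_content, 1.0 - top_fraction)
    return np.flatnonzero(strong & high)

# Toy data: modes 0 and 1 are strong, but only mode 0 is also high-content
lam = np.array([50.0, 40.0] + [1.0 + 0.1 * i for i in range(18)])
content = np.concatenate(([0.95, 0.10], np.linspace(0.2, 0.9, 18)))
selected = prominent_modes(lam, content)
```

Passing `top_fraction=0.32` reproduces the looser percentile cutoff mentioned for one of the graphs.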
2.6. Residue rigidity scores for backbone and side chain
The set of prominent modes in a rigidity graph contains the strong and high-content mechanical coupling patterns during protein dynamics, and these patterns are potentially important for functional activities. Therefore, key residues in the prominent modes likely have significant biological importance, thereby providing yet further refined chemical specificity. With this insight, the participation of each protein residue in the prominent modes is used to compute a quantitative metric in the mechanical coupling network: the residue rigidity score. In a rigidity graph, we start by finding the characteristic mode for each residue. The residue is first associated with the prominent mode in which its weight is the highest, and this mode is set as the characteristic mode of the residue if the weight is significant.
| (1) |
In this equation, if the residue does not play a significant role in any of the prominent modes, its characteristic mode is the one among the remaining eigenvectors in which it has the highest weight. As shown in Figs. S4–S6, an empirical weight threshold is used to identify the significantly populated residues in a prominent mode. Next, the residue rigidity score in the graph is defined as the eigenvalue of the characteristic mode multiplied by its averaged mean-mode content, i.e., the mechanical strength weighted by the averaged mean-mode content. If the characteristic mode of a residue is a prominent mode, its residue rigidity score is high, while if the characteristic mode has a weak strength and/or low averaged content during protein dynamics, the score is low. The residue rigidity score in backbone is defined as the maximum score the residue delivers through its backbone, where the BS score is compared only if the residue contributes its backbone in the mode. Similarly, the residue rigidity score in side chain is the maximum score the residue delivers through its side chain, where the BS score is only considered if the residue participates by its side chain.
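The assignment of a characteristic mode and the resulting score can be sketched for a single rigidity graph as below. Several choices are assumptions made only for illustration: the residue weight in a mode is taken as the squared eigenvector component, the threshold `w_min` is a hypothetical placeholder for the empirical value used in the paper, and the function name is invented here.

```python
import numpy as np

def residue_rigidity_scores(lam, V, avg_content, prominent, w_min=0.1):
    """Characteristic mode and rigidity score of each residue in one graph.

    lam, avg_content: eigenvalue and averaged content of each mean-mode.
    V:                mean-modes as columns, shape (N residues, N modes).
    prominent:        indices of the prominent modes.
    w_min:            hypothetical significance threshold on the weight.
    """
    lam = np.asarray(lam, dtype=float)
    avg_content = np.asarray(avg_content, dtype=float)
    weights = np.asarray(V, dtype=float) ** 2  # assumption: squared component
    n_res, n_modes = weights.shape
    is_prom = np.zeros(n_modes, dtype=bool)
    is_prom[np.asarray(prominent, dtype=int)] = True
    prom_idx = np.flatnonzero(is_prom)
    rest_idx = np.flatnonzero(~is_prom)
    if rest_idx.size == 0:
        rest_idx = np.arange(n_modes)  # fallback when every mode is prominent
    char_mode = np.empty(n_res, dtype=int)
    for res in range(n_res):
        chosen = -1
        if prom_idx.size > 0:
            best = prom_idx[np.argmax(weights[res, prom_idx])]
            if weights[res, best] >= w_min:
                chosen = best  # significant weight in a prominent mode
        if chosen < 0:
            # Otherwise use the remaining mode with the highest weight
            chosen = rest_idx[np.argmax(weights[res, rest_idx])]
        char_mode[res] = chosen
    # Score: mode strength weighted by its averaged content
    return char_mode, lam[char_mode] * avg_content[char_mode]

char_mode, scores = residue_rigidity_scores(
    lam=[5.0, 2.0, 1.0], V=np.eye(3), avg_content=[0.9, 0.5, 0.4], prominent=[0]
)
```

Backbone and side-chain scores would then follow by taking the maximum of such per-graph scores over the relevant graphs, as described in the text.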
3. Results and discussion
With the all-atom MD simulation, structure-mechanics statistical learning, and rigidity graph analysis for both the RT and the PDZ3 proteins, the main text primarily uses RT for introducing the rich and quantitative information made available by our new approach. The two systems illustrate the common general features of sparse mechanical coupling network, scale-free network coupling strengths, hotspots in both backbone and side-chain interaction networks, and the residue rigidity scores during protein-dynamics as a new metric for biochemical functions. In what follows, the mechanical coupling network is analyzed in detail.
3.1. The protein mechanical coupling network is sparse
Our structure-mechanics statistical learning with bsENM provides a way to map the chemical details during protein dynamics onto spring-constant values. The bsENM calculated from a trajectory segment thus contains a list of springs, each with a length and a positive elastic constant. The diverse mechanical coupling strengths can be seen in the spring-constant distribution from an RT trajectory window, Fig. 2A. The skeleton springs between neighboring residues and the disulfide bonds are exceedingly strong and will be discarded later in the analysis of rigidity graphs. In Fig. 2B, the number of inter-site pairs in the protein structure separated by the distance in each bin is shown with the fraction of springs that converge to a positive spring constant after the self-consistent iterations. With increasing distance, the fraction of positive springs drops further, and the protein mechanical coupling network is thus progressively sparser than the structural contact network. Sparsity of the mechanical coupling network signifies that the protein fold can afford sequence variations to accommodate different functions, and the network properties of mechanical couplings during protein dynamics can potentially serve to capture the functionally important interactions.
Fig. 2.
The elastic spring parameters of a bsENM statistically learned from a trajectory window of RT. (A) Normalized histogram of coupling strengths. The coupling strength between two CG sites is the spring constant linking them. Disulfide springs that connect the S atoms of a cysteine pair are very strong. Skeleton-1 includes the springs within a residue and between the nearest residues. The very high strengths in skeleton-1 are peptide bonds, whereas the other springs in this category are on the left. Skeleton-2 comprises the springs between the second nearest residues. Non-skeleton springs are the rest, with disulfide bonds also excluded. (B) In each 0.5 Å bin of the spring length, the number of pairs in the protein structure (left) versus the number of springs converging to a non-zero spring constant in the fluctuation matching of a trajectory window (right).
Although the inter-residue Laplacian matrix of topological contacts is useful in mimicking collective modes, capturing the specific mechanical coupling patterns may call for a different representation. This aspect is illustrated by using the Laplacian of bsENM0 from topological contacts (no chemical details) as the reference. For the lowest-frequency modes of the bsENM Laplacian from all-atom MD, the similarity being close to 1 with respect to those of the reference (Fig. S7A) shows that the collective vibrations are indeed insensitive to the chemical details. Considering only the skeleton springs, the Laplacian modes are essentially identical to those of the reference (Fig. S7A), whereas the modes of the Laplacian from non-skeleton springs exhibit lower similarity values with respect to those of the reference. Robustness of the low-frequency Laplacian modes thus mostly comes from the skeleton springs.
To better reveal the molecular specificities in inter-residue elasticities, we turn to the signless Laplacian rigidity graph. The similarity of its modes with respect to those of the bsENM0 reference shows that even the lowest-frequency modes of the signless Laplacian exhibit specific behaviors, as the similarity values are low, Fig. S7. Even for skeleton springs, the signless Laplacian exhibits significant differences compared to the reference. Furthermore, inspection of the signless Laplacian eigenmodes for non-skeleton springs (Fig. S8) shows that they reveal clear signals for the patterns of non-covalent couplings. Therefore, we employ the signless Laplacian rigidity graph and its BB, BS, and SS parts in the following to uncover the interplay of backbone and side chains in the protein mechanical coupling network.
3.2. Protein mechanical coupling networks have scale-free behaviors
The non-skeleton spring constants are used to construct the off-diagonal inter-residue coupling strengths in the adjacency matrix, and each diagonal element of the degree matrix is the total coupling strength of a residue. In graph-theory terms, an off-diagonal element is the edge weight between two residue nodes, and a diagonal element is the degree of a node. For a network, degrees exhibiting power-law scaling in the high-value tail are an indicator of the scale-free property [45], [46], [47]. Such heavy-tailed profiles arise because most nodes have low values, but a small fraction of hotspots exhibits high couplings [48], [49], [50]. As a counterexample, the residue contact number in the graph of the bsENM0 is not scale-free given the packing density in a native fold, Fig. 3A. The mechanical coupling network of the bsENM springs statistically learned from all-atom MD, on the other hand, behaves fundamentally differently. The probability density of the residue coupling strength exhibits a long tail and fits quite well with the classical Lomax distribution [51] for heavy-tailed profiles, Fig. 3B. As such, the mechanical coupling strengths during protein dynamics exhibit power-law scaling despite the data having more complicated patterns, and the exponent is in the typically encountered range (between 2 and 3) of real-world scale-free networks [45], [46], [47], [48], [49], [50]. The protein structural network, however, lacks such a scale-free property. The data in Fig. 3 include the bsENM and bsENM0 of every 10-ns trajectory window in the 5-μs production run of RT.
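For readers who want a quick check of a power-law tail in degree-like data, the sketch below uses the Hill estimator. This is a different, simpler technique than the Lomax fit reported in the text and is swapped in here only because it needs nothing beyond NumPy; the function name, the choice of `k`, and the synthetic data are assumptions.

```python
import numpy as np

def hill_density_exponent(x, k):
    """Hill estimate of the power-law exponent of the density tail.

    x: positive samples (e.g. residue coupling strengths pooled over windows).
    k: number of largest samples treated as the tail (0 < k < len(x)).
    Returns gamma such that the density decays roughly as x**(-gamma).
    """
    x = np.sort(np.asarray(x, dtype=float))[::-1]  # descending order
    if not 0 < k < x.size:
        raise ValueError("k must satisfy 0 < k < number of samples")
    # Tail index of the survival function, relative to the (k+1)-th largest
    alpha = k / np.sum(np.log(x[:k] / x[k]))
    return alpha + 1.0  # density exponent = tail index + 1

# Synthetic classical Pareto samples with tail index 1.5 (density exponent 2.5)
rng = np.random.default_rng(0)
samples = rng.pareto(1.5, size=20000) + 1.0
gamma = hill_density_exponent(samples, k=2000)
```

The estimate depends on `k`; scanning a range of `k` values and looking for a plateau is the usual way to judge whether a power-law tail is plausible.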
Fig. 3.
Mechanical couplings during protein dynamics exhibit a heavy-tailed distribution and scale-free network behavior. The diagonal components of the rigidity graphs of bsENM and bsENM0 in every trajectory window of the 5 μs production run of RT are included. Top panel: the probability density distribution of the residue coupling strength due to non-skeleton springs in bsENM. Inset: the corresponding distribution for bsENM0, where the diagonal is the residue contact number. It follows a Gaussian distribution given the packing density in the protein structure. Bottom panel: the same distribution on the log-log scale. The orange line is the best-fit Lomax distribution. The red line in the inset is a power-law fit with its scaling exponent.
The scale-free behavior indicates that structural contacts within similar distances have highly diverse coupling strengths. Our finding of this network property of protein dynamics is consistent with the observation of mutational tolerance [52], [53] that only a certain percentage of mutations would impact the phenotype. Protein rigidity graphs, similarly, have just a fraction of inter-residue edges carrying significant weights, and can potentially serve as the molecular-scale mechanistic basis for mutation sensitivity. Next, the functional connection is analyzed by first focusing on the specific properties of backbone and side-chain mechanical coupling networks.
A key advantage in our design of bsENM is that the CG sites are either the backbone or side-chain type. The backbone-backbone, backbone-side-chain, and side-chain-side-chain rigidity graphs can thus be constructed to uncover their separate behaviors in the mechanical coupling network, and . This decomposition of the mechanical coupling network shows that the rigidity graphs of different chemical components all exhibit heavy tails and scale-free behaviors, Fig. 4. Containing the overall weaker non-polar interactions, Fig. S9, the exponent of power-law scaling for the values in is steeper (), Fig. 4. The high-value outliers in , though, exhibit complicated behaviors that deviate from the simple power-law equation. In an alternative representation by spectral analysis, the rigidity graph eigenvalues (’s) are the coupling strengths of different modes, and they also exhibit heavy-tailed distributions and power-law scaling. For the distribution of , the 2.39 exponent is similar to that of and , Fig. 4, indicating synergistic combination of inter-residue couplings in the collective modes.
Fig. 4.
The (left panels) and (right panels) of , , and rigidity graphs on a log-log scale. The rigidity graphs of all trajectory windows in the 5 μs production run of RT are used. The orange line is the best-fit Lomax distribution for the heavy-tailed profiles. The red line in the insert is a power-law fit with the scaling exponent .
For the protein mechanical coupling networks of a fixed number of nodes (residues) to have scale-free behaviors, their edge weights exhibit high-strength tails as shown in Fig. S9. To analyze the dependence of network properties on chemical differences, the off-diagonal terms of the BB graph are grouped according to the secondary structure (sheet, helix, and loop), while those of the SS graph are divided into polar and nonpolar groups. The BS couplings are all polar since the backbone is involved. If either residue of a pair is not in a helix or sheet, the pair is counted as in a loop. The residue composition of RT secondary structures is listed in Fig. 5A. The power-law scalings of these categories are indeed different, and each case has evident higher and/or lower-valued outliers deviating from the simple formula. The helix couplings are the most populated in the 5-10 kcal/mol/Å2 range, but they do not have any instance of very high strength (>15 kcal/mol/Å2) and are the least-tailed group, Fig. S9. On the other hand, sheets have much higher chances of exhibiting exceptional strengths during the dynamical motions and have the lowest scaling exponent. Heavy-tailed distributions of strengths are also seen in the BS, BB-loop, and SS-polar couplings. For the nonpolar side chains in RT, the tail is shorter and the power-law scaling exponent is higher.
Fig. 5.
In RT, the prominent backbone-only mechanical couplings during protein dynamics. (A) A ribbon representation of the structure and the residue composition of secondary structures. (B) Prominent mode residues in the BB rigidity graph are in licorice. The BPTI inhibitor is shown as a brown ribbon. The index, eigenvalue, and residues of prominent modes are listed.
In the 5 μs dynamics of PDZ3, scale-free behaviors of the BB, BS, and SS rigidity graphs are similarly observed, Fig. S10, and the BB-helix and SS-nonpolar couplings are also less heavy-tailed, Fig. S11. The BB-helix couplings, though, have a better fit with the Lomax distribution than those in RT. Interestingly, the polar components exhibit a much more extended tail in PDZ3, and strengths even higher than those of the sheet couplings show up, Fig. S11, illustrating protein-specific behaviors in the mechanical coupling networks.
The results of both RT and PDZ3 show that the mechanical couplings of backbone and side chains have different network properties. To further illustrate this point, specific patterns in the rigidity graphs of non-skeleton spring constants are characterized by spectral decomposition. This analysis also provides the data for identifying the mechanical hotspots based on the contributions of residue backbone and side chains.
3.3. Backbone and side chains exhibit specific mechanical hotspots
Graphical analysis of the bsENM quantitatively reveals the backbone and side-chain contributions in the mechanical coupling network. The high-strength tails in the eigenvalue distributions imply that their mechanical coupling patterns are more resistant to thermal noise during protein dynamics. The averaged mean-mode contents during protein dynamics for the eigenmodes of the BB, BS, or SS rigidity graph are calculated following the description in Section 2.5. The eigenvectors that show a statistically prominent eigenvalue and averaged content over the RT trajectory are then identified in Fig. S3, and their pointed patterns inform the participating residues, Figs. S4–S6. For the 5 μs trajectory of RT bound with BPTI, the rigidity graph includes the inhibitor residues, and the Ca2+ ion in RT is also treated as an additional residue. It is thus straightforward to adapt the bsENM-graph framework for studying complex protein systems. We focus on the modes of RT residues, and those involving only BPTI are not presented.
The trypsin fold of RT containing NT and CT barrels (Fig. 5A) is a useful structural template for therapeutic design [54], [55]. The high-strength BB eigenvectors indeed have high averaged mode contents, and Fig. 5B shows the mechanical wiring of the 18 prominent modes. The eigenvector components of prominent modes often pick up residue pairs with very strong hydrogen bonds, such as the oxyanion hole G193 coupling to BPTI, yet more collective patterns (mode indices 2, 7, and 14) are also observed. Most of the prominent modes are dispersed in separate strand-rich regions, and a noticeable pattern is that the strongest coupling is located at the cluster center, with a few nearby modes containing residues at secondary structure peripheries (edge residues of a strand or helix), as boldfaced in Fig. 5B. Out of the 34 hotspot residues in prominent modes, 20 are in secondary structures, 9 are at peripheries, and 5 are in loops, Fig. 6B. The catalytic triad residues S195, H57, and D102 are within 2–3 residues of secondary structure elements 11, 4, and 7, respectively, and are considered to be at their peripheries. The backbone coupling of S195 with G43 links the NT-barrel and CT-barrel.
Fig. 6.
In RT, the prominent mechanical couplings involving a side chain during protein dynamics. The prominent modes in the (A) SS and (B) BS rigidity graphs. The BPTI inhibitor is shown as a brown ribbon. The RT residues in hydrogen bonds and salt bridges are in licorice and those in hydrophobic couplings are in ball-and-stick. The index, eigenvalue, and residues of the prominent modes are listed. The prominent mode residues are hotspots in the mechanical coupling network, and the numbers of hotspots within a secondary structure (interior), at a secondary structure periphery (periphery), or in a loop are reported.
Spectral analysis of the SS rigidity graph illustrates a different network topology, Fig. 6A and Fig. S6. In the SS prominent modes, dual-residue patterns often appear at the interface between the NT and CT barrels due to very strong hydrogen bonds or salt bridges, such as between H57 and D102. Polar and nonpolar side chains, though, do not mix in the same prominent modes, demonstrating a separation of mechanical couplings due to chemical differences. Mostly located in strands, the eigenvectors populated by hydrophobic residues tend to involve more mechanically linked partners and can still emerge as prominent modes even though the individual coupling strengths may be lower. Compared with BB, the SS prominent modes have higher percentages of periphery and loop residues, Fig. 6B. As for BS, the prominent modes are scattered hydrogen bonds primarily in loops, some at secondary structure peripheries, but very few within a secondary structure. In RT, the very strong BS couplings primarily occur in the CT barrel that contains the activation domain, Fig. 1A and Fig. 6B. Backbone and side-chain mechanical couplings thus exhibit specific patterns in the structure.
The above results of backbone and side chains having distinct mechanical coupling networks provide a molecular basis for their separate adjustability as empirically adopted in protein engineering and design [9], [10], [11]. Whether backbone or side chains are more important in shaping the folding funnel also remains an unresolved debate [3], [4], [5], [6]. Rather than lumping each residue into a single unit [12], [13], [14], [15], [16], [17], our strategy is an explicit representation of backbone and side chains. With their rigidity graphs computed from protein dynamics, this framework provides a refined way of delineating the free-energy landscape around the structure. Next, we address whether the mechanical couplings exhibit extended patterns that help in understanding protein allostery.
3.4. Emergence of extensive mechanical couplings
The RT rigidity graphs reveal that the prominent couplings between RT and BPTI lie in hydrogen bonding modes (Fig. 5 and Fig. 6). They are next to several prominent modes within the trypsin fold, including those of the triad and the C42-C58 disulfide bond at the S1’ site. With such a spatial arrangement, prominent modes indeed come out as long-range mechanical couplings containing BPTI A16, S1’ site C42, the catalytic triad, the oxyanion hole, A56, and S1 site S214 and T229, Fig. 6A. From the activation domain to the active site, Fig. 5B, extensive prominent modes also emerge, Fig. 6B. A mystery of the serine protease family is that substrate variation or mutation at sites away from the triad still impacts cleavage [56], [57], [58], [59], [60]. Our result is a first demonstration that under thermal noise, specific molecular interactions can integrate into significant mechanical signals across distal sites.
During the 5 μs all-atom trajectory of PDZ3, the mechanical hotspots are also captured as the significantly populated residues in the prominent modes, Figs. S12–S15. Similarly, the backbone and side-chain mechanical coupling networks of PDZ3 exhibit different patterns, Figs. S16–S17. Most BB prominent modes are in the β-sandwich, with certain extensive patterns. The prominent BS modes of PDZ3, on the other hand, are more scattered and contain extensive modes that link the β-sandwich and CT-extension. Most of the prominent SS modes in PDZ3 are hydrophobic and rather extensive, while the salt bridge at the β-sandwich facing the CT-extension is exceedingly strong, Fig. S17. Similar to RT, the SS prominent mode residues of PDZ3 have a significantly higher percentage at secondary structure peripheries than those of BB do. The BS mechanical hotspots of PDZ3 also very frequently occur at secondary structure peripheries (Fig. S17) rather than in loops as in the case of RT (Fig. 6B). While consistent overall patterns in mechanical coupling networks are observed, the two protein systems exhibit specific features in the prominent modes of their rigidity graphs.
At the core of allosteric communication in proteins are the physical interactions that persist under thermal noise and connect distal sites [4], [5], [6]. In attempting to capture such functionally important long-range couplings, many approaches are based on positional covariance [18], [19], [20], structural contacts [21], [22], [23], or sequence co-evolution [38], [39], [40], [41], [42]. However, a fundamental difficulty is that the observed signals do not necessarily correspond to molecular interactions. For example, in using low-frequency vibrational modes to study intra-protein communication, positional fluctuations of unconnected, distal residues can be highly correlated due to the structural topology [12], [13], [14]. From protein dynamics, our computational framework of identifying the prominent modes of rigidity graphs thus provides a way to capture the molecularly specific patterns that can survive the stochastic fluctuations. In both the RT and PDZ3 protein systems, extensive mechanical couplings composed of physical interactions are identified.
3.5. Residue rigidity scores in backbone and side chains during protein dynamics as metrics for biological functions
Being the strong mechanical couplings that persist through protein dynamics, the prominent modes of the BB, BS, and SS rigidity graphs likely have important implications for biological functions. They also represent the routes through which protein backbone and side chains are wired in the mechanical coupling network. Based on this mechanistic insight, we propose to quantify the biological importance of residues by deducing the residue rigidity scores from the BB, BS, and SS graphs. Since their eigenvectors exhibit pointed patterns, each residue is specifically populated in only a few modes. The rigidity score of a residue in a particular rigidity graph thus comes from the characteristic eigenvector in which the residue is most represented (see Eq. 1 discussed in 2.6). The residue rigidity score is then the mechanical strength of that mode weighted by its averaged content during protein dynamics. By putting together the results of the BB, BS, and SS graphs, the residue rigidity score in backbone is the largest mechanical contribution from the backbone of the residue, and the score in side chain is that from its side chain. Therefore, if a residue plays a significant role in a prominent mode of the BB, BS, and/or SS graph, it would have a high score in backbone and/or side chain. On the other hand, if a residue only appears in modes having low strength and/or averaged content, it would have low residue rigidity scores. The residue rigidity scores of RT backbone and side chains in the all-atom MD simulation are shown in Fig. 7; glycine residues, listed at the bottom, are assigned a minimal side-chain value as they do not have a side chain and hence lack the corresponding score.
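The scoring rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: Eq. 1 of Section 2.6 is not reproduced here, so the characteristic mode of a site is taken to be the one in which it has the largest squared eigenvector component; the way the BS graph is split into backbone and side-chain parts is not modeled; and the function names (`graph_scores`, `combine`) and all numbers are hypothetical.

```python
def graph_scores(modes, n_sites):
    """Score each site of one rigidity graph.

    modes: list of dicts with keys
      'strength' - eigenvalue of the mode,
      'content'  - trajectory-averaged content of the mode,
      'vector'   - eigenvector components, one per site.
    """
    scores = []
    for i in range(n_sites):
        # Assumption: the characteristic mode of site i is the one in which
        # it carries the largest squared eigenvector component.
        characteristic = max(modes, key=lambda m: m["vector"][i] ** 2)
        scores.append(characteristic["strength"] * characteristic["content"])
    return scores


def combine(*per_graph_scores):
    """Keep, for every site, the largest score found in any of the graphs."""
    return [max(values) for values in zip(*per_graph_scores)]


# Toy numbers, not taken from the RT or PDZ3 trajectories.
bb_modes = [
    {"strength": 8.0, "content": 0.75, "vector": [0.7, 0.7, 0.1, 0.1]},
    {"strength": 4.0, "content": 0.5, "vector": [0.1, -0.1, 0.7, 0.7]},
]
bs_backbone_modes = [
    {"strength": 6.0, "content": 0.5, "vector": [0.1, 0.1, 0.7, -0.7]},
    {"strength": 2.0, "content": 0.5, "vector": [0.7, -0.7, 0.1, 0.1]},
]
bb = graph_scores(bb_modes, 4)
bs = graph_scores(bs_backbone_modes, 4)
backbone_score = combine(bb, bs)
print(backbone_score)  # [6.0, 6.0, 3.0, 3.0]
```

In this toy case the first two sites obtain their backbone score from a strong, well-populated BB mode, whereas the last two obtain it from the backbone part of a BS mode; a side-chain score would be assembled the same way from the SS graph and the side-chain part of the BS graph.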
Fig. 7.
For RT, the residue rigidity score during protein dynamics in backbone (x-axis) and in side chain (y-axis) for each residue. Since glycine residues do not have a side-chain score, they are assigned a minimal value and hence are located at the bottom. Red circles denote mechanical hotspots: the significantly weighted residues in the prominent modes of rigidity graphs. Dashed lines are the hotspot boundaries due to the prominent modes, and mechanical hotspots are mostly enclosed in the band except G69, H71, and S190, which contribute backbone to the prominent modes. Residues labelled as “functional residue” are those listed in Table S2 that have experimentally verified function or ultra-high conservation in MSA. Filled orange as “prominently coupled” are the mechanical hotspots of a single rigidity graph mode that contains at least one functional residue. Filled yellow as “prominently coupled, next” are the hotspots next to a functional residue in sequence or non-hotspot functional residues next to a hotspot. Separate markers label residues with high co-evolution in SCA or DCA. Grey circles are non-hotspot residues.
In contributing to the mechanical coupling network, Fig. 7 illustrates that the protein amino acids have diverse residue rigidity scores in backbone and side chains. Mechanical hotspots are the significantly populated residues in the prominent modes of rigidity graphs as defined in 2.6, and the dashed lines in Fig. 7 are the corresponding hotspot boundaries. The general importance of the trypsin fold in RT has led to a variety of functional characterizations, including mutagenesis at different sites. Combining the residues with experimentally verified function and ultra-high conservation in MSA provides the functional residues of RT as summarized in Table S2 and marked on Fig. 7. It can be seen that the mechanical hotspots cover most of the functional residues as well as their prominently coupled associates in the rigidity graphs. Although glycine residues do not have a side chain and are often flexible, a few of them still emerge as mechanical hotspots through the backbone, such as the oxyanion hole G193. Even though several functional glycines are not in a prominent mode of the rigidity graphs, their signature in the networks is being sequence neighbors of mechanical hotspots, Fig. 7. While the rigidity scores are based on non-skeleton springs, sequence neighbors are prominently coupled through skeleton connections. Therefore, the “prominently coupled, next” category, in which the aforementioned glycine residues reside, also includes the mechanical hotspots that are next to functional residues. Moreover, many mechanical hotspots exhibit strong signals in SCA and/or DCA [39], [40]; Table S3 summarizes the residues having high co-evolution as marked in Fig. 7. For example, the residues in the aforementioned prominent modes that form a spatially extensive set of mechanical couplings are all in an SCA sector, which also includes other mechanical hotspots (Fig. 6 and Table S3).
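The residue categories used in Fig. 7 follow simple set rules, which the sketch below spells out. It is an illustration rather than the authors' code: the function name `classify`, the precedence among categories when several rules apply, and the toy residue numbers are all assumptions.

```python
def classify(n_residues, prominent_modes, functional):
    """Assign each residue (numbered 1..n_residues) a Fig. 7 style category.

    prominent_modes: list of sets, the hotspot residues of each prominent mode.
    functional: set of residues with known function or high conservation.
    """
    hotspots = set().union(*prominent_modes)
    # Hotspots sharing a prominent mode with at least one functional residue.
    coupled = set()
    for mode in prominent_modes:
        if mode & functional:
            coupled |= mode

    labels = {}
    for i in range(1, n_residues + 1):
        neighbours = {i - 1, i + 1}  # sequence neighbours
        if i in coupled:
            labels[i] = "prominently coupled"
        elif (i in hotspots and neighbours & functional) or (
            i in functional and i not in hotspots and neighbours & hotspots
        ):
            labels[i] = "prominently coupled, next"
        elif i in hotspots:
            labels[i] = "hotspot"
        else:
            labels[i] = "non-hotspot"
    return labels


# Toy example with ten residues; the numbers carry no biological meaning.
labels = classify(
    n_residues=10,
    prominent_modes=[{2, 5}, {8}, {7}],
    functional={3, 5, 9},
)
for residue, label in labels.items():
    print(residue, label)
```

Here residues 2 and 5 share a mode that contains the functional residue 5, residue 8 is a hotspot next to the functional residue 9, and residues 3 and 9 are functional non-hotspots adjacent to a hotspot, which mirrors the situation described for the functional glycines.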
From the 5 μs all-atom MD data of PDZ3, the mechanical hotspots covering most of its functional residues [38] are also observed in its plot of backbone versus side-chain rigidity scores, Fig. S18. The mechanical hotspot F325, important for substrate recognition, has been shown to co-evolve with another hotspot, H372, at a distal site, with A347 and L353 on the communication pathway, but the underlying physical interactions are unclear [61]. As a molecular mechanism for this SCA-based prediction, Fig. S17 shows that F325 is in a cluster of prominent hydrophobic modes that together mechanically link the residue with H372. One of the mechanical hotspots in these modes is I341, which has been proposed as an alternative route in a thermal-diffusion MD study [62]. Our rigidity graph analysis based on all-atom MD simulations thus offers a unified mechanistic picture for the various data on intra-PDZ3 communication.
Although most of the characterized residues are in the β-sandwich [38] of PDZ3, several mechanical hotspots are found at the interface contacting the CT-extension. For example, D357, which couples to Y392 at the interface, is among the most conserved residues in the β-sandwich [61]. It would thus be valuable to specifically examine the functional roles of such residues in inter-domain communication [63], [64]. Overall, the residue rigidity scores in backbone and side chains are very useful metrics for the functional importance of RT and PDZ3 sites. This result opens a new door for using molecular simulation to study the mechanistic basis of biological activities and evolutionary restraints.
4. Conclusions
Given a protein fold, residue contact numbers center around a value due to the packing density. But then, what is the manifestation of sequence specificities in the structure? This question is addressed here by developing a bsENM with structure-mechanics statistical learning to compute the elastic parameters from 5 μs all-atom MD simulation in explicit solvent. To analyze the network behaviors of the complicated molecular interactions in structural fluctuations, the newly devised graph-theoretic framework introduces the concept of protein rigidity graphs. A key discovery is that the chemical details during protein dynamics render scale-free network properties in the mechanical coupling strengths of both backbone and side chains. In the nano-scale network of a single protein, exhibition of small-world-like features has not been shown to the best of our knowledge. The significantly populated residues in the statistically prominent modes of rigidity graphs are thus recognized as mechanical hotspots.
Furthermore, our bsENM-graph approach enables the direct comparison of backbone and side-chain mechanical couplings to accentuate their differences. Such outcomes point to an important notion that protein residues have diverse combinations of backbone and side-chain contributions to the mechanical coupling network, as seen in the backbone versus side-chain score plots of RT (Fig. 7) and PDZ3 (Fig. S18). Encouragingly, functional residues of the two protein systems are largely mechanical hotspots themselves or next to one in sequence, as for glycine. While most functional residues have high residue rigidity scores for their side chains, some also have prominent backbone couplings, as in the top-right corner of Fig. 7 and Fig. S18. That only a specific set of sites has top residue rigidity scores in both side chains and backbone indicates a finely tuned interaction network and has implications in shaping the folding funnel and in rendering proper conformational flexibilities for function. Another finding is that a significant portion of side-chain related mechanical hotspots are located at secondary structure peripheries (Fig. 6 and Fig. S17), i.e., the edges of foldons [65], and are potentially important factors in adopting the foldon-inspired models [65], [66] for protein folding. Although the all-atom MD data depend on empirical force fields, specific conditions, and duration, the structure-mechanics statistical learning and graphical analysis schemes as well as the concepts therein can be readily applied to different cases. The mechanical hotspots and prominent rigidity graph modes identified in molecular simulation also bear similarities with the co-evolution in MSA, which also suffers from statistical noise. For RT, physically contiguous residues in prominent modes also exhibit prominent co-evolution signals in MSA as a single sector, Fig. 6 and Table S3. For PDZ3, the multiple prominent modes involving F325 primarily involve sector residues, Fig. S18.
This work thus suggests a molecular mechanism for the coupled sequence variation due to evolutionary restraints: it is associated with prominent patterns in the protein mechanical coupling network.
CRediT authorship contribution statement
Nixon Raj: Conceptualization, Methodology, Software, Writing - original draft. Timothy Click: Methodology, Software. Haw Yang: Conceptualization, Writing - review & editing. Jhih-Wei Chu: Conceptualization, Methodology, Software, Writing - original draft, Writing - review & editing, Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work was supported by Princeton University (to HY), the Ministry of Science and Technology of Taiwan (109-2113-M-009-023- and 110-2113-M-A49-024-), and the Ministry of Education of Taiwan through the IDS2B center and the “Smart Platform of Dynamic Systems Biology for Therapeutic Development” project in The Featured Areas Research Center Program. The National Center for High-Performance Computing of Taiwan supported part of the computational resources.
Footnotes
Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.csbj.2021.09.004.
References
- 1. Dill K.A. Dominant forces in protein folding. Biochemistry. 1990;29(31):7133–7155. doi: 10.1021/bi00483a001.
- 2. Onuchic J.N., Wolynes P.G. Theory of protein folding. Curr Opin Struct Biol. 2004;14(1):70–75. doi: 10.1016/j.sbi.2004.01.009.
- 3. Rose G.D., Fleming P.J., Banavar J.R., Maritan A. A backbone-based theory of protein folding. Proc Natl Acad Sci USA. 2006;103(45):16623–16633. doi: 10.1073/pnas.0606843103.
- 4. Mittermaier A.K., Kay L.E. Observing biological dynamics at atomic resolution using NMR. Trends Biochem Sci. 2009;34(12):601–611. doi: 10.1016/j.tibs.2009.07.004.
- 5. Nussinov R., Tsai C.-J. Allostery in disease and in drug discovery. Cell. 2013;153(2):293–305. doi: 10.1016/j.cell.2013.03.034.
- 6. Lee A.L. Contrasting roles of dynamics in protein allostery: NMR and structural studies of CheY and the third PDZ domain from PSD-95. Biophys Rev. 2015;7(2):217–226. doi: 10.1007/s12551-015-0169-3.
- 7. Dawson N.L., Lewis T.E., Das S., Lees J.G., Lee D., Ashford P., Orengo C.A., Sillitoe I. CATH: an expanded resource to predict protein function through structure and sequence. Nucl Acids Res. 2017;45(D1):D289–D295. doi: 10.1093/nar/gkw1098.
- 8. Chandonia J.-M., Fox N.K., Brenner S.E. SCOPe: classification of large macromolecular structures in the structural classification of proteins-extended database. Nucl Acids Res. 2019;47(D1):D475–81. doi: 10.1093/nar/gky1134.
- 9. Siegel J.B., Zanghellini A., Lovick H.M., Kiss G., Lambert A.R., St Clair J.L. Computational design of an enzyme catalyst for a stereoselective bimolecular diels-alder reaction. Science. 2010;329(5989):309–313. doi: 10.1126/science.1190239.
- 10. Kazlauskas R. Engineering more stable proteins. Chem Soc Rev. 2018;47(24):9026–9045. doi: 10.1039/C8CS00014J.
- 11. Nisthal A., Wang C.Y., Ary M.L., Mayo S.L. Protein stability engineering insights revealed by domain-wide comprehensive mutagenesis. Proc Natl Acad Sci USA. 2019;116(33):16367–16377. doi: 10.1073/pnas.1903888116.
- 12. Tirion. Large amplitude elastic motions in proteins from a single-parameter, atomic analysis. Phys Rev Lett. 1996;77(9):1905–8. doi: 10.1103/PhysRevLett.77.1905.
- 13. Haliloğlu T., Bahar I., Erman B. Gaussian dynamics of folded proteins. Phys Rev Lett. 1997;79(16):3090–3093. doi: 10.1103/PhysRevLett.79.3090.
- 14. Atilgan A.R., Durell S.R., Jernigan R.L., Demirel M.C., Keskin O., Bahar I. Anisotropy of fluctuation dynamics of proteins with an elastic network model. Biophys J. 2001;80(1):505–515. doi: 10.1016/S0006-3495(01)76033-X.
- 15. Taketomi H., Ueda Y., Gō N. Studies on protein folding, unfolding and fluctuations by computer simulation. I. The effect of specific amino acid sequence represented by specific inter-unit interactions. Int J Pept Protein Res. 1975;7(6):445–459. doi: 10.1111/j.1399-3011.1975.tb02465.x.
- 16. Gō N. Theoretical studies of protein folding. Annu Rev Biophys Bioeng. 1983;12:183–210. doi: 10.1146/annurev.bb.12.060183.001151.
- 17. Takada S. Gō model revisited. Biophys Physicobiol. 2019;16:248–255. doi: 10.2142/biophysico.16.0_248.
- 18. Zheng W.J., Brooks B.R., Thirumalai D. Low-frequency normal modes that describe allosteric transitions in biological nanomachines are robust to sequence variations. Proc Natl Acad Sci USA. 2006;103(20):7664–7669. doi: 10.1073/pnas.0510426103.
- 19. Li H., Chang Y.-Y., Lee J.Y., Bahar I., Yang L.-W. DynOmics: dynamics of structural proteome and beyond. Nucl Acids Res. 2017;45(W1):W374–W380. doi: 10.1093/nar/gkx385.
- 20. Thirumalai D., Hyeon C., Zhuravlev P.I., Lorimer G.H. Symmetry, rigidity, and allosteric signaling: from monomeric proteins to molecular machines. Chem Rev. 2019;119(12):6788–6821. doi: 10.1021/acs.chemrev.8b00760.
- 21. Brinda K.V., Vishveshwara S. A network representation of protein structures: implications for protein stability. Biophys J. 2005;89(6):4159–4170. doi: 10.1529/biophysj.105.064485.
- 22. Di Paola L., De Ruvo M., Paci P., Santoni D., Giuliani A. Protein contact networks: an emerging paradigm in chemistry. Chem Rev. 2013;113(3):1598–1613. doi: 10.1021/cr3002356.
- 23. Kayikci M., Venkatakrishnan A.J., Scott-Brown J., Ravarani C.N.J., Flock T., Babu M.M. Visualization and analysis of non-covalent contacts using the protein contacts atlas. Nat Struct Mol Biol. 2018;25(2):185–194. doi: 10.1038/s41594-017-0019-z.
- 24. Kannan N., Vishveshwara S. Identification of side-chain clusters in protein structures by a graph spectral method. J Mol Biol. 1999;292(2):441–464. doi: 10.1006/jmbi.1999.3058.
- 25. Pavlopoulos GA, Secrier M, Moschopoulos CN, Soldatos TG, Kossida S, Aerts J, et al. Using graph theory to analyze biological networks. BioData Min. 4 (UNSP 10). doi: 10.1186/1756-0381-4-10.
- 26. Ma J. Usefulness and limitations of normal mode analysis in modeling dynamics of biomolecular complexes. Structure. 2005;13(3):373–380. doi: 10.1016/j.str.2005.02.002.
- 27. Bahar I., Lezon T.R., Bakan A., Shrivastava I.H. Normal mode analysis of biomolecular structures: functional mechanisms of membrane proteins. Chem Rev. 2010;110(3):1463–1497. doi: 10.1021/cr900095e.
- 28. Vashisth H., Brooks I.C.L. Conformational sampling of maltose-transporter components in cartesian collective variables is governed by the low-frequency normal modes. J Phys Chem Lett. 2012;3(22):3379–3384. doi: 10.1021/jz301650q.
- 29. López-Blanco J.R., Chacón P. New generation of elastic network models. Curr Opin Struct Biol. 2016;37:46–53. doi: 10.1016/j.sbi.2015.11.013.
- 30. Chu J.-W., Voth G.A. Coarse-grained modeling of the actin filament derived from atomistic-scale simulations. Biophys J. 2006;90(5):1572–1582. doi: 10.1529/biophysj.105.073924.
- 31. Click T, Raj N, Chu J-W. Calculation of enzyme fluctuograms from all-atom molecular dynamics simulation. In: Voth GA, editor. Computational approaches for studying enzyme mechanism Part B, vol. 578 of Meth. Enzymol. Academic Press; 2016. Ch. 14. p. 327–42. doi: 10.1016/bs.mie.2016.05.024.
- 32. Chen Y.-T., Yang H., Chu J.-W. Structure-mechanics statistical learning unravels the linkage between local rigidity and global flexibility in nucleic acids. Chem Sci. 2020;11(19):4969–4979. doi: 10.1039/D0SC00480D.
- 33. Pasternak A., Ringe D., Hedstrom L. Comparison of anionic and cationic trypsinogens: The anionic activation domain is more flexible in solution and differs in its mode of BPTI binding in the crystal structure. Protein Sci. 1999;8(1):253–258. doi: 10.1110/ps.8.1.253.
- 34. Page M.J., Di Cera E. Serine peptidases: classification, structure and function. Cell Mol Life Sci. 2008;65(7–8):1220–1236. doi: 10.1007/s00018-008-7565-9.
- 35. Lee H-J, Zheng JJ. PDZ domains and their binding partners: structure, specificity, and modification. Cell Commun Signal 8(8). doi: 10.1186/1478-811X-8-8.
- 36. Hung A.Y., Sheng M. PDZ domains: structural modules for protein complex assembly. J Biol Chem. 2002;277(8):5699–5702. doi: 10.1074/jbc.R100065200.
- 37. Doyle D.A., Lee A., Lewis J., Kim E., Sheng M., MacKinnon R. Crystal structures of a complexed and peptide-free membrane protein-binding domain: molecular basis of peptide recognition by PDZ. Cell. 1996;85(7):1067–1076. doi: 10.1016/s0092-8674(00)81307-0.
- 38. McLaughlin J, Richard N, Poelwijk FJ, Raman A, Gosal WS, Ranganathan R. The spatial architecture of protein function and adaptation. Nature. 2012;491(7422):138–U163. doi: 10.1038/nature11500.
- 39. Halabi N., Rivoire O., Leibler S., Ranganathan R. Protein sectors: evolutionary units of three-dimensional structure. Cell. 2009;138(4):774–786. doi: 10.1016/j.cell.2009.07.038.
- 40. Rivoire O., Reynolds K.A., Ranganathan R. Evolution-based functional decomposition of proteins. PLoS Comput Biol. 2016;12(6) doi: 10.1371/journal.pcbi.1004817.
- 41. Morcos F., Pagnani A., Lunt B., Bertolino A., Marks D.S., Sander C., Zecchina R., Onuchic J.N., Hwa T., Weigt M. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc Natl Acad Sci USA. 2011;108(49):E1293–E1301. doi: 10.1073/pnas.1111471108.
- 42. Cocco S, Monasson R, Weigt M. From principal component to direct coupling analysis of coevolution in proteins: low-eigenvalue modes are needed for structure prediction. PLoS Comput Biol 9(8). doi: 10.1371/journal.pcbi.1003176.
- 43. Best R.B., Zhu X., Shim J., Lopes P.E.M., Mittal J., Feig M. Optimization of the additive CHARMM all-atom protein force field targeting improved sampling of the backbone phi, psi and side-chain chi1 and chi2 dihedral angles. J Chem Theory Comput. 2012;8(9):3257–3273. doi: 10.1021/ct300400x.
- 44. Abraham M.J., Murtola T., Schulz R., Páll S., Smith J.C., Hess B., Lindahl E. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX. 2015;1–2:19–25. doi: 10.1016/j.softx.2015.06.001.
- 45. Albert R., Jeong H., Barabási A.-L. Diameter of the World-Wide Web. Nature. 1999;401(6749):130–131. doi: 10.1038/43601.
- 46. Strogatz S.H. Exploring complex networks. Nature. 2001;410(6825):268–276. doi: 10.1038/35065725.
- 47. Jeong H., Mason S.P., Barabási A.L., Oltvai Z.N. Lethality and centrality in protein networks. Nature. 2001;411(6833):41–42. doi: 10.1038/35075138.
- 48. Albert R., Barabási A.-L. Statistical mechanics of complex networks. Rev Mod Phys. 2002;74(1):47. doi: 10.1103/RevModPhys.74.47.
- 49. Caldarelli G., Capocci A., De Los Rios P., Muñoz M.A. Scale-free networks from varying vertex intrinsic fitness. Phys Rev Lett. 2002;89(25) doi: 10.1103/PhysRevLett.89.258702.
- 50. Cimini G., Squartini T., Saracco F., Garlaschelli D., Gabrielli A., Caldarelli G. The statistical physics of real-world networks. Nat Rev Phys. 2019;1(1):58–71. doi: 10.1038/s42254-018-0002-6.
- 51. Lomax K.S. Business failures: another example of the analysis of failure data. J Am Stat Assoc. 1954;49:847–852. doi: 10.2307/2281544.
- 52. Thyagarajan B, Bloom JD. The inherent mutational tolerance and antigenic evolvability of influenza hemagglutinin. eLife 3.
- 53. Fuller Z.L., Berg J.J., Mostafavi H., Sella G., Przeworski M. Measuring intolerance to mutation in human genetics. Nat Genet. 2019;51(5):772–776. doi: 10.1038/s41588-019-0383-1.
- 54. Craik C.S., Page M.J., Madison E.L. Proteases as therapeutics. Biochem J. 2011;435:1–16. doi: 10.1042/BJ20100965.
- 55. Batt A.R., St Germain C.P., Gokey T., Guliaev A.B., Baird T., Jr Engineering trypsin for inhibitor resistance. Protein Sci. 2015;24(9):1463–1474. doi: 10.1002/pro.2732.
- 56. Sprang S.R., Fletterick R.J., Gráf L., Rutter W.J., Craik C.S. Studies of specificity and catalysis in trypsin by structural analysis of site-directed mutants. Crit Rev Biotechnol. 1988;8(3):225–236. doi: 10.3109/07388558809147559.
- 57. Evnin L.B., Vásquez J.R., Craik C.S. Substrate specificity of trypsin investigated by using a genetic selection. Proc Natl Acad Sci USA. 1990;87(17):6659–6663. doi: 10.1073/pnas.87.17.6659.
- 58. Vindigni A., Di Cera E. Role of P225 and the C136–C201 disulfide bond in tissue plasminogen activator. Protein Sci. 1998;7(8):1728–1737. doi: 10.1002/pro.5560070807.
- 59. Krem M.M., Prasad S., Di Cera E. Ser214 is crucial for substrate binding to serine proteases. J Biol Chem. 2002;277(43):40260–40264. doi: 10.1074/jbc.M206173200.
- 60. Hedstrom L. Serine protease mechanism and specificity. Chem Rev. 2002;102(12):4501–4523. doi: 10.1021/cr000033x.
- 61. Lockless S.W., Ranganathan R. Evolutionarily conserved pathways of energetic connectivity in protein families. Science. 1999;286(5438):295–299. doi: 10.1126/science.286.5438.295.
- 62. Ota N., Agard D.A. Intramolecular signaling pathways revealed by modeling anisotropic thermal diffusion. J Mol Biol. 2005;351(2):345–354. doi: 10.1016/j.jmb.2005.05.043.
- 63. Zhang J., Petit C.M., King D.S., Lee A.L. Phosphorylation of a PDZ domain extension modulates binding affinity and interdomain interactions in postsynaptic density-95 (PSD-95) protein, a membrane-associated guanylate kinase (MAGUK) J Biol Chem. 2011;286(48):41776–41785. doi: 10.1074/jbc.M111.272583.
- 64. Zhang J., Lewis S.M., Kuhlman B., Lee A.L. Supertertiary structure of the MAGUK core from PSD-95. Structure. 2013;21(3):402–413. doi: 10.1016/j.str.2012.12.014.
- 65. Englander S.W., Mayne L., Krishna M.M.G. Protein folding and misfolding: mechanism and principles. Quart Rev Biophys. 2007;40(4):287–326. doi: 10.1017/S0033583508004654.
- 66. Rollins G.C., Dill K.A. General mechanism of two-state protein folding kinetics. J Am Chem Soc. 2014;136:11420–11427. doi: 10.1021/ja5049434.