Abstract
Two-component signaling (TCS) is the primary means by which bacteria sense and respond to the environment. TCS involves two partner proteins working in tandem, which interact to perform cellular functions whereas limiting interactions with non-partners (i.e., cross-talk). We construct a Potts model for TCS that can quantitatively predict how mutating amino acid identities affect the interaction between TCS partners and non-partners. The parameters of this model are inferred directly from protein sequence data. This approach drastically reduces the computational complexity of exploring the sequence-space of TCS proteins. As a stringent test, we compare its predictions to a recent comprehensive mutational study, which characterized the functionality of 204 mutational variants of the PhoQ kinase in Escherichia coli. We find that our best predictions accurately reproduce the amino acid combinations found in experiment, which enable functional signaling with its partner PhoP. These predictions demonstrate the evolutionary pressure to preserve the interaction between TCS partners as well as prevent unwanted cross-talk. Further, we calculate the mutational change in the binding affinity between PhoQ and PhoP, providing an estimate to the amount of destabilization needed to disrupt TCS.
Keywords: statistical inference, mutational phenotypes, interaction specificity, epistasis, fitness landscape, bacterial signaling.
Introduction
Early theoretical work on protein folding postulated that proteins have evolved to be minimally frustrated (Bryngelson and Wolynes 1987; Bryngelson et al. 1995; Onuchic et al. 1997), i.e., evolved to have favorable residue-residue interactions that facilitate folding into the native state whereas having minimal non-native energetic traps. The principle of minimal frustration provides intuition as to why protein sequences are not random strings of amino acids. The evolutionary constraint to fold into a particular, stable three-dimensional structure whereas minimizing the number of frustrated interactions greatly restricts the sequence-space of a protein (Leopold et al. 1992; Bryngelson et al. 1995; Onuchic et al. 1997). Satisfaction of these constraints results in correlated amino acid identities within the sequences of a protein family. These correlated identities occur between different positions in a protein such as, for example, native contacts (Gobel et al. 1994; Neher 1994; Shindyalov et al. 1994). We refer to these quantifiable amino acid correlations as coevolution.
Of course, coevolution does not only arise from the constraint to fold. Proteins also fulfill cellular functions, which act as additional constraints on the sequences of proteins (Ferreiro et al. 2014; Sikosek and Chan 2014; Wolynes 2015). In the context of signal transduction, proteins have evolved to be able to preferentially bind to a signaling partner(s) as well as catalyze the chemical reactions associated with signal transfer. An important example is two-component signaling (TCS) (Hoch 2000; Stock et al. 2000; Laub and Goulian 2007; Casino et al. 2010; Szurmant and Hoch 2010; Capra and Laub 2012), which serves as the primary means for bacteria to sense the environment and carry out appropriate responses. TCS consists of two partner proteins working in tandem: a histidine kinase (HK) and a response regulator (RR). Upon the detection of stimulus by an extracellular sensory domain, the HK generates a signal via autophosphorylation. Its RR partner can then transiently bind to the HK and receive the signal (i.e., phosphoryl group), thereby activating its function as a transcription factor. The HK has also evolved to catalyze the reverse signal transfer reaction (i.e., phosphatase activity), acting as a sensitive switch to turn off signal transduction. To prevent signal transfer with the wrong partner (i.e., cross-talk), TCS partners have mutually evolved amino acids at their respective binding surfaces that confer interaction specificity (Laub and Goulian 2007; Szurmant and Hoch 2010; Capra and Laub 2012). Thus, the collection of protein sequences of TCS partners contains quantifiable coevolution between the HK and RR sequences.
In principle, the determinants of interaction specificity for TCS can be quantified by a probabilistic model of sequence selection (Cheng et al. 2014). Assuming that nature has sufficiently sampled the sequence-space of TCS proteins, the collection of protein sequences of TCS partners can be viewed as being selected under quasi-equilibrium from a Boltzmann distribution:
(1) |
where is the concatenated amino acid sequence of a HK and RR protein, P is the probability of selecting , and −H is proportional to the additive fitness landscape that governs the evolutionary sequence selection for TCS partners. Specifically, equation (1) was previously derived using simple models of evolutionary biology (Sella and Hirsh 2005), where , is the population size of the genotypes and is the additive fitness (i.e., the negative log of the fitness); See “Materials and Methods” section for more details. H is referred to in our work as a coevolutionary landscape.
Recently, maximum entropy-based approaches referred to as Direct Coupling Analysis (DCA) (Weigt et al. 2009; Morcos et al. 2011; Ekeberg et al. 2013) have been successfully applied to infer the parameters of H (a Potts model) that governs the empirical amino acid sequence statistics. This has allowed for the direct quantification of the coevolution in protein sequence data [See Review: (de Juan et al. 2013)]. Early work using DCA to study TCS primarily focused on identifying the key coevolving residues between the HK and RR (Weigt et al. 2009). Highly coevolving residue pairs have been used as docking constraints in a molecular dynamics simulation to predict the HK/RR signaling complex (Schug et al. 2009), the autophosphorylation structure of a HK (Dago et al. 2012), and the homodimeric form (transcription factor) of the RR (dos Santos et al. 2015). DCA has also been applied to quantify the determinants of interaction specificity between TCS proteins (Procaccini et al. 2011; Cheng et al. 2014), building on earlier coevolutionary approaches (Li et al. 2003; Burger and van Nimwegen 2008). In particular, DCA was used to predict the effect of point mutations on TCS phosphotransfer in vitro as well as demonstrate the reduced specificity between HK and RR domains in hybrid TCS proteins (Cheng et al. 2014).
The experimental effort to determine the molecular origin of interaction specificity in TCS proteins [See Reviews: (Laub and Goulian 2007; Casino et al. 2010; Szurmant and Hoch 2010; Podgornaia and Laub 2013)] precedes the recent computational efforts. Full knowledge of the binding interface between HK and RR was made possible through X-ray crystallography (Casino et al. 2009). Scanning mutagenesis studies (Tzeng and Hoch 1997; Qin et al. 2003; Capra et al. 2010) provided insight on the subset of important interfacial residues that determine specificity. These key residues were mutated to enable a TCS protein to preferentially interact with a non-partner in vitro (Skerker et al. 2008; Capra et al. 2010). However, the extent of possible amino acid identities that allow TCS partners to preferentially interact in vivo has remained elusive until recent comprehensive work by Podgornaia and Laub (2015). Their work focused on the PhoQ/PhoP TCS partners in E. coli, which control the response to low magnesium stress. PhoQ (HK) phosphorylates and dephosphorylates PhoP (RR) under low and high magnesium concentrations, respectively. Using exhaustive mutagenesis of four residues of PhoQ (204 = 160,000 mutational variants) at positions that form the binding interface with PhoP, Podgornaia and Laub (2015) were able to characterize all mutants based on their functionality in E. coli. It was found that roughly 1% of all PhoQ mutants were functional, enabling E. coli to exhibit comparable responses to magnesium concentrations as the wild type PhoQ. This finding uncovered a broad degeneracy in the sequence-space of the HK protein that still maintained signal transfer efficiency as well as interaction specificity with its partner.
We ask whether amino acid coevolution inferred using DCA could capture the functional mutational variants observed in the comprehensive mutational study of PhoQ and if so, to what extent? Capturing this functionality requires that information gleaned from coevolution is sufficient to estimate the effect of mutations to PhoQ on its interactions with PhoP as well as on unwanted cross-talk. Hence, our question is important to determine if coevolutionary methods can be extended from studying two interacting proteins to studying an interaction network (e.g., systems biology). Further, this question is of particular interest to those who want to engineer novel mutations in TCS proteins that can maintain or encode the interaction specificity of a TCS protein to its partner or a non-partner, respectively.
To answer this question, we first infer a Potts model, H (eq. 1), which forms the basis for quantifying how mutations affect the interaction between a HK and RR protein. Focusing on the parameters of H that are related to interprotein coevolution, we construct a coevolutionary landscape to quantify TCS interactions, , for a given sequence of an HK and RR protein. serves as a proxy for signal transfer efficiency, allowing us to quantify the effect on fitness of the interaction between any HK and RR protein. Further, we can assess how mutations affect fitness due to changes in the HK/RR interaction by computing the mutational change in between the mutant sequence, , and the wild type sequence, :
(2) |
Considering the concatenated sequence of PhoQ and PhoP, we compute equation (2) for the 204 PhoQ mutational variants. We find that mutants with the most favorable (e.g., most negative) were classified as functional HKs by Podgornaia and Laub (2015)—i.e., true positive predictions. Next, we focus on mutations predicted to be favorable by equation (2) that were classified as non-functional in experiment. Expanding our analysis of the PhoQ mutants beyond its interaction with PhoP, we consider how mutations affect the signal transfer efficiency, , between PhoQ and all of the RR proteins in E. coli. We find that many of these non-functional mutants exhibit cross-talk interactions according to our model, accounting for their non-functionality. If we exclude these promiscuous variants, we can better isolate the true positive predictions that are functional from false positives that are non-functional. Our predictions also capture context-dependent mutational affects that were observed in experiment, i.e., epistasis. Finally, we estimate the mutational change in binding affinity in the PhoQ/PhoP bound complex using the Zone Equilibration of Mutants (ZEMU) method (Dourado and Flores 2014), a combined physics- and knowledge-based approach for free energy calculations. Consistent with what we would expect, we find that mutations that destabilize the HK/RR interaction tend to be non-functional with very high statistical significance. Non-functional mutants are on average destabilized by ∼2 kcal/mol with respect to functional mutants.
The work described herein demonstrates that a coevolutionary model (i.e., additive fitness landscape) built from sequence data can directly connect molecular details at the residue-level to mutational phenotypes in bacteria. This has broad applications in systems biology, but also in synthetic biology since our computational framework can be used to select mutations that enhance or suppress interactions between TCS proteins. A more detailed description of our computational approaches can be found in the “Materials and Methods” section.
Results
Mutational Change in Coevolutionary Landscape, , for PhoQ/PhoP Interaction
We first focus on the parameters of the inferred Potts model (eq. 3) that describe the coevolution between the Dimerization and Histidine phosphotransfer domain (DHp) and the Receiver (REC) domain (fig. 1A), which form the HK/RR binding interface (fig. 1B). The interprotein statistical couplings (fig. 1C) of equation (3) are used to construct a coevolutionary landscape, (eq. 4), as a proxy for signal transfer efficiency. For each of the 1,659 functional and 158,341 non-functional PhoQ-mutational variants identified by Podgornaia and Laub (2015), we compute the mutational change, , between PhoQ and PhoP. As an initial step, we only consider the PhoQ/PhoP sequence, i.e., we do not yet consider other RR proteins than PhoP. A histogram of is generated for all mutational variants (fig. 2A). The distribution of the functional mutants tends more towards favorable than the distribution of non-functional mutants, but more interestingly, the most favorable predictions of our model contain mostly functional mutations. This is made clear by a plot of the Positive Predictive Value (PPV) for the top N mutational variants ranked by (fig. 2B) from most favorable to most deleterious. The top 25 mutational variants ranked by contain 20 functional mutants and 5 non-functional mutants (i.e., PPV = 0.8).
System-Level Analysis Using : Functional Mutants Limit Cross-Talk
Mutations that may enhance signal transfer efficiency between PhoQ and PhoP in vitro may still result in a non-functional PhoQ/PhoP system in vivo. This would occur if the mutations to PhoQ sufficiently encoded it to preferentially interact with another RR in E. coli. For this reason, we focused our computational analysis on the subset of mutational variants that preserve PhoQ/PhoP specificity by limiting cross-talk according to our coevolutionary model.
We first calculate the proxy for signal transfer efficiency, , between the wild type PhoQ sequence and all of the non-hybrid RR proteins in E. coli (fig. 3A). We find that for wild type PhoQ, the most favorable (most negative) is with its known signaling partner, PhoP. As a consistency check, we also plot for different combinations of the cognate partner TCS proteins in E. coli (supplementary fig. S1, Supplementary Material online). This result is consistent with previous computational predictions that used information-based quantities (Procaccini et al. 2011; Cheng et al. 2014) to quantify interaction specificity.
Extending upon figure 3A, we assess cross-talk in our model by calculating between each PhoQ mutant and all of the non-hybrid RR in E. coli. We exclude all mutant-PhoQ variants that have a more favorable with a non-partner RR. These excluded mutants are excellent candidates for engineering specificity in E. coli. Applying our exclusion criterion, we find that only 181 functional and 1,532 non-functional variants remain, i.e., 89% and 99% of the functional and non-functional variants, respectively, were removed. A histogram of the remaining (cross-talk excluded) mutants as a function of (fig. 3B) shows that a filter based on interaction specificity is better able to isolate the true positive (functional) variants. Notably, the first 17 ranked variants are all functional. Once again, ranking the filtered variants by from the most favorable to the least favorable, we can plot the PPV (fig. 3C) for the top N ranked variants. We find that the cross-talk excluded PPV tends to lie above the original PPV from figure 2B.
Mutational Context-Dependence in the Coevolutionary Landscape
A significant finding of the Podgornaia and Laub study was the context-dependent nature of the many of the mutations. For example, individual mutations that may result in a non-functional phenotype may result in a functional phenotype when combined. It is well known that mutations may exhibit such a context dependence, or epistasis, i.e., mutations introduced together have an effect on fitness that is not simply the combined effect of each mutation alone. Such effects would restrict the connectivity between functional mutations and act as constraints on TCS evolution (Podgornaia and Laub 2015).
We find that the functional predictions in figure 3 tend to be 3- and 4-point mutations, highlighting their non-trivial nature (fig. 4A). Whereas the effect of mutating a HK protein when its partner RR is mutated (and vice versa) is naturally captured in our model through interprotein statistical couplings (fig. 1C), the intraprotein couplings that capture the context-dependence of HK only or RR only mutations are not explicitly contained within equation 4. Even with these limitations, the model is still able to distinguish between functional multi-point mutations that are composed of non-functional single-point mutations for the most favorable predictions, in accordance with experiment (Podgornaia and Laub 2015). One example is provided in figure 4B for the mutation of WT PhoQ from AVST at residues 284, 285, 288 and 289, respectively, to SVGY (i.e., a 3-point mutation). For single point mutations from AVST, correctly predicts that SVST and AVSY are functional whereas AVGT is non-functional. For two point mutations, correctly predicts that SVSY is functional whereas SVGT is non-functional. More interestingly, finds that when the non-functional 2-point mutation SVGT is combined with a Tyrosine mutation to position 289 (i.e., SVGY), the functionality is recovered. It should be noted that the two-point mutation AVGY is predicted to have a favorable between PhoQ and PhoP, but is predicted to undergo cross-talk, and thus, is discarded despite being functional in experiment.
We next consider a model where epistasis between the 4-mutational sites explored by Podgornaia and Laub are explicitly introduced into equation (4) (See eq. 5 in the “Materials and Methods” section). The epistatic effects of multiple HK mutations are captured by the statistical couplings of the Potts model that describe the pairwise coevolution between the HK mutational sites. This is illustrated schematically in fig. 4C. We find that such a model, (eq. 5), is consistent with the original model, , in terms of its predictions and predictive quality (fig. 4D). In particular, the prediction of functional phenotypes for the example in figure 4B yields identical results. Nevertheless, this model provides additional epistatic mutational effects that are not simply the added sum of the individual mutational effects.
Finally, we examine a model that only considers HK coevolution in the SI Main Text. This model naturally includes the epistasis between the 4-mutational sites explored in experiment (Podgornaia and Laub 2015). From this model, we find that coevolution between HK residues alone is insufficient for capturing the functional phenotypes observed in experiment.
Mutational Change in the Binding Affinity Using a Combined Physics- and Knowledge-Based Approach
We used ZEMu (See “Materials and Methods” section) to compute the mutation-induced change in the binding affinity, , between PhoQ and PhoP. The calculation converged for 42,985 mutants (702 functional and 42,283 non-functional) from a randomly selected subset of the 204 variants. We first examined a scatter plot of vs. (supplementary fig. S6, Supplementary Material online) to observe whether a functional relationship can be deduced from the computational data alone. The mutational change in the coevolutionary landscape, , would in principle exhibit a non-linear dependence on , whereas also depending on many other quantities including the binding affinity with other proteins. We find no obvious relationship between the two quantities in our current analysis.
A histogram of is plotted for the 42,985 mutants in figure 5. A histogram of for the same subset of mutants is shown in supplementary figure S7, Supplementary Material online. On the population level, functional mutations exhibit a mean of 1.76 ± 0.06 kcal/mol lower than that of the non-functional mutants, with a Wilcoxon rank-sum test P-value < . This indicates that destabilizing mutations of ∼2 kcal/mol are sufficient for disrupting TCS. Furthermore, destabilizing mutations that are more than two standard deviations greater than the mean for functional variants are significantly less likely to be functional, with a P-value < 10−6 computed from a cumulative binomial distribution (based on the 6157 mutants above this threshold, 19 of which are functional).
We next examine the potentially deleterious effect of mutations that overly stabilize the binding affinity between PhoQ and PhoP. Although we find that all 56 mutants with are non-functional (fig. 5), this has no statistical significance (P-value ∼ 0.4). Figure 5 illustrates our point that the functional mutations tend to have a neutral affect on , whereas high is strongly associated with the loss of function. Although very low may visually appear to be enriched with non-functional mutants, this is based on a small number of mutants and does not have statistical significance.
Discussion
Treating a large collection of amino acid sequence data for TCS partner proteins as independent samples from a Boltzmann equilibrium distribution, we infer a coevolutionary landscape, . Specifically, is proportional to the negative of the additive fitness landscape, which captures the coevolving amino acid combinations that give rise to interaction specificity in TCS systems. In the past, we were able to predict how a point mutation to a TCS protein affects its ability to transfer signal to its partner in vitro (Cheng et al. 2014). Our present work shifts the paradigm of coevolution-based analysis towards systems biology by extending our analysis to include how those mutations affect cross-talk in a bacterial organism. We demonstrate that our most favorable predictions for multiple site mutations can accurately capture in vivo TCS functionality, consistent with the comprehensive mutational study of Podgornaia and Laub (2015). This is not a trivial computational task, since inferring coevolutionary information from sequence data is highly underdetermined and estimates ∼107 parameters (for eq. 3) from ∼103 sequences of TCS partners. Adding to the problem complexity, it is plausible that the full functional sequence space has not yet been explored by evolutionary process (Capra and Laub 2012; Echave, et al. 2016). Despite this, the coevolutionary landscape is predictive and identifies mutational variants that are not found in nature, e.g., none of the mutational sequences are included as input data in our model. We have demonstrated the feasibility of generating predictions using coevolution and the predictive power of such an approach will only systematically improve as more sequences of TCS partners are collected.
Similar predictions to those discussed herein can readily be used to engineer novel protein–protein interactions in TCS systems. Such a strategy would potentially complement already existing strategies to match novel inputs with outputs via modular engineering (Tabor et al. 2011; Whitaker et al. 2012; Ganesh et al. 2013; Schmid et al. 2014; Hansen and Benenson 2016). The strength of our coevolutionary approach is that it makes possible an efficient search of sequence-space for mutations at arbitrary positions in either the HK or RR that desirably enhance or suppress its interaction with a RR or HK, respectively. It can also readily be applied to study the in vivo, system-level effect of mutating a TCS protein on insulating its interaction with a desired partner or enabling cross-talk with non-partners. Our study highlights an intuitive but key principle for selecting mutations to a TCS protein that encodes specificity in vivo: mutations must be selected to enhance protein–protein interactions with a desired partner whereas limiting protein–protein interactions with undesired partners. Although intuitive, we also demonstrate that mutations that significantly destabilize the binding affinity result in the loss of signaling. Further, we estimate that destabilization of ∼2 kcal/mol in the binding affinity between TCS partners is sufficient to disrupt TCS.
It is also important to note that coevolutionary methods described here for identifying mutational phenotypes (e.g., response to magnesium stress) are not particular to TCS systems. This framework is transferable to other systems where molecular interactions coevolve to preserve function, opening the window to a large set of open problems in molecular and systems biology. Our results further extend the idea that a combination of coevolutionary based methods, molecular modeling and experiment can be used to identify the proper amino acids sites and identities that can be used to identify mutational phenotypes. Our study highlights the important role of coevolution in maintaining protein–protein interactions, as in the case of bacteria signal transduction. Statistical methods that probe coevolution not only allow us to connect molecular, residue-level details to mutational phenotypes, but also to explore the evolutionary selection mechanisms that are employed by nature to maintain interaction specificity, e.g., negative selection (Zarrinpar et al. 2003). Further investigations of other systems that are evolutionarily constrained to maintain protein–protein interactions could elucidate the extent at which our methods can be used in alternative systems. One potential example is the toxin-antitoxin protein pairs in bacteria, which was the focus of recent experimental work (Aakre et al. 2015) elucidating the determinants of interaction specificity.
Materials and Methods
Sequence Database for HK and RR Inter-Protein Interactions: DHp and REC
We obtain multiple sequence alignments (MSA) from Pfam (Finn et al. 2014) (Version 28), focusing on the DHp (PF00512) and REC (PF00072) domains of the HK and RR, respectively (fig. 1A). The first four positions (columns) of PF00512 were removed due to poor alignment of the PhoQ sequence at those positions. The remaining DHp MSA has a length of . Each REC MSA had a default length of . Here, we considered HK proteins that have the same domain architecture as the PhoQ kinase from E. coli, i.e., DHp domain sandwiched between an N-terminal HAMP domain (PF00672) and a C-terminal ATPase domain (PF02518). The remaining HK (DHp) sequences were paired with a TCS partner RR (REC) by taking advantage of the observation that TCS partners are typically encoded adjacent to one another under the same operon (Skerker et al. 2005; Yamamoto et al. 2005), i.e., ordered locus numbers differ by 1. Further, we exclude all TCS pairs that are encoded adjacent to multiple HKs or RRs. Each DHp and REC sequence that was paired in this fashion was concatenated into a sequence (fig. 1A), of total length L where is the amino acid at position i which is indexed from 1 to for the 20 amino acids and MSA gap. The DHp sequence is indexed from positions 1 to and REC sequence from positions to the total length of . Our remaining dataset consisted of 6,519 non-redundant concatenated sequences.
Inference of Parameters of Coevolutionary Model
An amino acid sequence for a protein or interacting proteins can be viewed as being selected from a Boltzmann equilibrium distribution, i.e., . The Boltzmann form of P was previously derived for an evolving population in the limit where the product of the population size and mutation rate is very small (Sella and Hirsh 2005). Specifically, it was shown (Sella and Hirsh 2005) that , where is the population size and is the additive fitness landscape (i.e., log of the fitness). A high population size suggests many viable sequences that a protein can mutate to, which makes the population more robust to deleterious mutations. Related work (Halpern and Bruno 1998) modeled site-specific selection of sequences, which has been extended by numerous works (Tamuri et al. 2012; Spielman and Wilke 2015; Bloom 2016).
Under certain limiting conditions, appears to share a correspondence with the energetics of protein folding [See review: (Sikosek and Chan 2014)]. Assuming that the sequence diversity is completely due to stability considerations, where is the energy of the folded protein with respect to the unfolded state and is the inverse of the evolutionary selection temperature from protein folding theory (Pande et al. 1997, 2000; Morcos et al. 2014). Several studies have reported strong linear correlation between mutational changes in with mutational changes in protein stability (Lui and Tiana 2013; Morcos et al. 2014; Contini and Tiana 2015). However, may not be an appropriate approximation for proteins that have evolved with interacting partners, for which sequence selection is plausibly influenced by additional factors such as binding affinities as well as binding/unbinding rates.
Often it is of interest to solve the inverse problem of inferring an appropriate when provided with an abundant number of protein sequences. Typical approaches to this problem have applied the principle of maximum entropy to infer a least biased model that is consistent with the input sequence data (Weigt et al. 2009; Morcos et al. 2011), e.g., the empirical single-site and pairwise amino acid probabilities, and , respectively. The solution of which is the Potts model:
(3) |
where is the amino acid at position i for a sequence in the MSA, is the pairwise statistical couplings between positions i and j in the MSA with amino acids and , respectively, and is the local field for position i . We estimate the parameters of the Potts model, , using the pseudo-likelihood maximization Direct Coupling Analysis (plmDCA) [See Ref: (Ekeberg et al. 2013) for full computational details]. It is important to note that the mutational context-dependence (epistasis) between residues i and j is naturally captured in the model through the statistical couplings, .
Previous studies have applied DCA to a number of problems in structural biology. DCA has been used to identify highly coevolving pairs of residues to predict the native state conformation of a protein (Marks et al. 2012; Sulkowska et al. 2012; Sutto et al. 2015), including repeat proteins (Espada et al. 2015), as well as identify additional functionally relevant conformational states (Morcos et al. 2013; Malinverni et al. 2015; Sutto et al. 2015) and multi-meric states (Schug et al. 2009; Weigt et al. 2009; Morcos et al. 2011; dos Santos et al. 2015; Malinverni et al. 2015). Structural and coevolutionary information share complementary roles in the molecular simulations of proteins [See review: (Noel et al. 2016)]. The Potts model (eq. 3) obtained from DCA has been related to the theory of evolutionary sequence selection (Morcos et al. 2014) as well as mutational changes in protein stability (Lui and Tiana 2013; Morcos et al. 2014; Contini and Tiana 2015). Additional work has applied DCA to protein folding to predict the effect of point mutations on the folding rate (Mallik et al. 2016) as well as construct a statistical potential for native contacts in a structure-based model of a protein (Cheng et al. 2016) to better capture the transition state ensemble.
DCA and inference methods have also been applied to study problems in systems biology, such as the identification of relevant protein-protein interactions in biological interaction networks (Procaccini et al. 2011; Feinauer et al. 2016). Recently, a number of studies have focused on inferring quantitative landscapes that capture the effects of mutations on biological phenotypes (Ferguson et al. 2013; Cheng et al. 2014; Figliuzzi et al. 2016) by constructing models from sequence data (eq. 3). Two separate studies, which focused on antibiotic drug resistance in E. coli (Figliuzzi et al. 2016) and viral fitness of HIV-1 proteins (Ferguson et al. 2013), respectively, inferred a Potts model (i.e., additive fitness landscape), H, and calculated mutational changes as ΔH. This approach is analogous to the approach adopted in this study. Likewise, a study examining the mutational effects on TCS phosphotransfer (Cheng et al. 2014) constructed a mutational landscape from an information-based quantity. All of these approaches capture epistatic effects and rely on the accuracy of the inferred Potts model (eq. 3) from sequence data.
Mutational Changes in Coevolutionary Landscape
For the concatenated sequences of HK (DHp) and RR (REC) (fig. 1A), we infer a Potts model (eq. 3). We focus on a subset of parameters in our model consisting of the interprotein couplings, , between positions in the DHp and REC domains (fig. 1C) that are in close proximity in a representative structure of the TCS complex (fig. 1B). All local fields terms, , are included to partially capture the fitness effects that give rise to the amino acid composition observed at each site. These considerations allow us to construct coevolutionary landscape for TCS, which is a negative, additive fitness landscape:
(4) |
where is the concatenated sequence of the DHp and REC domains, the double summation is taken over all interprotein statistical couplings between the DHp and REC domains, is a Heaviside step function, c is the a cutoff distance of 16 Å which was determined in a previous study (Morcos et al. 2014), and is the minimum distance between residues i and j in the representative structure. Mutational changes in equation (4) are then computed as . Hence, mutational changes in the signal transfer efficiency are approximated from mutational changes in the additive fitness.
An additional model is considered to analyze the epistatic effects of the 4-point mutations explored in experiment (Podgornaia and Laub 2015). Whereas equation (4) naturally captures the epistatic effect of mutating a residue in the HK when the RR has also been mutated (or vice versa), it does not explicitly contain the statistical couplings that capture the epistatic effects of HK only mutations. Hence, the statistical couplings between the 4-mutational sites of the HK are explicitly added to equation (4):
(5a) |
(5b) |
In equation (5a), the summation contains the six statistical couplings between the 4-mutational sites explored by Podgornaia and Laub, i.e., positions 14, 15, 18, and 19 in our MSA of TCS partners, which map to positions 284, 285, 288, and 289 of PhoQ.
Zone Equilibration of Mutants (ZEMu) Calculation
ZEMu consists of a multiscale minimization by dynamics, restricted to a flexibility zone of five residues about each substitution site (Dourado and Flores 2014), which is followed by a mutational change in stability using FoldX (Guerois et al. 2002). ZEMu has been used to explain the mechanism of Parkinson’s disease associated mutations in Parkin (Caulfield et al. 2014; Fiesel et al. 2015). The minimization is done in MacroMoleculeBuilder (MMB), a general-purpose internal coordinate mechanics code also known for RNA folding (Flores and Altman 2010), homology modeling (Flores et al. 2010), morphing (Tek et al. 2016), and fitting to density maps (Flores 2014).
We use the ZEMu (Dourado and Flores 2014) method to predict the mutational change in binding energy between PhoQ and PhoP. ZEMu first treats mutations as small perturbations on the structure by using molecular dynamics simulations [See Ref. Dourado and Flores (2014) for full computational details] to equilibrate the local region around mutational sites. ZEMu can then estimate the binding affinity between the mutant-PhoQ/PhoP, , and the wild type-PhoQ/PhoP, , using the knowledge-based FoldX (Guerois et al. 2002) potential. This allows for the calculation of the mutational change in binding affinity as: .
ZEMu calculation was performed according to Ref: (Dourado and Flores 2014), with the following two differences. First, due to the large number of mutations we capped the computer time permitted to 3 core-hours per mutant, whereas in (Dourado and Flores 2014) the limit was 48 h. This meant that of 122,802 mutants attempted, 42,923 completed within the time limit, whereas in (Dourado and Flores 2014), almost all mutants converged. The major reason for non-convergence in the current work involved mutation to bulky or constrained residues. Steric clashes produced by such residues force the error-controlled integrator (Flores et al. 2011) to take small time steps and hence use more computer time. Exemplifying this, the amino acids F, W, and Y are the most common residues for non-converging mutations at positions 285 and 288 in PhoQ. The second difference was that we permitted flexibility in the neighborhood of all four possible mutation sites, even when not all of them were mutated, whereas in (Dourado and Flores 2014) only the mutated positions were treated as flexible. This allowed us to compare all of the mutational energies to a single wild-type simulation, also performed with flexibility at all four sites.
Database of TCS partners, Potts model, and code for calculating equation (4) can be obtained from: http://utdallas.edu/~faruckm/PublicationDatasets.html; last accessed August 10, 2016.
Supplementary Material
Supplementary figures S1–S7 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
We would like to thank Drs Michele Di Pierro and Lena Simine for helpful comments. This research was supported by the NSF INSPIRE award (MCB-1241332) and the NSF-funded Center for Theoretical Biological Physics (PHY-1427654). SF and ON acknowledge funds from eSSENCE (http://essenceofescience.se/), Uppsala University, and the Swedish Foundation for International Cooperation in Research and Higher Education (STINT). We also acknowledge a generous allocation of supercomputer time from the Swedish National Infrastructure for Computing (SNIC) at Uppmax, and applications assistance from Drs Rudberg, Karlsson, and Freyhult.
References
- Aakre CD, Herrou J, Phung TN, Perchuk BS, Crosson S, Laub MT. 2015. Evolving new protein-protein interaction specificity through promiscuous intermediates. Cell 163:594–606. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bloom JD. 2016. Identification of positive selection in genes is greatly improved by using experimentally informed site-specific models. bioRxiv: Cold Spring Harbor Labs Journals. Advance Access published January 22, 2016, doi:http://dx.doi.org/10.1101/037689. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bryngelson JD, Onuchic JN, Socci ND, Wolynes PG. 1995. Funnels, pathways, and the energy landscape of protein-folding—a synthesis. Proteins–Struct Funct Genet. 21:167–195. [DOI] [PubMed] [Google Scholar]
- Bryngelson JD, Wolynes PG. 1987. Spin-glasses and the statistical-mechanics of protein folding. Proc Natl Acad Sci U S A. 84:7524–7528. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Burger L, van Nimwegen E. 2008. Accurate prediction of protein-protein interactions from sequence alignments using a Bayesian method. Mol Syst Biol. 4:165. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Capra EJ, Laub MT. 2012. Evolution of two-component signal transduction systems. Annu Rev Microbiol. 66:325–347. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Capra EJ, Perchuk BS, Lubin EA, Ashenberg O, Skerker JM, Laub MT. 2010. Systematic dissection and trajectory-scanning mutagenesis of the molecular interface that ensures specificity of two-component signaling pathways. PLoS Genetics. 6:e1001220.. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Casino P, Rubio V, Marina A. 2010. The mechanism of signal transduction by two-component systems. Curr Opin Struct Biol. 20:763–771. [DOI] [PubMed] [Google Scholar]
- Casino P, Rubio V, Marina A. 2009. Structural insight into partner specificity and phosphoryl transfer in two-component signal transduction. Cell 139:325–336. [DOI] [PubMed] [Google Scholar]
- Caulfield TR, Fiesel FC, Moussaud-Lamodiere EL, Dourado DFAR, Flores SC, Springer W. 2014. Phosphorylation by PINK1 releases the UBL domain and initializes the conformational opening of the E3 ubiquitin ligase Parkin. PLoS Comput Biol. 10:e1003935.. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cheng RR, Morcos F, Levine H, Onuchic JN. 2014. Toward rationally redesigning bacterial two-component signaling systems using coevolutionary information. Proc Natl Acad Sci U S A. 111:E563–E571. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cheng RR, Raghunathan M, Noel JK, Onuchic JN. 2016. Constructing sequence-dependent protein models using coevolutionary information. Protein Sci. 25:111–122. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Contini A, Tiana G. 2015. A many-body term improves the accuracy of effective potentials based on protein coevolutionary data. J Chem Phys. 143:025103.. [DOI] [PubMed] [Google Scholar]
- Dago AE, Schug A, Procaccini A, Hoch JA, Weigt M, Szurmant H. 2012. Structural basis of histidine kinase autophosphorylation deduced by integrating genomics, molecular dynamics, and mutagenesis. Proc Natl Acad Sci U S A. 109:E1733–E1742. [DOI] [PMC free article] [PubMed] [Google Scholar]
- de Juan D, Pazos F, Valencia A. 2013. Emerging methods in protein co-evolution. Nat Rev Genet. 14:249–261. [DOI] [PubMed] [Google Scholar]
- dos Santos RN, Morcos F, Jana B, Andricopulo AD, Onuchic JN. 2015. Dimeric interactions and complex formation using direct coevolutionary couplings. Sci Rep. 5:13652.. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dourado DF, Flores SC. 2014. A multiscale approach to predicting affinity changes in protein-protein interfaces. Proteins Struct Funct Bioinform. 82:2681–2690. [DOI] [PubMed] [Google Scholar]
- Echave J, Spielman SJ, Wilke CO. 2016. Causes of evolutionary rate variation among protein sites. Nat Rev Genet. 17:109–121. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ekeberg M, Lovkvist C, Lan YH, Weigt M, Aurell E. 2013. Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models. Phys Rev E. 87:012707.. [DOI] [PubMed] [Google Scholar]
- Espada R, Parra RG, Mora T, Walczak AM, Ferreiro DU. 2015. Capturing coevolutionary signals inrepeat proteins. BMC Bioinformatics 16:207.. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Feinauer C, Szurmant H, Weigt M, Pagnani A. 2016. Inter-protein sequence co-evolution predicts known physical interactions in bacterial ribosomes and the Trp operon. PLoS One. 11:e0149166.. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ferguson AL, Mann JK, Omarjee S, Ndung'u T, Walker BD, Chakraborty AK. 2013. Translating HIV sequences into quantitative fitness landscapes predicts viral vulnerabilities for rational immunogen design. Immunity 38:606–617. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ferreiro DU, Komives EA, Wolynes PG. 2014. Frustration in biomolecules. Q Rev Biophys. 47:285–363. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fiesel FC, Caulfield TR, Moussaud-Lamodiere EL, Ogaki K, Dourado DFAR Flores SC, Ross OA, Springer W. 2015. Structural and functional impact of Parkinson disease-associated mutations in the E3 Ubiquitin ligase Parkin. Hum Mutat. 36:774–786. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Figliuzzi M, Jacquier H, Schug A, Tenaillon O, Weigt M. 2016. Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase TEM-1. Mol Biol Evol. 33:268–280. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, et al. 2014. Pfam: the protein families database. Nucleic Acids Res. 42:D222–D230. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Flores SC. 2014. Fast fitting to low resolution density maps: elucidating large-scale motions of the ribosome. Nucleic Acids Res 42:e9.. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Flores SC, Altman RB. 2010. Turning limited experimental information into 3D models of RNA. RNA 16:1769–1778. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Flores SC, Sherman MA, Bruns CM, Eastman P, Altman RB. 2011. Fast flexible modeling of RNA structure using internal coordinates. IEEE-ACM Trans Comput Biol Bioinform. 8:1247–1257. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Flores SC, Wan Y, Russell R, Altman RB. 2010. Predicting RNA structure by multiple template homology modeling. Pac Symp Biocomput. 216–227. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ganesh I, Ravikumar S, Lee SH, Park SJ, Hong SH. 2013. Engineered fumarate sensing Escherichia coli based on novel chimeric two-component system. J Biotechnol. 168:560–566. [DOI] [PubMed] [Google Scholar]
- Gobel U, Sander C, Schneider R, Valencia A. 1994. Correlated mutations and residue contacts in proteins. Proteins 18:309–317. [DOI] [PubMed] [Google Scholar]
- Guerois R, Nielsen JE, Serrano L. 2002. Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol. 320:369–387. [DOI] [PubMed] [Google Scholar]
- Halpern AL, Bruno WJ. 1998. Evolutionary distances for protein-coding sequences: modeling site-specific residue frequencies. Mol Biol Evol. 15:910–917. [DOI] [PubMed] [Google Scholar]
- Hansen J, Benenson Y. 2016. Synthetic biology of cell signaling. Nat Comput. 15:5–13. [Google Scholar]
- Hoch JA. 2000. Two-component and phosphorelay signal transduction. Curr Opin Microbiol. 3:165–170. [DOI] [PubMed] [Google Scholar]
- Laub MT, Goulian M. 2007. Specificity in two-component signal transduction pathways. Annu Rev Genet. 41:121–145. [DOI] [PubMed] [Google Scholar]
- Leopold PE, Montal M, Onuchic JN. 1992. Protein folding funnels—a kinetic approach to the sequence structure relationship. Proc Natl Acad Sci U S A. 89:8721–8725. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li L, Shakhnovich EI, Mirny LA. 2003. Amino acids determining enzyme-substrate specificity in prokaryotic and eukaryotic protein kinases. Proc Natl Acad Sci U S A. 100:4463–4468. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lui S, Tiana G. 2013. The network of stabilizing contacts in proteins studied by coevolutionary data. J Chem Phys. 139:155103.. [DOI] [PubMed] [Google Scholar]
- Malinverni D, Marsili S, Barducci A, De Los Rios P. 2015. Large-scale conformational transitions and dimerization are encoded in the amino-acid sequences of Hsp70 chaperones. PLoS Comput Biol. 11:e1004262.. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mallik S, Das S, Kundu S. 2016. Predicting protein folding rate change upon point mutation using residue-level coevolutionary information. Proteins Struct Funct Bioinform. 84:3–8. [DOI] [PubMed] [Google Scholar]
- Marks DS, Hopf TA, Sander C. 2012. Protein structure prediction from sequence variation. Nat Biotechnol. 30:1072–1080. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Morcos F, Jana B, Hwa T, Onuchic JN. 2013. Coevolutionary signals across protein lineages help capture multiple protein conformations. Proc Natl Acad Sci U S A. 110:20533–20538. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Morcos F, Pagnani A, Lunt B, Bertolino A, Marks DS, Sander C, Zecchina R, Onuchic JN, Hwa T, Weigt M. 2011. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc Natl Acad Sci U S A. 108:E1293–E1301. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Morcos F, Schafer NP, Cheng RR, Onuchic JN, Wolynes PG. 2014. Coevolutionary information, protein folding landscapes, and the thermodynamics of natural selection. Proc Natl Acad Sci U S A. 111:12408–12413. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Neher E. 1994. How frequent are correlated changes in families of protein sequences?. Proc Natl Acad Sci U S A. 91:98–102. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Noel JK, Morcos F, Onuchic JN. 2016. Sequence co-evolutionary information is a natural partner to minimally-frustrated models of biomolecular dynamics. F1000Res 5:F1000 Faculty Rev-106. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Onuchic JN, Luthey-Schulten Z, Wolynes PG. 1997. Theory of protein folding: the energy landscape perspective. Annu Rev Phys Chem. 48:545–600. [DOI] [PubMed] [Google Scholar]
- Pande VS, Grosberg AY, Tanaka T. 2000. Heteropolymer freezing and design: towards physical models of protein folding. Rev Modern Phys. 72:259–314. [Google Scholar]
- Pande VS, Grosberg AY, Tanaka T. 1997. Statistical mechanics of simple models of protein folding and design. Biophys J. 73:3192–3210. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Podgornaia AI, Laub MT. 2013. Determinants of specificity in two-component signal transduction. Curr Opin Microbiol. 16:156–162. [DOI] [PubMed] [Google Scholar]
- Podgornaia AI, Laub MT. 2015. Pervasive degeneracy and epistasis in a protein-protein interface. Science 347:673–677. [DOI] [PubMed] [Google Scholar]
- Procaccini A, Lunt B, Szurmant H, Hwa T, Weigt M. 2011. Dissecting the specificity of protein-protein interaction in bacterial two-component signaling: orphans and crosstalks. PLoS One. 6:e19729.. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Qin L, Cai S, Zhu Y, Inouye M. 2003. Cysteine-scanning analysis of the dimerization domain of EnvZ, an osmosensing histidine kinase. J Bacteriol. 185:3429–3435. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schmid SR, Sheth RU, Wu A, Tabor JJ. 2014. Refactoring and optimization of light-switchable Escherichia coli two-component systems. ACS Synth Biol. 3:820–831. [DOI] [PubMed] [Google Scholar]
- Schug A, Weigt M, Onuchic JN, Hwa T, Szurmant H. 2009. High-resolution protein complexes from integrating genomic information with molecular simulation. Proc Natl Acad Sci U S A. 106:22124–22129. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sella G, Hirsh AE. 2005. The application of statistical physics to evolutionary biology. Proc Natl Acad Sci U S A. 102:9541–9546. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shindyalov IN, Kolchanov NA, Sander C. 1994. Can three-dimensional contacts in protein structures be predicted by analysis of correlated mutations? Protein Eng 7:349–358. [DOI] [PubMed] [Google Scholar]
- Sikosek T, Chan HS. 2014. Biophysics of protein evolution and evolutionary protein biophysics. J R Soc Interface. 11:20140419.. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Skerker JM, Perchuk BS, Siryaporn A, Lubin EA, Ashenberg O, Goulian M, Laub MT. 2008. Rewiring the specificity of two-component signal transduction systems. Cell 133:1043–1054. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Skerker JM, Prasol MS, Perchuk BS, Biondi EG, Laub MT. 2005. Two-component signal transduction pathways regulating growth and cell cycle progression in a bacterium: a system-level analysis. PLoS Biol. 3:1770–1788. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Spielman SJ, Wilke CO. 2015. The relationship between dN/dS and scaled selection coefficients. Mol Biol Evol. 32:1097–1108. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stock AM, Robinson VL, Goudreau PN. 2000. Two-component signal transduction. Annu Rev Biochem. 69:183–215. [DOI] [PubMed] [Google Scholar]
- Sulkowska JI, Morcos F, Weigt M, Hwa T, Onuchic JN. 2012. Genomics-aided structure prediction. Proc Natl Acad Sci U S A. 109:10340–10345. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sutto L, Marsili S, Valencia A, Gervasio FL. 2015. From residue coevolution to protein conformational ensembles and functional dynamics. Proc Natl Acad Sci U S A. 112:13567–13572. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Szurmant H, Hoch JA. 2010. Interaction fidelity in two-component signaling. Curr Opin Microbiol. 13:190–197. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tabor JJ, Levskaya A, Voigt CA. 2011. Multichromatic control of gene expression in Escherichia coli. J Mol Biol. 405:315–324. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tamuri AU, dos Reis M, Goldstein RA. 2012. Estimating the distribution of selection coefficients from phylogenetic data using sitewise mutation-selection models. Genetics 190:1101–1115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tek A, Korostelev AA, Flores SC. 2016. MMB-GUI: a fast morphing method demonstrates a possible ribosomal tRNA translocation trajectory. Nucleic Acids Res. 44:95–105. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tzeng Y-L, Hoch JA. 1997. Molecular recognition in signal transduction: the interaction surfaces of the Spo0F response regulator with its cognate phosphorelay proteins revealed by alanine scanning mutagenesis. J Mol Biol. 272:200–212. [DOI] [PubMed] [Google Scholar]
- Weigt M, White RA, Szurmant H, Hoch JA, Hwa T. 2009. Identification of direct residue contacts in protein-protein interaction by message passing. Proc Natl Acad Sci U S A. 106:67–72. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Whitaker WR, Davis SA, Arkin AP, Dueber JE. 2012. Engineering robust control of two-component system phosphotransfer using modular scaffolds. Proc Natl Acad Sci U S A. 109:18090–18095. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wolynes PG. 2015. Evolution, energy landscapes and the paradoxes of protein folding. Biochimie. 119:218–230. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yamamoto K, Hirao K, Oshima T, Aiba H, Utsumi R, Ishihama A. 2005. Functional characterization in vitro of all two-component signal transduction systems from Escherichia coli. J Biol Chem. 280:1448–1456. [DOI] [PubMed] [Google Scholar]
- Zarrinpar A, Park S-H, Lim WA. 2003. Optimization of specificity in a cellular protein interaction network by negative selection. Nature 426:676–680. [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.