Biophysical ambiguities prevent accurate genetic prediction

Xianghua Li; Ben Lehner

doi:10.1038/s41467-020-18694-0

2020 Oct 1;11:4923. doi: 10.1038/s41467-020-18694-0


PMCID: PMC7529754 PMID: 33004824

Abstract

A goal of biology is to predict how mutations combine to alter phenotypes, fitness and disease. It is often assumed that mutations combine additively or with interactions that can be predicted. Here, we show using simulations that, even for the simple example of the lambda phage transcription factor CI repressing a gene, this assumption is incorrect and that perfect measurements of the effects of mutations on a trait and mechanistic understanding can be insufficient to predict what happens when two mutations are combined. This apparent paradox arises because mutations can have different biophysical effects to cause the same change in a phenotype and the outcome in a double mutant depends upon what these hidden biophysical changes actually are. Pleiotropy and non-monotonic functions further confound prediction of how mutations interact. Accurate prediction of phenotypes and disease will sometimes not be possible unless these biophysical ambiguities can be resolved using additional measurements.

Subject terms: Mutation, Quantitative trait, Systems biology, Epistasis

In quantitative genetics, it is widely assumed that mutations combine additively or epistasis can be predicted with statistical or mechanistic models. Here, the authors use the phage lambda repressor model to show how biophysical ambiguity and non-monotonic functions confound phenotypic prediction.

Introduction

A fundamental challenge across diverse fields of biology including human genetics, animal and plant breeding, and evolutionary theory is to predict how changes in genotypes result in changes in phenotypes and fitness. Accurate prediction of phenotypes from sequence entails two sub-challenges: predicting the mutations that individually affect a trait of interest and by how much, and predicting the joint effects when multiple mutations are combined in an individual. Progress is being made in both systematically identifying^1–3 and predicting^4–6 the mutations that impact traits of interest. Moreover, the extent to which mutations combine additively or with genetic (epistatic) interactions is being systematically quantified across diverse systems and phenotypes^7,8.

However, a more fundamental question remains that is not addressed in any of these studies. Even if we have perfect measurements of the individual effects of a set of mutations on a trait and a very good mechanistic understanding of a system, can we always predict what happens when two mutations are combined?

In this study, we use a simple biophysical system to address this question. We show that, for diverse biological systems, the answer to this question will often be no. The fundamental reason for this is that different combinations of biophysical parameters can give rise to the same phenotypic value⁹.

The phage lambda repressor, CI, is one of the best-understood proteins in biology and a classic model for gene regulation, protein biophysics and systems biology^10–14. CI regulates transcription from two divergent promoters with well-established dose–response curves: it represses transcription from the P_R promoter via a monotonic function but induces and then represses transcription from the P_RM promoter via a non-monotonic peaked function. The molecular mechanisms that underlie these regulatory responses are well-understood^10,15,16 and thermodynamic models that incorporate them accurately predict the behaviour of the system^17–20. Specifically, Ackers’ statistical thermodynamic model predicts the probabilities of the ON and OFF configuration states of the P_R and P_RM promoters as a function of the total repressor concentration¹⁷. To predict how mutations that affect the stability of CI combine to affect gene regulation, Ackers’ model can be combined with a thermodynamic model of protein folding¹⁹.

Like most proteins²¹, CI is multifunctional: in order to regulate transcription it must fold correctly^22–25, form a dimeric complex²⁶, bind to DNA at multiple operator sites^27,28 and also form a higher-order tetrameric complex^29,30 on the genome (Fig. 1a). Mutations in CI can affect any of these biophysical activities, making CI a good model for investigating how mutations with different biophysical effects interact to alter cellular phenotypes.

a CI binds three operators as a dimer with two dimers also forming a tetrameric complex. Cyan and yellow distinguish the two monomers of each dimer. b Statistical thermodynamic model of gene regulation by the lambda repressor (CI). CI exists as unfolded, folded monomer, free dimer and dimers that are bound to operators. The partitioning of these molecules depends on Gibbs free-energy differences between states. c Dose–response curves of the P_R and P_RM promoters. d Mutations result in additive changes in the free energy of protein folding, dimerization, DNA binding and tetramerization. When only one free-energy term is altered, gene expression is altered by the eight plotted relationships. Dotted vertical black lines denote ∆∆G = 0 (wild type). See also Supplementary Fig. 1. Source data are provided as a Source data file.

However, mutations in CI, like mutations in other proteins, can affect more than one biophysical parameter at the same time. For example, of 12 mutations that alter the binding affinity of CI to DNA, six (50%) also affected the stability of the protein^27,31–33. Such biophysical pleiotropy is common: for example, mutations that alter enzymatic activity often reduce protein stability³⁴. Similarly, mutations that alter protein binding affinities also frequently impact stability^31,35 and in allosteric proteins changes in the affinity of binding at one site will alter the binding affinity at a second site³⁶.

Here, using gene regulation by the lambda repressor model, we show that, even for a very simple biophysical system, it is often impossible to predict what happens when two mutations are combined even if we have perfect measurements of their effects on a trait. The cause of this apparent paradox is the one-to-many mapping between phenotypes and the underlying biophysical parameter changes that can cause them. When combining mutations, the outcome can be very different depending upon what these unidentified biophysical changes actually are. Our results illustrate how accurate genetic prediction of phenotypes and disease will often not be possible unless additional measurements are made to resolve the biophysical ambiguities in genotype–phenotype maps.

Results

Combining mutations in a thermodynamic model

To better understand how genetic variants with different biophysical effects combine to alter phenotypes, we investigated how mutations in a model transcription factor, the lambda repressor (CI), alter the expression of two target genes using an extensively validated thermodynamic model (Fig. 1b)^17–20. We first considered mutations that affect the folding or stability of CI. Changes in protein stability are one of the most frequent effects of amino acid changes and a major cause of genetic disease^22–25. The fraction of a protein in its natively folded state depends on the difference in Gibbs free energy (∆G) between its folded and unfolded states. Unless they are energetically coupled³⁷, mutations have effects on stability that are additive at the level of free energy but non-additive for changes in protein concentration and expression from the P_R and P_RM promoters, which are our two phenotypic traits of interest (Fig. 1c, d)^19,38,39.

Genetic prediction for mutations affecting protein stability

If two mutations that only affect protein stability are combined, the change in expression from P_R is often non-additive (i.e. there is substantial epistasis)¹⁹. However, the phenotype of the double mutant can normally be unambiguously predicted from the phenotypes of the two constituent single mutants because the free-energy-phenotype function is monotonic⁴⁰ (Fig. 2a). The exception is when mutations have phenotypes that map to the top or bottom plateaus of the free-energy-phenotype function where the gradient approaches zero (Fig. 1d and Supplementary Fig. 1b–e) and measurement imprecision results in ambiguity in the underlying causal free-energy changes.

Fig. 2 — a–d Double mutant P_R expression when combining CI mutations both affecting the same biophysical parameter: protein folding (a), dimerization (b), DNA binding (c) or tetramerization (d). e–h Double mutant P_RM expression when combining CI mutations affecting the same biophysical parameter: protein folding (e), dimerization (f), DNA binding (g) or tetramerization (h). Top row panels show number of possible P_RM expression phenotypes when combining two single mutant phenotypes. Bottom row panels (e–g) show the range of possible P_RM phenotypes. Bottom row of (h) shows P_RM expression since there is no ambiguous prediction. i–k Examples showing how three mutations with known P_RM expression phenotypes combine with second mutations with known phenotypes to result in up to four different expression levels in the double mutant. Source data are provided as a Source data file.

For expression from the P_RM promoter, however, this is not the case. Combining two mutations with measured effects on P_RM expression can result in more than one P_RM expression value, depending upon what the hidden underlying free-energy changes are^19,40. The cause of this ambiguity in the phenotype of a double mutant is the non-monotonic input–output function of P_RM (Fig. 1c, d), which means that many phenotypic values can map to two different underlying changes in the free energy of protein folding (Fig. 1d). Thus, when combining mutations of known phenotypic effect, there can be up to four different valid phenotypic outcomes in the double mutant (Fig. 2e) and these outcomes can differ by almost the entire phenotypic range (Fig. 2e, i). Thus, even if mutations only affect protein folding, non-monotonic input–output functions and plateaus in free-energy-phenotype functions can make it impossible to predict how two mutations of known effect will combine to alter a phenotype.
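The origin of the four possible outcomes can be sketched with a toy calculation. The Python snippet below is only an illustration under stated assumptions: `toy_peaked_phenotype` is a hypothetical Gaussian-shaped function that mimics the rise-then-fall shape of a non-monotonic free-energy-phenotype function; it is not the P_RM model, and the phenotype values are invented.

```python
import math

def toy_peaked_phenotype(ddg, centre=2.0):
    """Hypothetical peaked (non-monotonic) free-energy-to-phenotype function.

    This Gaussian is NOT the P_RM model; it only mimics a rise-then-fall shape.
    """
    return math.exp(-((ddg - centre) ** 2) / 2.0)

def candidate_ddgs(phenotype, centre=2.0):
    """Both free-energy changes that give the same phenotype (0 < phenotype <= 1)."""
    half_width = math.sqrt(-2.0 * math.log(phenotype))
    return (centre - half_width, centre + half_width)

# Two single mutants with perfectly measured (hypothetical) phenotypes.
phenotype_a, phenotype_b = 0.5, 0.8

# Free energies add in the double mutant, but each single mutant phenotype
# is compatible with two different hidden free-energy changes.
double_mutant_outcomes = sorted(
    toy_peaked_phenotype(ddg_a + ddg_b)
    for ddg_a in candidate_ddgs(phenotype_a)
    for ddg_b in candidate_ddgs(phenotype_b)
)
```

With these hypothetical inputs, the four candidate pairs of free-energy changes give four different double mutant values that span most of the range between 0 and 1.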

Mutations with other known biophysical effects

Mutations in proteins can, however, affect more than their stability. For example, mutations in CI can alter the binding affinity of the protein for itself (dimerization)²⁶, its affinity for DNA^27,28 and the affinity between two dimers to form a tetramer^29,30. As for mutations affecting protein stability, mutations causing additive changes in the free energy of these molecular interactions (Fig. 1d) often combine to cause non-additive changes in expression from the two target promoters (Fig. 2b–d), generating substantial epistasis. For expression from P_R, there is again no ambiguity in the double mutant phenotypes, with the exception of uncertainty created by imprecise measurements at the plateaus of the free-energy-phenotype functions (Fig. 1d and Supplementary Fig. 1b, c). In contrast, as when combining mutations that only affect protein folding, pairs of mutations of known phenotypic effect that both only affect either dimerization or DNA binding can combine to have up to four different P_RM phenotypes as double mutants (Fig. 2f–k, Supplementary Fig. 2). Similar conclusions are obtained if the two mutations individually affect two different (but known) biophysical parameters: P_RM expression often cannot be unambiguously predicted, including when one of the mutations affects tetramerization (Supplementary Fig. 2b, c), while P_R expression can always be predicted without ambiguity (Supplementary Fig. 2a).

Prediction for mutations with unknown biophysical effects

So far, we have considered cases where we know the identity of the biophysical parameter affected by each mutation. Normally, however, we do not know which biophysical property of a protein is altered by a mutation. For example, any measured change in P_R expression resulting from a mutation in CI could be caused by a mutation that affects folding, DNA binding or dimerization (Fig. 1d; mutations that affect tetramerization have a more limited range of phenotypic outcomes).

We therefore considered what happens when two mutations combine and each of these mutations might have altered one of two different biophysical parameters, for example either protein stability or DNA-binding affinity. Now, even when considering expression from P_R as the phenotype of interest, there is always ambiguity when predicting the phenotypes of double mutants (Fig. 3a–f and Supplementary Fig. 3a–f). For example, there are now four valid phenotypic outcomes when combining two mutations if each can alter either stability or DNA binding (but not both, Fig. 3a–f). Considering expression from P_RM as the phenotype of interest, there are now many valid phenotypes for each double mutant when combining mutations of known effect (Fig. 3g–l and Supplementary Fig. 3g–l).

Fig. 3 — a–c P_R expression when combining two mutations that affect either protein folding (Mutation A) or another biophysical parameter (Mutation B) but not both: dimerization (a), DNA binding (b) or tetramerization (c). Number (left) and range (right) of possible double mutant P_R phenotypes (left). d–f Examples showing how a mutation with a known phenotype combines with other mutations, leading to 1 to 4 possible double mutant P_R expression levels. g–i Number (left) and range (right) of double mutant P_RM expression levels when mutations can affect folding or another biophysical parameter. j–l Examples showing how a mutation with a known P_RM phenotype can combine with other mutations to result in many different P_RM phenotypes. m, n Maximum number (left) and range (right) of double mutant phenotypes when two mutations can each affect one of the indicated number of different biophysical properties. Horizontal lines denote the mean of the data points. n = 4, 6, 4 and 1, respectively, for the groups with number of possible biophysical parameters equal to 1, 2, 3 and 4. Source data are provided as a Source data file.

If mutations can affect any one of the four biophysical parameters, the number of possible double mutant phenotypes can be very large indeed (Fig. 3m, n and Supplementary Fig. 3m, n). For example, two mutations with known effect on P_RM expression can combine to produce up to 15 different double mutant phenotypes if each mutation can affect any one (and only one) of the four possible free-energy terms (Fig. 3n). Thus, when we do not know the biophysical property of a protein that is altered by each mutation, it becomes impossible to predict the phenotypes of double mutants from the phenotypes of single mutants alone.

Biophysical pleiotropy further confounds genetic prediction

In reality, the situation can actually be worse than this because mutations can affect more than one biophysical parameter at the same time. For example, of 12 mutations changing the binding affinity of CI to DNA, half also altered the stability of the protein^27,31–33. We define these situations when one mutation influences two or more biophysical parameters as biophysical pleiotropy.

Allowing one (Fig. 4a, b, Supplementary Fig. 4) or both (Fig. 4f, j and Supplementary Fig. 4) mutations in CI to be pleiotropic and to alter two different free-energy terms results in the possible double mutant outcomes now covering a continuous range of values (Fig. 4 and Supplementary Fig. 4). Thus, when mutations are biophysically pleiotropic, we cannot predict the phenotype of a double mutant containing two mutations of precisely measured individual effects.

Biophysical ambiguity confounds genetic prediction

To illustrate how these diverse double mutant phenotypes arise when combining pairs of mutations with identical phenotypic effects, we plot in Fig. 4c–f how the expression from P_R changes as a function of changes in the free energy of folding (∆∆G_F) and DNA binding (∆∆G_B). Non-pleiotropic mutations that only alter folding are horizontal movements in this space, mutations that only affect DNA binding are vertical movements and pleiotropic mutations are diagonal movements. All of the changes in free energy that result in the same phenotype form a phenotype isochore, for example the grey dashed curves in Fig. 4c–f represent all parameter changes that can produce a 4-fold increase (2 in log(2) scale) in P_R expression.

When two non-pleiotropic mutations that cause this same phenotypic change (lie on the same phenotype isochore) are combined together there are three possible combinations of free-energy changes (the two mutations alter DNA binding, folding, or one alters folding and the other binding) and two possible resulting double mutant phenotypes (Fig. 4c). When a non-pleiotropic mutation affecting DNA binding is combined with a pleiotropic mutation affecting both free-energy terms, there are many possible combinations of free-energy terms but, because of the topology of the free energy-phenotype landscape, all of the double mutants have very similar phenotypes (Fig. 4d). In contrast, when a non-pleiotropic mutation affecting folding is combined with a pleiotropic mutation, the possible double mutants do not fall on an isochore but now cover a range of possible phenotypes (Fig. 4e). Finally, when two pleiotropic mutations are combined, the possible double mutants are widely spread in the free-energy landscape (red shaded area in Fig. 4f) and take many different phenotypic values (Fig. 4f). The equivalent free-energy-phenotype landscape is plotted for P_RM in Fig. 4g–j and for other combinations of free-energy terms in Supplementary Fig. 4. Together, the monotonicity and symmetry of these landscapes determine the degree of ambiguity when combining mutations.

When mutations can alter three or more free-energy terms, these landscapes become difficult to visualise (Fig. 5). For example, if each mutation in CI can alter stability, DNA binding or dimerization, each mutation with a known phenotype potentially maps to any position on a surface of combinations of causal parameter changes. Two mutations with precisely measured phenotypic effects can then combine to give phenotypes that span nearly the entire range of possible phenotype values (Fig. 5). This is because, without additional information, the actual parameter changes in the double mutant can take many values within a 3D volume of possibilities. There is now nearly complete ambiguity in the predicted phenotype of the double mutant (Fig. 5).

Biophysical ambiguity in even simpler systems

Finally, although gene regulation by the lambda repressor is a relatively simple biological system, we note that biophysical ambiguity also confounds the prediction of double mutant phenotypes in even simpler systems. For example, consider a protein whose only function is to bind another molecule (a ligand), with the concentration of the bound complex directly proportional to the phenotype of interest (Fig. 6a). In such a minimal system mutations can only alter protein stability or the binding affinity to the ligand. The outcome in a double mutant can still differ depending upon which free-energy terms are individually affected in each single mutant (Fig. 6b, c). Again, allowing pleiotropic mutations further thwarts the ability to predict the phenotypes of double mutants from the phenotypes of single mutants (Fig. 6d, e). Similar conclusions are obtained using a model in which a protein’s only function is to bind to itself to form a dimer (Supplementary Fig. 5). Thus, even in these most basic biological systems of a single binding reaction of a macromolecule, it is often impossible to predict what happens when single mutants of known phenotype are combined without additional measurements or inferences.

Fig. 6 — a Statistical thermodynamic model of a protein binding to a ligand. The protein X exists in three states: unfolded, folded, and folded and bound to the ligand. The partitioning of these molecules depends on the Gibbs free-energy differences between states. b Mutations result in additive changes in the free energy of protein folding and binding, altering the concentration of the protein–ligand complex. c–f Free-energy-phenotype landscapes for mutations that affect the free energy of folding (x-axis) and/or binding energy (y-axis). Phenotypic isochores are drawn with an interval of 1 in log(2) scale. A continuous range of free-energy changes can underlie an observed phenotype (dashed isochore). Combining two mutations with the same effect can result in a range of double mutant phenotypes (red shaded areas in (f)). Example double mutant outcomes are shown when neither (c), one (d, e) or both (f) mutations are pleiotropic. See also Supplementary Fig. 5. Source data are provided as a Source data file.
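The ambiguity in this minimal protein-ligand system can be reproduced with a few lines of Python. This is a sketch under stated assumptions rather than the model used for Fig. 6: the wild-type free energies, the ligand concentration (assumed to be in excess and therefore constant) and all function names are hypothetical placeholders.

```python
import math

RT = 1.98e-3 * 310.15  # kcal/mol at 37 degrees C

# Placeholder wild-type values (assumptions for illustration only).
DG_F, DG_B = -2.0, -8.0   # folding and binding free energies (kcal/mol)
LIGAND = 1e-5             # free ligand concentration (M), assumed in excess

def bound_fraction(ddg_f=0.0, ddg_b=0.0):
    """Fraction of protein X in the ligand-bound state (the phenotype)."""
    k_f = math.exp(-(DG_F + ddg_f) / RT)               # folded / unfolded
    k_b_l = math.exp(-(DG_B + ddg_b) / RT) * LIGAND    # bound / folded-free
    return k_f * k_b_l / (1.0 + k_f + k_f * k_b_l)

def ddg_for_phenotype(target, which, lo=0.0, hi=15.0):
    """Bisection for the single free-energy change that gives `target`.

    Valid here because the phenotype falls monotonically as either ddG rises.
    """
    def phenotype(x):
        if which == "folding":
            return bound_fraction(ddg_f=x)
        return bound_fraction(ddg_b=x)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if phenotype(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

wild_type = bound_fraction()
target = 0.5 * wild_type                       # both single mutants halve the phenotype
ddg_f = ddg_for_phenotype(target, "folding")   # hidden cause 1: folding only
ddg_b = ddg_for_phenotype(target, "binding")   # hidden cause 2: binding only

# Free energies add, so the double mutant depends on the hidden causes.
double_ff = bound_fraction(ddg_f=2 * ddg_f)            # both alter folding
double_bb = bound_fraction(ddg_b=2 * ddg_b)            # both alter binding
double_fb = bound_fraction(ddg_f=ddg_f, ddg_b=ddg_b)   # one of each
```

With these placeholder values, two single mutants with identical phenotypes give a much lower complex level when both mutations destabilise folding than when both weaken binding, so the single mutant phenotypes alone do not determine the double mutant phenotype.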

Discussion

Taken together, our results show that, even for a simple biological system—the regulation of gene expression by a single transcription factor—it is often impossible to unambiguously predict how two mutations of known phenotypic effect will combine together to alter the same phenotype in a double mutant.

The fundamental cause of this uncertainty is the one-to-many relationship between a measured phenotype and the underlying causal changes in biophysical parameters. Mutations can affect multiple biophysical properties of a system—for example, the stability and binding affinities of proteins—and many different changes in biophysical parameters can cause the same observed change in a trait. However, the phenotype of a double mutant depends on which of these biophysical properties is actually altered in each single mutant and so can take multiple values. Pleiotropic biophysical effects and non-monotonic input–output functions create further ambiguity when predicting how mutations of known effect combine to alter a phenotype.

The extent to which biophysical ambiguities will thwart the prediction of different phenotypes will depend on the number of parameters that can be affected by mutations, their biophysical pleiotropy, and monotonicity of input–output functions. The distributions of mutational effects on multiple biophysical parameters have been quantified for very few systems, but for both the lambda repressor and other proteins, mutations frequently affect both stability^41,42 and binding to interaction partners^41,43,44 with biophysical pleiotropy and non-monotonic functions also common^31,35,45. In other words, we expect biophysical ambiguity to confound phenotypic prediction in other systems including heteromeric complexes and beyond transcription factor-mediated repression.

To resolve ambiguities and accurately predict how mutations combine to alter phenotypes, additional information will always be required. Although ultimately it may be possible to predict from sequence how a particular mutation affects all the biophysical parameters of a protein, for the foreseeable future resolving ambiguities will require additional measurements to be made. High-throughput methods to quantify the effects of mutations on protein stability⁴², binding^41,44,46 and activity⁴⁷ will help in this endeavour, particularly when used in combination to disentangle biophysical effects. Moreover, quantifying how individual mutations interact with many other mutations in a system may allow the underlying causal changes in biophysical parameters to be inferred, at least when only two different parameters can be affected³⁵. Quantifying intermediate molecular phenotypes such as protein concentrations and additional higher-level phenotypes may also be useful (e.g., quantifying expression from P_R is sufficient to resolve the ambiguities resulting from the non-monotonicity of the P_RM dose–response curve), and experimentally quantifying the dose–response curves of individual mutations can also sometimes help to distinguish mutations with different biophysical effects⁴⁸.

However, the fundamental conclusion remains: even in this simple biological system (and in even simpler ones, Fig. 6 and Supplementary Fig. 5) it can be impossible to predict the combined effect of two mutations, even if we have perfect measurements of their individual effects on a trait. In such cases, additional information or measurements will always be required to accurately predict how genetic variants combine to alter phenotypes and cause disease.

Methods

Methods overview

Our model is based on Ackers’ thermodynamic model of lambda repressor binding to its operator sites (O_R1, O_R2 and O_R3)¹⁷. Briefly, this model describes eight possible operator configuration states (c1–c8) in which the CI dimer can bind to the operators (Fig. 1b). Based on statistical thermodynamics, the downstream gene expression from promoters P_R and P_RM is determined by the probabilities of the ON and OFF cis-regulator configuration states¹⁷.

To examine CI coding mutants’ effects on gene expression from P_R and P_RM promoters, we extended Ackers’ model by including CI folding because many mutations destabilise proteins^22–25. Destabilising mutations will decrease the fraction of the folded functional protein, and thus change gene expression from the downstream P_R or P_RM promoter. In other words, compared to Ackers’ model, we have one more protein state—CI unfolded state CI_(U) and the corresponding additional parameter—protein-folding energy ∆G_(F) (Supplementary Tables 1 and 2). The rest of our model is the same as Ackers’ model. We consider the system as a single equilibrium, i.e. protein folding and dimerization are coupled reactions.

Below are the details of the model, which follow simple statistical thermodynamics.

CI configuration states

The total CI (CI_(Total)) molecule amount is the sum of all the CI molecules in the 10 different possible states as shown in Eq. (1). These different states include unfolded CI_(U), folded monomer CI_(M), free dimer CI₂ and seven operator-bound CI dimer states (Fig. 1b and Supplementary Table 1). The unit of molecule amount per cell is M in all the equations in our model.

{CI}_{(Total)} = {CI}_{(U)} + {CI}_{(M)} + 2 \cdot {CI}_{2} + 2 \cdot {OR}_{(Total)} \sum_{i = 2}^{8} (k \cdot f_{i}) .

Above, ${OR}_{(Total)}$ is the molecule amount of the operators and f_i is the probability of each of the seven cis-configuration states in which CI is bound to the operators (i = 2, …, 8; state 1 is the unbound state). i is the index for each cis-configuration state, and k is the number of CI dimers in the corresponding cis-configuration state (Supplementary Table 1). The amount of CI in each operator-bound state is therefore the state probability multiplied by ${OR}_{(Total)}$, by the number of CI dimers (k) in that state and by a factor of 2 to account for the two molecules in each dimer (Supplementary Table 1).

All the parameters in the model for wild-type CI are taken from literature (Supplementary Table 2).

Equilibrium between CI unfolded and folded monomer states

CI monomer folds in a simple folded CI_(M) and unfolded CI_(U) two-state fashion⁴⁹ that can be described as in the equation below:

\frac{{CI}_{(M)}}{{CI}_{(U)}} = \exp (\frac{- Δ G_{F}}{R T}) .

ΔG_F is the free-energy difference between the folded monomer and unfolded states of the CI molecule. R is the gas constant (R = 1.98 × 10⁻³ kcal mol⁻¹ K⁻¹) and T is the absolute temperature (310.15 K, corresponding to 37 °C).
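As a minimal sketch of this two-state relationship, the Python snippet below converts a folding free energy into a folded fraction. It is an illustration only: the function name `fraction_folded`, the wild-type ΔG_F and the ∆∆G values are hypothetical, and the coupling to dimerization that the full model includes is ignored here.

```python
import math

R_KCAL = 1.98e-3    # gas constant in kcal mol^-1 K^-1, value quoted in the text
T_KELVIN = 310.15   # 37 degrees C

def fraction_folded(dg_f, r=R_KCAL, t=T_KELVIN):
    """Folded fraction of a two-state monomer with folding free energy dg_f (kcal/mol).

    From the equilibrium above: CI_M / CI_U = K = exp(-dG_F / RT),
    so the folded fraction is K / (1 + K).
    """
    k_fold = math.exp(-dg_f / (r * t))
    return k_fold / (1.0 + k_fold)

# Hypothetical numbers, chosen only for illustration.
wild_type_dg = -2.0        # assumed wild-type folding energy (kcal/mol)
ddg_a, ddg_b = 1.5, 1.5    # two destabilising mutations (kcal/mol)

wt = fraction_folded(wild_type_dg)
single = fraction_folded(wild_type_dg + ddg_a)
double = fraction_folded(wild_type_dg + ddg_a + ddg_b)  # free energies add
```

Although the two ∆∆G values simply add, the loss of folded protein in the double mutant is not twice the loss in the single mutant, which is the kind of non-additivity at the phenotypic level described in the Results.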

Equilibrium between folded CI monomer and free dimer states

\frac{{CI}_{2}}{{CI}_{(M)}^{2}} = \exp (\frac{- Δ G_{D}}{R T}) .

Equilibrium between free CI dimer and operator-bound states

We use Ackers’ model to describe these relationships. Briefly, the likelihood of each configuration state (c1–c8 based on the cis-regulatory state) is a function of the binding energies and the free CI protein dimer concentration.

The probability that each of the eight cis-configuration states $(f_{i})$ occurs is:

f_{i} = \frac{\exp (\frac{- Δ G_{i}}{R T}) {CI}_{2}^{k_{i}}}{\sum_{j = 1}^{8} \exp (\frac{- Δ G_{j}}{R T}) {CI}_{2}^{k_{j}}} .

Where $Δ G_{i}$ is the total free energy of lambda repressor dimers in the respective cis-configuration state i ∈ [1, 8] (Supplementary Table 1, where ΔG is free energy, with ΔG_T referring to the cooperation energy for two dimers binding to the adjacent operator sites); the exponent k_i ∈ {0, 1, 2, 3} (written as k in the other equations) is the total number of lambda repressor dimers in the corresponding cis-configuration state i. As stated earlier, all the parameters are kept as originally described in Ackers’ model (Supplementary Table 2).

CI distribution based on statistical thermodynamics

By combining Eqs. (1)–(4), we can describe the total expression level of CI_(Total) as a function of CI free dimer concentration and Gibbs free energies:

\begin{matrix} {CI}_{(Total)} = \exp (\frac{Δ G_{D}}{2 R T}) [1 + \exp (\frac{Δ G_{F}}{R T})] {CI}_{2}^{0.5} + 2 {CI}_{2} \\ + \frac{2 {OR}_{(Total)} (\sum_{i = 2}^{4} \exp (\frac{- Δ G_{i}}{R T}) {CI}_{2} + 2 \times \sum_{i = 5}^{7} \exp (\frac{- Δ G_{i}}{R T}) {CI}_{2}^{2} + 3 \exp (\frac{- Δ G_{8}}{R T}) {CI}_{2}^{3})}{\exp (\frac{- Δ G_{1}}{R T}) + \sum_{i = 2}^{4} \exp (\frac{- Δ G_{i}}{R T}) {CI}_{2} + \sum_{i = 5}^{7} \exp (\frac{- Δ G_{i}}{R T}) {CI}_{2}^{2} + \exp (\frac{- Δ G_{8}}{R T}) {CI}_{2}^{3}} \end{matrix} .

Probability of P_R—ON

CI represses expression from the P_R promoter by binding to the operator sites that overlap with the RNA polymerase sigma factor binding site (Fig. 1b)¹⁷. Based on Ackers’ model, two out of the eight cis-configuration states fail to repress gene expression from P_R—when CI is not bound to any operators (c1) and when CI only binds to the low-affinity O_R3 (c2) (Fig. 1b, Supplementary Table 1). Therefore, the probability of the P_R promoter to be active (P_pr) is the sum of the probabilities of the two configuration states in which promoter P_R is not repressed $(\sum_{i = \{1, 2\}} f_{i})$ , as shown in Eq. (6)¹⁷.

P_{pr} = f_{1} + f_{2} = \frac{\exp (\frac{- Δ G_{1}}{R T}) {CI}_{2}^{0} + \exp (\frac{- Δ G_{2}}{R T}) {CI}_{2}^{1}}{\sum_{i = 1}^{8} (\exp (\frac{- Δ G_{i}}{R T}) {CI}_{2}^{k})} .

Probability of P_RM—ON

CI not only represses the P_R promoter but also activates or represses the divergently transcribed P_RM promoter in response to changes in the CI concentration in the cell (Fig. 1c)^10,50. When CI is present and binds to O_R2, it activates the P_RM promoter, while binding to O_R1 per se does not have any effect on P_RM activity^10,16. In contrast, once CI binds to the low-affinity O_R3, it blocks the access of RNA polymerase sigma factor, repressing expression from P_RM⁵¹. Therefore, gene expression from P_RM is activated only when CI is bound to O_R2 and not bound to O_R3 (corresponding to the two cis-configuration states: c3 and c7) (Fig. 1b and Supplementary Table 1). Using Ackers’ model and Eq. (4)¹⁷, we describe the probability that the P_RM promoter is activated as follows:

P_{prm} = f_{3} + f_{7} = \frac{\exp (\frac{- Δ G_{3}}{R T}) {CI}_{2}^{1} + \exp (\frac{- Δ G_{7}}{R T}) {CI}_{2}^{2}}{\sum_{i = 1}^{8} (\exp (\frac{- Δ G_{i}}{R T}) {CI}_{2}^{k})} .
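The state probabilities and the two promoter ON-probabilities can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the binding energies are placeholders rather than the values in Supplementary Table 2, and the ordering of states c5 and c6 and the placement of the cooperativity term ΔG_T are assumptions, since only c1, c2, c3 and c7 are described in the text.

```python
import math

RT = 1.98e-3 * 310.15  # kcal/mol at 37 degrees C

# Placeholder binding energies (kcal/mol); NOT the values in Supplementary Table 2.
DG_OR1, DG_OR2, DG_OR3, DG_T = -12.0, -10.0, -9.0, -2.5

# state index -> (total free energy, number of bound dimers k).
# c1 (unbound), c2 (O_R3 only), c3 and c7 (O_R2 bound, O_R3 free) follow the text;
# the c5/c6 ordering and the cooperativity assignments are assumptions.
STATES = {
    1: (0.0, 0),
    2: (DG_OR3, 1),
    3: (DG_OR2, 1),
    4: (DG_OR1, 1),
    5: (DG_OR2 + DG_OR3 + DG_T, 2),
    6: (DG_OR1 + DG_OR3, 2),
    7: (DG_OR1 + DG_OR2 + DG_T, 2),
    8: (DG_OR1 + DG_OR2 + DG_OR3 + DG_T, 3),
}

def state_probabilities(ci2):
    """Boltzmann-weighted probability of each state at free dimer concentration ci2 (M)."""
    weights = {i: math.exp(-dg / RT) * ci2 ** k for i, (dg, k) in STATES.items()}
    partition = sum(weights.values())
    return {i: w / partition for i, w in weights.items()}

def promoter_on(ci2):
    """Return (P_pr, P_prm): probabilities that P_R and P_RM are ON."""
    f = state_probabilities(ci2)
    p_pr = f[1] + f[2]     # P_R is not repressed in c1 and c2
    p_prm = f[3] + f[7]    # P_RM is activated in c3 and c7
    return p_pr, p_prm

# Scan the free dimer concentration over several orders of magnitude.
dose_response = [(10.0 ** e, promoter_on(10.0 ** e)) for e in range(-12, -4)]
```

Scanning `ci2` in this way gives a P_R probability that only decreases and a P_RM probability that first rises and then falls, matching the monotonic and peaked dose-response shapes described for the two promoters.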

Calculating free dimer concentration

As seen from Eq. (5), CI_(Total) is easily calculated from CI₂ for a given set of free energies, but CI₂ cannot be obtained directly from CI_(Total). Therefore, for each set of known biophysical parameters (∆G values), we performed a parameter search for the CI₂ value that minimizes the absolute difference between the provided CI_(Total) value and the CI_(Total) calculated from Eq. (5). The optimize⁵² function in R was used for the parameter search, with the tol parameter set to 1e−23. We refer to this process using Eq. (8), where $Δ G_{s}$ are all the Gibbs free energies of the system.

{CI}_{2} = f ({CI}_{(Total)}, Δ G_{s}) .
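A hedged Python sketch of this numerical inversion is given below. It swaps the one-dimensional minimisation in R for a bisection on log10(CI₂), which is adequate in this sketch because the computed total increases with the free dimer concentration. All parameter values, the operator concentration and the function names are placeholders, not the values in Supplementary Table 2, and the c5/c6 state ordering is assumed.

```python
import math

RT = 1.98e-3 * 310.15  # kcal/mol at 37 degrees C

# Placeholder parameters (kcal/mol, M); NOT the values in Supplementary Table 2.
DG_F, DG_D, OR_TOTAL = -4.0, -11.0, 1e-9
DG_OR1, DG_OR2, DG_OR3, DG_T = -12.0, -10.0, -9.0, -2.5

# (total free energy, number of bound dimers k) for c1..c8; c5/c6 order assumed.
STATES = [(0.0, 0), (DG_OR3, 1), (DG_OR2, 1), (DG_OR1, 1),
          (DG_OR2 + DG_OR3 + DG_T, 2), (DG_OR1 + DG_OR3, 2),
          (DG_OR1 + DG_OR2 + DG_T, 2), (DG_OR1 + DG_OR2 + DG_OR3 + DG_T, 3)]

def ci_total(ci2):
    """Total CI (M) implied by a free dimer concentration ci2 (M)."""
    ci_m = math.sqrt(ci2 * math.exp(DG_D / RT))   # CI_M^2 = CI_2 * exp(dG_D / RT)
    ci_u = ci_m * math.exp(DG_F / RT)             # CI_U = CI_M * exp(dG_F / RT)
    weights = [math.exp(-dg / RT) * ci2 ** k for dg, k in STATES]
    mean_bound = sum(k * w for (_, k), w in zip(STATES, weights)) / sum(weights)
    return ci_u + ci_m + 2.0 * ci2 + 2.0 * OR_TOTAL * mean_bound

def solve_ci2(target_total, log_lo=-30.0, log_hi=0.0, iterations=100):
    """Find ci2 such that ci_total(ci2) matches target_total, by bisection in log10 space."""
    for _ in range(iterations):
        mid = 0.5 * (log_lo + log_hi)
        if ci_total(10.0 ** mid) < target_total:
            log_lo = mid
        else:
            log_hi = mid
    return 10.0 ** (0.5 * (log_lo + log_hi))

free_dimer = solve_ci2(8.4e-7)  # CI_(Total) value used in the text
```

Searching on a logarithmic scale keeps the bracket meaningful across the many orders of magnitude that the free dimer concentration can span.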

Biophysical changes to phenotypes

The probabilities of the ON-states of the two promoters, which serve as phenotypes, can be calculated from a set of biophysical parameters (free energies) and CI_(Total). We call this process the Forward Function (see Code availability). It consists of two steps: (1) a parameter search for CI₂ at the given CI_(Total), as described in the previous section (Calculating free dimer concentration), using Eq. (8); and (2) calculation of P_pr and P_prm from Eqs. (6) and (7).

Phenotypes to free energy for non-pleiotropic mutations

Mutations in the CI protein can affect, at the biophysical level, the protein-folding energy (ΔG_F), the dimerization energy (ΔG_D), the binding energy to the operator sites (ΔG_OR1–OR3) and the tetramerization energy (ΔG_T). We assume that mutations in CI that alter the free energy of DNA binding do so by the same magnitude for all three operators (ΔΔG_B = ΔΔG_OR1 = ΔΔG_OR2 = ΔΔG_OR3). To calculate the change in a single biophysical parameter that can produce a given phenotype, we inverted the Forward Function described in the previous section. The resulting Reverse Function, for both P_pr and P_prm, is composed of two sub-functions. The first is the Forward Function itself, which calculates phenotypes from biophysical changes and is written in the form y = f(x), where y is the phenotype and x is a set of biophysical parameters including the total expression level of CI. The second is an Inverse Function that finds all roots of the equation y – f(x) = 0. Root finding is performed with the uniroot.all function in the R package rootSolve⁵³. Specifically, for each perturbed biophysical parameter (∆∆G), we searched for all roots within a range of −2 to 10 kcal per mol and returned the ∆∆G values that produce the phenotype while the other biophysical parameters remain unperturbed.

Mutational effects are modelled at a fixed expression level of CI $({CI}_{(Total)} = 8.4 \times 10^{-7}\ M)$, which corresponds to ~99% repression of the P_R promoter and to the CI concentration in a lysogen^17,19. To calculate changes in the biophysical parameters for single mutants with known effects on expression from P_R or P_RM, we first generated 136 evenly spaced phenotypes (with an interval of 0.1 in log(2) scale from −13.5 to 0). Then, for a given phenotype, we calculated the corresponding change in each of the four free-energy terms (biophysical parameters), each time allowing only one biophysical parameter to change, using the Reverse Function explained in the previous paragraph.
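The Reverse Function can be sketched as a grid scan for sign changes followed by bisection inside each bracketing sub-interval, which is similar in spirit to how uniroot.all locates multiple roots. The code below is a hedged illustration: `toy_forward` is a hypothetical non-monotonic phenotype standing in for the Forward Function, all function names are invented for this example, and log(2) is assumed to denote the base-2 logarithm.

```python
import math


def find_all_roots(g, lower, upper, n_intervals=1000, tol=1e-12):
    """Return all sign-change roots of g on [lower, upper] found on a regular grid."""
    step = (upper - lower) / n_intervals
    grid = [lower + i * step for i in range(n_intervals + 1)]
    roots = []
    for a, b in zip(grid, grid[1:]):
        ga, gb = g(a), g(b)
        if ga == 0.0:
            roots.append(a)
        elif ga * gb < 0.0:
            # Plain bisection inside the bracketing sub-interval.
            while b - a > tol:
                mid = 0.5 * (a + b)
                gm = g(mid)
                if ga * gm <= 0.0:
                    b = mid
                else:
                    a, ga = mid, gm
            roots.append(0.5 * (a + b))
    if g(grid[-1]) == 0.0:
        roots.append(grid[-1])
    return roots


def toy_forward(ddg):
    """Hypothetical non-monotonic phenotype (peaks at ddg = 3 kcal/mol)."""
    return math.exp(-0.5 * (ddg - 3.0) ** 2)


def reverse_function(target, forward=toy_forward, lower=-2.0, upper=10.0):
    """All ddG values in [lower, upper] whose forward phenotype equals target."""
    return find_all_roots(lambda ddg: target - forward(ddg), lower, upper)


# 136 target phenotypes from -13.5 to 0 in steps of 0.1 on a log2 scale.
log2_targets = [round(-13.5 + 0.1 * i, 1) for i in range(136)]
solutions = {t: reverse_function(2.0 ** t) for t in log2_targets}
```

For a non-monotonic function, one phenotype can map to more than one ∆∆G value. A sign-change scan cannot detect a root at which the function touches zero without crossing it, so the maximum of the toy phenotype yields no root here.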

Phenotypes to free energy for pleiotropic mutations

For any given phenotype, we systematically searched for combinations of biophysical changes that can produce the phenotype. Taking a pleiotropic mutation affecting both protein-folding energy (ΔG_F) and DNA-binding energy (ΔG_B) as an example, we first generated a fixed range of ΔΔG_F (−1 to 5 kcal per mol with an interval of 0.05 kcal per mol). Then, for each ΔΔG_F, we calculated ΔΔG_B that produces the given phenotype using the Reverse Function as described for non-pleiotropic mutations. For mutations affecting three biophysical parameters (protein-folding energy ΔG_F, dimerization energy ΔG_D and DNA-binding energy ΔG_B), we first generated all possible two-way combinations of ΔΔG_F and ΔΔG_D, each from defined ranges of −1 to 5 kcal per mol with an interval of 0.05 kcal per mol. For each combination of ΔΔG_F and ΔΔG_D with the given phenotype, we calculated ΔΔG_B, using the Reverse Function as described for non-pleiotropic mutations.
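The two-parameter search described above can be sketched as a loop over a fixed ΔΔG_F grid with a one-dimensional solve for ΔΔG_B at each grid point. The example is a simplified assumption-laden illustration: the toy system (a protein that must fold and bind one site), its wild-type energies, its concentration and the function names are all hypothetical and do not reproduce the CI model.

```python
import math

RT = 1.987e-3 * 310.15  # kcal/mol at an assumed 37 °C

# Hypothetical toy system (not the CI model): a protein that must fold and bind
# one operator to repress a promoter.
DG_FOLD_WT = -1.0    # kcal/mol, assumed
DG_BIND_WT = -10.0   # kcal/mol, assumed
PROTEIN_CONC = 1e-6  # molar, assumed


def toy_forward(ddg_fold, ddg_bind):
    """Probability that the promoter is unbound (active) in the toy system."""
    k_fold = math.exp(-(DG_FOLD_WT + ddg_fold) / RT)
    k_bind = math.exp(-(DG_BIND_WT + ddg_bind) / RT)
    folded_fraction = k_fold / (1.0 + k_fold)
    return 1.0 / (1.0 + k_bind * PROTEIN_CONC * folded_fraction)


def solve_ddg_bind(target, ddg_fold, lower=-2.0, upper=10.0, tol=1e-10):
    """Bisection for ddg_bind; returns None if the target is not bracketed."""
    g_lo = toy_forward(ddg_fold, lower) - target
    g_hi = toy_forward(ddg_fold, upper) - target
    if g_lo * g_hi > 0.0:
        return None
    while upper - lower > tol:
        mid = 0.5 * (lower + upper)
        if (toy_forward(ddg_fold, mid) - target) * g_lo > 0.0:
            lower = mid
        else:
            upper = mid
    return 0.5 * (lower + upper)


# ddG_F grid: -1 to 5 kcal/mol in steps of 0.05 (121 values).
ddg_fold_grid = [round(-1.0 + 0.05 * i, 2) for i in range(121)]

target_phenotype = 0.5
pairs = []
for ddg_fold in ddg_fold_grid:
    ddg_bind = solve_ddg_bind(target_phenotype, ddg_fold)
    if ddg_bind is not None:
        pairs.append((ddg_fold, ddg_bind))
```

Every pair in `pairs` reproduces the same toy phenotype, so the phenotype alone does not identify which combination of folding and binding changes a mutation causes. Grid points for which no ΔΔG_B inside the search range matches the target are skipped.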

Double mutant phenotypes from single mutants’ phenotypes

For each double mutant, we added the free-energy changes of both single mutants to the corresponding wild-type free energies. We then used the updated parameters to calculate the downstream phenotypes with the Forward Function described in the section Biophysical changes to phenotypes. Double mutants’ phenotypes are rounded to 2 decimal places in log(2) scale to avoid counting phenotypes with very similar values as different phenotypes.
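The additive combination of free-energy changes can be illustrated with a small sketch. It assumes a hypothetical fold-then-bind toy system rather than the CI model; the wild-type energies, the concentration, the ΔΔG values and the function names are placeholders chosen so that two single mutants with different biophysical causes have nearly the same phenotype.

```python
import math

RT = 1.987e-3 * 310.15  # kcal/mol at an assumed 37 °C

# Hypothetical toy parameters (not the CI model values).
WILD_TYPE = {"fold": -1.0, "bind": -10.0}  # free energies, kcal/mol
PROTEIN_CONC = 1e-6                        # molar, assumed


def toy_forward(energies):
    """Probability that a promoter is unbound in a fold-then-bind toy system."""
    k_fold = math.exp(-energies["fold"] / RT)
    k_bind = math.exp(-energies["bind"] / RT)
    folded_fraction = k_fold / (1.0 + k_fold)
    return 1.0 / (1.0 + k_bind * PROTEIN_CONC * folded_fraction)


def combine(wild_type, *mutations):
    """Add each mutation's ddG values to the wild-type free energies."""
    energies = dict(wild_type)
    for mutation in mutations:
        for term, ddg in mutation.items():
            energies[term] += ddg
    return energies


def log2_phenotype(wild_type, *mutations):
    """Phenotype on a log2 scale, rounded to 2 decimal places."""
    return round(math.log2(toy_forward(combine(wild_type, *mutations))), 2)


# Single mutants chosen to give nearly the same toy phenotype.
fold_mutant_1 = {"fold": 2.43}  # affects folding only
fold_mutant_2 = {"fold": 2.43}  # a second, distinct folding mutation
bind_mutant_1 = {"bind": 1.37}  # affects binding only
bind_mutant_2 = {"bind": 1.37}  # a second, distinct binding mutation

single_fold = log2_phenotype(WILD_TYPE, fold_mutant_1)
single_bind = log2_phenotype(WILD_TYPE, bind_mutant_1)
double_fold_fold = log2_phenotype(WILD_TYPE, fold_mutant_1, fold_mutant_2)
double_bind_bind = log2_phenotype(WILD_TYPE, bind_mutant_1, bind_mutant_2)
double_fold_bind = log2_phenotype(WILD_TYPE, fold_mutant_1, bind_mutant_1)
```

In this toy system the single mutants are nearly indistinguishable by phenotype, yet the double-mutant phenotype depends on which free energies the two mutations perturb.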

Thermodynamic model of simple protein interactions

We considered the protein of interest (the protein that is mutated) to be in one of three configuration states: (1) unfolded, (2) folded, and (3) folded and bound (or dimer) (Fig. 6a and Supplementary Fig. 5a). The steady-state equilibrium has the same form as shown for the CI protein in Eqs. (2) and (3). When the protein binds to a substrate instead of to itself, it follows Eq. (9).

\frac{[Complex]}{[ProteinX] \cdot [Ligand]} = \exp (\frac{- Δ G_{B}}{R T}) .

Above, [Complex] is the concentration of Protein X bound to its ligand (or substrate molecule). The parameters used in the model for Fig. 6 and Supplementary Fig. 5 are ∆G_F, WT = −1 kcal per mol and ∆G_{B (or D), WT} = −2 kcal per mol, with [Protein X]:[Ligand] = 1:1.
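For the ligand-binding case, the three-state equilibrium can be solved in closed form. The sketch below is an illustration under stated assumptions: [ProteinX] in Eq. (9) is taken to be the folded, unbound protein; the total concentrations of 1.0 (relative to the 1 M standard state) are assumed because only the 1:1 ratio is specified; and the temperature and function name are hypothetical.

```python
import math

RT = 1.987e-3 * 310.15  # kcal/mol at an assumed 37 °C


def complex_concentration(dg_fold, dg_bind, protein_total=1.0, ligand_total=1.0):
    """Equilibrium [Complex] for unfolded <-> folded <-> ligand-bound Protein X.

    Concentrations are relative to the 1 M standard state; the totals of 1.0
    are assumed here, since only the 1:1 ratio is specified.
    """
    k_fold = math.exp(-dg_fold / RT)  # [folded] / [unfolded]
    k_bind = math.exp(-dg_bind / RT)  # [Complex] / ([folded] * [Ligand]), Eq. (9)
    # Effective association constant with respect to all free (unbound) protein.
    k_eff = k_bind * k_fold / (1.0 + k_fold)
    # Solve k_eff * (P - C) * (L - C) = C for the physically valid (smaller) root.
    b = k_eff * (protein_total + ligand_total) + 1.0
    disc = b * b - 4.0 * k_eff * k_eff * protein_total * ligand_total
    return (b - math.sqrt(disc)) / (2.0 * k_eff)


# Wild-type parameters quoted in the text: dG_F = -1 and dG_B = -2 kcal per mol.
wild_type = complex_concentration(dg_fold=-1.0, dg_bind=-2.0)
```

A mutation can then be modelled by adding a ∆∆G to either argument; for example, `complex_concentration(-1.0 + 3.0, -2.0)` describes a destabilising mutation with a hypothetical ∆∆G_F of 3 kcal per mol.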

3D visualisation of CI bound to O_R1–3

The 3D structure of CI bound to O_R1–3 was generated based on PDB structure 3bdn, using YASARA software (v 19.7.20).

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Supplementary information

Supplementary Information (6 MB, pdf)

Peer Review File (183.2 KB, docx)

Reporting Summary (1.1 MB, pdf)

Acknowledgements

This work was supported by a European Research Council (ERC) Consolidator grant (616434), the Spanish Ministry of Economy and Competitiveness (BFU2017-89488-P and SEV-2012-0208), the Bettencourt Schueller Foundation, Agencia de Gestio d’Ajuts Universitaris i de Recerca (AGAUR, 2017 SGR 1322), and the CERCA Program/Generalitat de Catalunya. We also acknowledge the support of the Spanish Ministry of Economy, Industry and Competitiveness (MEIC) to the EMBL partnership and the Centro de Excelencia Severo Ochoa.

Source data

Source Data (64.7 MB, xls)

Author contributions

X.L. performed all analyses and made the figures; X.L. and B.L. conceived the study, designed the analyses and wrote the paper.

Data availability

All data supporting this work are provided within the paper, the supplementary information and the source data file. Source data are provided with this paper.

Code availability

Scripts are publicly available from https://github.com/lehner-lab/Biophysical_Ambiguity. Source data are provided with this paper.

Competing interests

The authors declare no competing interests.

Footnotes

Peer review information: Nature Communications thanks Elena Kuzmin and other, anonymous, reviewers for their contributions to the peer review of this work. Peer review reports are available.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information is available for this paper at 10.1038/s41467-020-18694-0.

References

1. Claussnitzer M, et al. A brief history of human disease genetics. Nature. 2020;577:179–189. doi: 10.1038/s41586-019-1879-7.
2. Lehner B. Genotype to phenotype: lessons from model organisms for human genetics. Nat. Rev. Genet. 2013;14:168–178. doi: 10.1038/nrg3404.
3. Starita LM, Fields S. Deep mutational scanning: A highly parallel method to measure the effects of mutation on protein function. Cold Spring Harb. Protoc. 2015;2015:711–714. doi: 10.1101/pdb.top077503.
4. Shendure J, Akey JM. The origins, determinants, and consequences of human mutations. Science. 2015;349:1478–1483. doi: 10.1126/science.aaa9119.
5. Jelier R, Semple JI, Garcia-Verdugo R, Lehner B. Predicting phenotypic variation in yeast from individual genome sequences. Nat. Genet. 2011;43:1270–1274. doi: 10.1038/ng.1007.
6. Riesselman AJ, Ingraham JB, Marks DS. Deep generative models of genetic variation capture the effects of mutations. Nat. Methods. 2018;15:816–822. doi: 10.1038/s41592-018-0138-4.
7. Domingo J, Baeza-Centurion P, Lehner B. The causes and consequences of genetic interactions (epistasis). Annu. Rev. Genomics Hum. Genet. 2019;20:083118–014857. doi: 10.1146/annurev-genom-083118-014857.
8. Costanzo M, et al. Global genetic networks and the genotype-to-phenotype relationship. Cell. 2019;177:85–100. doi: 10.1016/j.cell.2019.01.033.
9. Hartl DL, Dykhuizen DE, Dean AM. Limits of adaptation: The evolution of selective neutrality. Genetics. 1985;111:655–674. doi: 10.1093/genetics/111.3.655.
10. Ptashne, M. A Genetic Switch: Phage Lambda Revisited (Cold Spring Harbor Laboratory Press, 2004).
11. Sauer RT, Jordan SR, Pabo CO. λ Repressor: a model system for understanding protein–DNA interactions and protein stability. Adv. Protein Chem. 1990;40:1–61. doi: 10.1016/S0065-3233(08)60286-7.
12. Hecht MH, Nelson HC, Sauer RT. Mutations in lambda repressor’s amino-terminal domain: implications for protein stability and DNA binding. Proc. Natl Acad. Sci. USA. 1983;80:2676–2680. doi: 10.1073/pnas.80.9.2676.
13. Sepúlveda L, Xu H, Zhang J, Wang M. Measurement of gene regulation in individual cells reveals rapid switching between promoter states. Science. 2016;351:1218–1222. doi: 10.1126/science.aad0635.
14. Golding I. Decision making in living cells: lessons from a simple system. Annu. Rev. Biophys. 2011;40:63–80. doi: 10.1146/annurev-biophys-042910-155227.
15. Ptashne M, et al. How the lambda repressor and cro work. Cell. 1980;19:1–11. doi: 10.1016/0092-8674(80)90383-9.
16. Meyer BJ, Ptashne M. Gene regulation at the right operator (OR) of bacteriophage λ. III. λ Repressor directly activates gene transcription. J. Mol. Biol. 1980;139:195–205. doi: 10.1016/0022-2836(80)90304-6.
17. Ackers GK, Johnson AD, Shea MA. Quantitative model for gene regulation by lambda phage repressor. Proc. Natl Acad. Sci. USA. 1982;79:1129–1133. doi: 10.1073/pnas.79.4.1129.
18. Shea MA, Ackers GK. The OR control system of bacteriophage lambda. A physical-chemical model for gene regulation. J. Mol. Biol. 1985;181:211–230. doi: 10.1016/0022-2836(85)90086-5.
19. Li X, Lalic J, Baeza-Centurion P, Dhar R, Lehner B. Changes in gene expression predictably shift and switch genetic interactions. Nat. Commun. 2019;10:3886. doi: 10.1038/s41467-019-11735-3.
20. Lagator M, Paixao T, Barton N, Bollback JP, Guet CC. On the mechanistic nature of epistasis in a canonical cis-regulatory element. Elife. 2017;6:e25192. doi: 10.7554/eLife.25192.
21. Bray D. Protein molecules as computational elements in living cells. Nature. 1995;376:307–312. doi: 10.1038/376307a0.
22. Tokuriki N, Tawfik DS. Stability effects of mutations and protein evolvability. Curr. Opin. Struct. Biol. 2009;19:596–604. doi: 10.1016/j.sbi.2009.08.003.
23. Stein A, Fowler DM, Hartmann-Petersen R, Lindorff-Larsen K. Biophysical and mechanistic models for disease-causing protein variants. Trends Biochem. Sci. 2019;44:575–588. doi: 10.1016/j.tibs.2019.01.003.
24. Casadio R, Vassura M, Tiwari S, Fariselli P, Luigi Martelli P. Correlating disease-related mutations to their effect on protein stability: a large-scale analysis of the human proteome. Hum. Mutat. 2011;32:1161–1170. doi: 10.1002/humu.21555.
25. Sahni N, et al. Widespread macromolecular interaction perturbations in human genetic disorders. Cell. 2015;161:647–660. doi: 10.1016/j.cell.2015.04.013.
26. Gimble FS, Sauer RT. λ Repressor mutants that are better substrates for RecA-mediated cleavage. J. Mol. Biol. 1989;206:29–39. doi: 10.1016/0022-2836(89)90521-4.
27. Nelson HC, Sauer RT. Lambda repressor mutations that increase the affinity and specificity of operator binding. Cell. 1985;42:549–558. doi: 10.1016/0092-8674(85)90112-6.
28. Nelson HCM, Hecht MH, Sauer RT. Mutations defining the operator-binding sites of bacteriophage repressor. Cold Spring Harb. Symp. Quant. Biol. 1983;47:441–449. doi: 10.1101/SQB.1983.047.01.052.
29. Stayrook S, Jaru-Ampornpan P, Ni J, Hochschild A, Lewis M. Crystal structure of the λ repressor and a model for pairwise cooperative operator binding. Nature. 2008;452:1022–1025. doi: 10.1038/nature06831.
30. Beckett D, et al. Isolation of λ repressor mutants with defects in cooperative operator binding. Biochemistry. 1993;32:9073–9079. doi: 10.1021/bi00086a012.
31. Nelson HCM, Sauer RT. Interaction of mutant λ repressors with operator and non-operator DNA. J. Mol. Biol. 1986;192:27–38. doi: 10.1016/0022-2836(86)90461-4.
32. Hecht MH, Sturtevant JM, Sauer RT. Effect of single amino acid replacements on the thermal stability of the NH2-terminal domain of phage lambda repressor. Proc. Natl Acad. Sci. USA. 1984;81:5685–5689.
33. Hecht MH, Hehir KM, Nelson HCM, Sturtevant JM, Sauer RT. Increasing and decreasing protein stability: Effects of revertant substitutions on the thermal denaturation of phage λ repressor. J. Cell. Biochem. 1985;29:217–224. doi: 10.1002/jcb.240290306.
34. Soskine M, Tawfik DS. Mutational effects and the evolution of new protein functions. Nat. Rev. Genet. 2010;11:572–582. doi: 10.1038/nrg2808.
35. Otwinowski J. Biophysical inference of epistasis and the effects of mutations on protein stability and function. Mol. Biol. Evol. 2018;35:2345–2354. doi: 10.1093/molbev/msy141.
36. Wodak SJ, et al. Allostery in its many disguises: from theory to applications. Structure. 2019;27:566–578. doi: 10.1016/j.str.2019.01.003.
37. Horovitz A, Fleisher RC, Mondal T. Double-mutant cycles: new directions and applications. Curr. Opin. Struct. Biol. 2019;58:10–17. doi: 10.1016/j.sbi.2019.03.025.
38. Tokuriki N, Stricher F, Schymkowitz J, Serrano L, Tawfik DS. The stability effects of protein mutations appear to be universally distributed. J. Mol. Biol. 2007;369:1318–1332. doi: 10.1016/j.jmb.2007.03.069.
39. Otwinowski J, McCandlish DM, Plotkin JB. Inferring the shape of global epistasis. Proc. Natl Acad. Sci. USA. 2018;115:E7550–E7558. doi: 10.1073/pnas.1804015115.
40. Gjuvsland AB, Wang Y, Plahte E, Omholt SW. Monotonicity is a key feature of genotype-phenotype maps. Front. Genet. 2013;4:216.
41. Olson CA, Wu NC, Sun R. A comprehensive biophysical description of pairwise epistasis throughout an entire protein domain. Curr. Biol. 2014;24:2643–2651. doi: 10.1016/j.cub.2014.09.072.
42. Matreyek K, et al. Multiplex assessment of protein variant abundance by massively parallel sequencing. Nat. Genet. 2018;50:874–882.
43. Woodsmith J, et al. Protein interaction perturbation profiling at amino-acid resolution. Nat. Methods. 2017;14:1213–1221. doi: 10.1038/nmeth.4464.
44. Diss G, Lehner B. The genetic landscape of a physical interaction. Elife. 2018;7:e32472. doi: 10.7554/eLife.32472.
45. Keren L, et al. Massively parallel interrogation of the effects of gene expression levels on fitness. Cell. 2016;166:1282–1294. doi: 10.1016/j.cell.2016.07.024.
46. Fowler DM, Stephany JJ, Fields S. Measuring the activity of protein variants on a large scale using deep mutational scanning. Nat. Protoc. 2014;9:2267–2284. doi: 10.1038/nprot.2014.153.
47. Mighell TL, Evans-Dutson S, O’Roak BJ. A saturation mutagenesis approach to understanding PTEN lipid phosphatase activity and genotype-phenotype relationships. Am. J. Hum. Genet. 2018;102:943–955. doi: 10.1016/j.ajhg.2018.03.018.
48. Chure G, et al. Predictive shifts in free energy couple mutations to their phenotypic consequences. Proc. Natl Acad. Sci. USA. 2019;116:18275–18284. doi: 10.1073/pnas.1907869116.
49. Huang GS, Oas TG. Structure and stability of monomeric λ repressor: NMR evidence for two-state folding. Biochemistry. 1995;34:3884–3892. doi: 10.1021/bi00012a003.
50. Reichardt L, Kaiser AD. Control of lambda repressor synthesis. Proc. Natl Acad. Sci. USA. 1971;68:2185–2189. doi: 10.1073/pnas.68.9.2185.
51. Maurer R, Meyer BJ, Ptashne M. Gene regulation at the right operator (OR) of bacteriophage λ. I. OR3 and autogenous negative control by repressor. J. Mol. Biol. 1980;139:147–161. doi: 10.1016/0022-2836(80)90302-2.
52. Brent, R. P. in Algorithms for Minimization Without Derivatives 61–80, 10.1109/TAC.1974.1100629 (1973).
53. Soetaert, K. & Herman, P. M. J. A Practical Guide to Ecological Modelling: Using R as a Simulation Platform (Springer, 2008).


