Author manuscript; available in PMC: 2024 Mar 13.
Published in final edited form as: Phys Rev E. 2023 Nov;108(5-1):054408. doi: 10.1103/PhysRevE.108.054408

Epistasis and pleiotropy shape biophysical protein subspaces associated with drug resistance

C Brandon Ogbunugafor 1,2,3,*, Rafael F Guerrero 4, Miles D Miller-Dickson 5, Eugene I Shakhnovich 6, Matthew D Shoulders 2
PMCID: PMC10935598  NIHMSID: NIHMS1972065  PMID: 38115433

Abstract

Protein space is a rich analogy for genotype-phenotype maps, where amino acid sequence is organized into a high-dimensional space that highlights the connectivity between protein variants. It is a useful abstraction for understanding the process of evolution, and for efforts to engineer proteins towards desirable phenotypes. Few mentions of protein space consider how protein phenotypes can be described in terms of their biophysical components, nor do they rigorously interrogate how forces like epistasis (the nonlinear interaction between mutations and their phenotypic consequences) manifest across these components. In this study, we deconstruct a low-dimensional protein space of a bacterial enzyme (dihydrofolate reductase; DHFR) into “subspaces” corresponding to a set of kinetic and thermodynamic traits [kcat, KM, Ki, and Tm (melting temperature)]. We then examine how combinations of three mutations (eight alleles in total) display pleiotropy, or unique effects on individual subspace traits. We examine protein spaces across three orthologous DHFR enzymes (Escherichia coli, Listeria grayi, and Chlamydia muridarum), adding a genotypic context dimension through which epistasis occurs across subspaces. In doing so, we reveal that protein space is a deceptively complex notion, and that future applications to bioengineering should consider how interactions between amino acid substitutions manifest across different phenotypic subspaces.

I. INTRODUCTION

For all the sophistication of technologies associated with studying protein structure-function (cryoEM, AlphaFold, and deep mutational scanning, for example), basic questions remain about how we consider and measure the shape of genotype-phenotype maps in the study of proteins. Addressing these questions requires theoretical and conceptual instruments that can be used to understand how genotype and amino acid composition confer phenotype, and especially how evolution happens at the protein level. With regard to the latter, evolutionary biologists have used two related analogies (the fitness landscape and protein space) to describe how evolution searches through the space of possibility from genotype to protein phenotype [1–7].

One important conceptual innovation in the study of genotype-phenotype maps is that phenotypes can often be deconstructed into “micro-landscapes” that are parts of a larger fitness landscape [8]. A foundational study in this area (2005) reconstructed a protein fitness landscape associated with the use of a co-enzyme from component biophysical traits [9]. In the case of some enzymes, there are multiple genotype-phenotype maps corresponding to different biophysical phenotypes: for example, those related to enzyme kinetics and those defining thermodynamic properties (often in the setting of proteins associated with drug resistance) [8,10]. In the language of protein space, the hierarchy of complex phenotypes and their component phenotypes can be framed in terms of protein spaces and their component “subspaces.”

If we consider the possibility that protein space is composed of biophysical subspaces, new questions arise surrounding how those subspaces are constructed, and what their shape (topography) means for protein evolution. For example, epistasis, defined colloquially as the “surprise at the phenotype when mutations are combined, given the constituent mutations’ individual effects” [11], has long been known to craft the topography of fitness landscapes. Epistasis remains a provocative concept because it shapes genotype-phenotype maps in surprising ways, creating rugged fitness landscapes with fitness valleys that can undermine or constrain the process of adaptation [12–21]. If the protein space through which evolution is operating is composed of several subspaces (Fig. 1), then one might ask how epistasis manifests across each of them.

FIG. 1.

A 3D conceptual diagram of protein space and subspace, revealing how different epistatic interactions manifest across different traits (pleiotropy). This image outlines the central concept behind the complex structure of protein-level phenotypes, organized into protein space. The complex phenotype (a) can be deconstructed into subspaces (b) and (c). The heights of the lines correspond to a phenotype value for that particular space trait. Note the differences in topography. Importantly, these differences manifest because of differences in epistasis between mutations. The [0] and [1] values correspond to the presence (1) or absence (0) of a mutation at a given location. The size of the protein spaces in this schematic (including eight alleles) is the same as that used in this study, but this need not be the case. The subspace concept transcends a space of any size: one could construct the space-subspace dichotomy for spaces of hundreds or thousands of nodes.

In examining how mutation effects manifest across different subspaces, we run into a different (perhaps equally provocative) concept from evolutionary theory: pleiotropy, which can be defined by the differential effects of genes or mutations on seemingly disparate traits [22]. Pleiotropy is frequently used in discussions around tradeoffs in evolution, as in a presumed tradeoff between generalism and specialism [23] or pathogen virulence and transmission [24]. However, the concept has a greater reach: it forces us to reconsider the phenotypic effects of every allele or mutation, as the effects we focus on might not be the only (or most meaningful) trait affected.

Relatedly, several studies have examined how nonlinear interactions between locations in a gene (and their putative amino acids) manifest across the traits that correspond to subspaces [8–10,25–27]. Why is this important? Because while protein evolution can be observed at the level of complex phenotypes (e.g., fitness, IC50, or minimal inhibitory concentration), evolution often operates incongruously across the different subspaces (e.g., thermodynamic versus kinetic components in enzymes). Consequently, focusing on the shape of these subspaces, which is dictated by epistasis, is important for resolving and predicting the phenotypic effects of mutations and their putative amino acid substitutions (as in protein evolution and bioengineering).

In this study, we deconstruct sets of protein spaces (composed of eight alleles) for dihydrofolate reductase (DHFR) orthologs in Escherichia coli, Listeria grayi, and Chlamydia muridarum into subspaces corresponding to different biophysical traits: kcat, KM, Ki, and Tm (melting temperature). Specifically, we test the hypothesis that there is no correlation (positive or negative) between the topographies of the different biophysical subspaces, and quantify whatever relationships may exist. In doing so, we reveal how the shapes of protein subspaces can differ profoundly according to biophysical subspace trait. Finally, we discuss the implications of this in the present and future study of evolutionary theory, protein evolution, and bioengineering.

II. METHODS

A. Laboratory measurement of biophysical traits

The data in this study originated from a prior biophysical decomposition of a fitness landscape [8]. The methods for measuring the biophysical traits were described in prior studies [8,28]. We refer readers interested in replicating the laboratory-derived biochemical and biophysical phenotypes to those studies.

B. Nomenclature

For translation purposes, we will employ a particular nomenclature for discussing the mutations and DHFR alleles used in this study. The mutations corresponding to P21L, A26T, and L28R in E. coli and L. grayi are referred to with regard to their combinatorial arrangement. For example, “PAL” corresponds to the enzyme variant with amino acids proline (P), alanine (A), and leucine (L) at the three sites of interest. In C. muridarum, the orthologous mutations are P23L, E28T, and L30R.

C. Notes on the bacterial orthologs

In the Appendix (Table I), we observe the sequence identity matrix for DHFRs derived from E. coli, L. grayi, and C. muridarum. These data are also present in a prior study that examined this same collection of enzymes [8].

TABLE I.

Sequence identity matrix. Blank cells represent spaces with redundant information.

               E. coli    L. grayi    C. muridarum
E. coli
L. grayi       36
C. muridarum   27         23

D. Comparing the topography of the protein spaces and fitness landscapes

Protein spaces were constructed from existing data (see [29]), and compared using the Kendall rank order test and matrix. This test measures the concordance or discordance of the landscapes with respect to the order of the alleles in a landscape. For each species, we ranked genotypes within subspaces, assigning values from 1 (maximum) to 8 (minimum value for the measurement in that subspace). We used these ranks to calculate rank correlations (Kendall’s τ) across all subspace pairs (a method implemented in R; base and corrr packages; [30]).
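The ranking and rank-correlation step described above can be sketched as follows. The original analysis was run in R (base and corrr packages); the Python below is only an illustrative stand-in, not the authors' code. The trait values are invented for illustration, the helper names `rank_descending` and `kendall_tau` are hypothetical, and the sketch assumes no tied values (so it computes Kendall's tau-a).

```python
from itertools import combinations


def rank_descending(values):
    """Rank values from 1 (maximum) to len(values) (minimum); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs; assumes no ties."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical measurements for eight alleles in two subspaces (not data from the study).
trait_a = [1.0, 2.5, 0.7, 3.1, 4.2, 0.2, 5.0, 2.0]
trait_b = [9.0, 4.0, 8.5, 3.0, 2.2, 9.9, 1.0, 6.0]

ranks_a = rank_descending(trait_a)
ranks_b = rank_descending(trait_b)
tau = kendall_tau(ranks_a, ranks_b)
print(ranks_a, ranks_b, tau)
```

A tau near +1 indicates concordant subspace topographies, a tau near -1 indicates discordant ones, and a tau near 0 indicates no rank relationship. Significance testing (the p values reported in the study) is not covered by this sketch.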

E. Higher-order epistasis

The system explored in this study has been previously examined with respect to how proteostasis machinery influences higher-order epistasis [27,31]. Moreover, while other studies have measured epistasis on biophysical traits [8,10,32,33], few have rigorously examined how higher-order epistasis manifests across subspaces, or directly compared the shapes of these spaces across species orthologs.

To measure epistasis, we use a method adapted from theoretical computer science and signal processing termed the Walsh-Hadamard transform, which computes a coefficient corresponding to the magnitude and sign of interaction between mutations. The Walsh-Hadamard transform generalizes the standard epistatic coefficient, allowing an analysis of higher-order epistatic interactions. It was pioneered for use in a 2013 study of epistasis that both provided a primer for the calculation and analyzed several combinatorially complete data sets [11]. The Walsh-Hadamard transform has since been further elaborated on and applied to the study of higher-order epistasis across an array of empirical data sets [34–36].

While we will describe certain features of the analysis here, those interested in further details should refer to manuscripts that describe and apply the method to data sets similar in structure to the ones analyzed here [11,34]. Note that this approach is only one of myriad methods that one can use to quantify epistasis, and we encourage those interested to engage several reviews that have addressed this topic directly [21,26,37,38]. Moreover, there are new methods that facilitate the measurement of epistasis in large genomic data sets [39,40].

One limitation of the Walsh-Hadamard transform is that the data it employs must generally be combinatorially complete, with no more than two variants at a given locus (location, or site on a genome) of information. More recent studies have, however, proposed strategies to transcend some of the presumed limitations [41,42]. Nonetheless, in this study we utilize the methods on a combinatorially complete set of mutants, where we can represent the absence or presence of a given mutation by a 0 or 1, respectively, at a given site. For example, we can represent a wild-type gene variant as 000. In this scenario, mutations at each of three sites (e.g., the three mutations corresponding to trimethoprim resistance in E. coli dihydrofolate reductase: P21L, A26T, and L28R) are encoded as 111 (for example, see Fig. 1).

The full data set for the alleles consists of a vector of phenotypic values (resistance to trimethoprim in the case of the DHFR mutants) for all possible combinations of mutations (eight total), represented by their single amino acid substitutions:

  • For E. coli and L. grayi:
    • PAL, LAL, PAR, PTL, PTR, LAR, LTL, LTR
  • For C. muridarum:
    • PEL, LEL, PER, PTL, PTR, LER, LTL, LTR

In binary notation, [0] can represent the wild-type genotype, and [1] the mutant, and so we can encode the above variants as

000, 100, 001, 010, 011, 101, 110, 111
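The encoding above can be sketched in a few lines of Python. This is an illustrative sketch only: the `WILD_TYPE` table and the `encode` helper are hypothetical names, with the wild-type residues taken from the allele lists above. The final step sorts the alleles into increasing binary order, the ordering convention described later in the Methods.

```python
# Wild-type residues at the three sites of interest, per ortholog (taken from the text).
WILD_TYPE = {"E. coli": "PAL", "L. grayi": "PAL", "C. muridarum": "PEL"}


def encode(allele, wild_type):
    """Return a bit string with 1 wherever the allele differs from the wild type."""
    return "".join("0" if a == w else "1" for a, w in zip(allele, wild_type))


alleles = ["PAL", "LAL", "PAR", "PTL", "PTR", "LAR", "LTL", "LTR"]
codes = [encode(a, WILD_TYPE["E. coli"]) for a in alleles]

# Sort into increasing binary order (000, 001, ..., 111).
ordered = sorted(zip(codes, alleles), key=lambda pair: int(pair[0], 2))
print(codes)
print(ordered)
```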

We implement the weighted Walsh-Hadamard transform from [34], which incorporates an additional scaling matrix V, allowing for an interpretation of higher-order epistatic coefficients as averages over different genetic backgrounds. The phenotypic values (the biochemical and biophysical traits in our study) over the combinatorially complete set are arranged in a vector ϰ, whose elements must be ordered properly to enable the correct interpretation of the epistatic coefficients, as we will see.

This vector of phenotypic values is multiplied by a square matrix, which is itself the product of a diagonal matrix V and a Hadamard matrix H. These are defined recursively by

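$$V_{n+1} = \begin{pmatrix} \tfrac{1}{2} V_n & 0 \\ 0 & -V_n \end{pmatrix}, \quad V_0 = 1, \tag{1}$$
$$H_{n+1} = \begin{pmatrix} H_n & H_n \\ H_n & -H_n \end{pmatrix}, \quad H_0 = 1, \tag{2}$$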

where n is the number of mutations that define protein spaces in the DHFR orthologs (n=3 in this study). The generalized epistatic coefficients are given by γ in the following expression:

$$\gamma = V H \varkappa. \tag{3}$$

H = H_n and V = V_n are the matrices from Eqs. (1) and (2) for n = 3, and we sometimes refer to γ simply as the Walsh coefficients, a measure of the average epistatic interaction between amino acid substitutions. Analogous to the standard epistatic coefficient, these generalized coefficients γ have the following interpretation:

Positive values correspond to interactions between mutations that on average add to the value of the biophysical phenotype, whereas negative values correspond to interactions between mutations that on average subtract from the value of the biophysical phenotype.
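A minimal sketch of the weighted transform in Eq. (3) is given below, assuming the recursions in Eqs. (1) and (2) and a phenotype vector already arranged in increasing binary order. The function names are hypothetical, the phenotype values are invented, and this is not the authors' implementation. Plain Python lists are used so that the sketch has no dependencies.

```python
def hadamard(n):
    """H_n from the recursion H_{n+1} = [[H_n, H_n], [H_n, -H_n]], H_0 = 1."""
    h = [[1]]
    for _ in range(n):
        top = [row + row for row in h]
        bottom = [row + [-v for v in row] for row in h]
        h = top + bottom
    return h


def weights(n):
    """Diagonal of V_n from the recursion V_{n+1} = diag(V_n / 2, -V_n), V_0 = 1."""
    v = [1.0]
    for _ in range(n):
        v = [x / 2 for x in v] + [-x for x in v]
    return v


def walsh_coefficients(phenotypes):
    """gamma = V H x, with x in increasing binary order (000, 001, ..., 111)."""
    n = len(phenotypes).bit_length() - 1
    if len(phenotypes) != 2 ** n:
        raise ValueError("need a combinatorially complete set of 2**n values")
    h, v = hadamard(n), weights(n)
    return [v[i] * sum(h[i][j] * phenotypes[j] for j in range(2 ** n))
            for i in range(2 ** n)]


# Hypothetical, purely additive phenotypes in increasing binary order (not study data):
# baseline 10, site 1 adds 1, site 2 adds 2, site 3 adds 4.
x = [10, 14, 12, 16, 11, 15, 13, 17]
gamma = walsh_coefficients(x)
print(gamma)
```

For this additive toy landscape, the first element is the mean phenotype, the single-site coefficients recover the three additive effects, and every pairwise and third-order coefficient is zero, which is a useful sanity check before applying the transform to measured traits.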

F. Interpreting higher-order interactions across biophysical trait

The standard epistatic coefficient ϵ represents the effect that a mutation at one site has on the effect of a mutation at another site. This can be written as

$$\epsilon = (\varkappa_{11} - \varkappa_{10}) - (\varkappa_{01} - \varkappa_{00}), \tag{4}$$

which is the difference between the effects of a mutation at the second site when a mutation at the first site is present or absent. Note that ϵ is symmetric with respect to swapping the labels of sites, i.e., under ϰ_ab → ϰ_ba, and is therefore agnostic to which site is called “first” or “second.” In the context of three sites of variation, one can average over the effect of the third site, for example,

$$\gamma_{110} = \frac{(\varkappa_{110} - \varkappa_{100}) - (\varkappa_{010} - \varkappa_{000}) + (\varkappa_{111} - \varkappa_{101}) - (\varkappa_{011} - \varkappa_{001})}{2}, \tag{5}$$

where we’ve used the notation γ110 to indicate the epistatic relationship between sites 1 and 2, averaging (“0”) over the third. This background-averaged epistasis is precisely the sixth element of the generalized epistatic coefficients, γ, treating γ000 as the “zeroth” element (see Fig. 2). Note the notational convenience: γ110 is the sixth element, and 110 = 6 in binary. Here we see why the ordering of the phenotype vector ϰ is crucial to the proper interpretation of the generalized epistatic coefficients. Specifically, we use the convention that ϰ is arranged in increasing binary order, regarding bit sequences as integers (e.g. 010 = 2 and 011 = 3 in binary). Thus, for our n=3 case, ϰ is arranged in the order shown in Fig. 2. This ordering, along with the form of the matrix V, ensures that, for example, γ101 has the interpretation of the epistatic coefficient between sites 1 and 3, averaged over site 2.

FIG. 2.

Explicit forms of the Hn and Vn matrices for n=3. (+) stands in for +1 and (−) stands in for −1.

Similarly, the average effect of a mutation at a single site can be computed by

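$$\gamma_{100} = \frac{(\varkappa_{111} - \varkappa_{011}) + (\varkappa_{110} - \varkappa_{010}) + (\varkappa_{101} - \varkappa_{001}) + (\varkappa_{100} - \varkappa_{000})}{4} \tag{6}$$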

for a mutation at the first site. This is the fourth element of the vector γ (see Fig. 2).

Finally, the Walsh-Hadamard transform generalizes to higher-order interactions with the γ111 term (specifically, to order 3 interactions), given by

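$$\gamma_{111} = \left[(\varkappa_{111} - \varkappa_{110}) - (\varkappa_{101} - \varkappa_{100})\right] - \left[(\varkappa_{011} - \varkappa_{010}) - (\varkappa_{001} - \varkappa_{000})\right], \tag{7}$$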

which is the difference between two standard epistatic coefficients between sites 2 and 3, with and without a mutation at the first site. In this way, one measures the extent to which the presence of a third mutation controls the interaction effect between two other mutations. Thus, higher-order epistatic interactions ask how the addition of a mutation at one site affects the mutation interactions among other sites, which is easily generalized to any number of sites. Note that as before, owing to the symmetry in the definition of the coefficients, this value is agnostic to the labeling of sites 1, 2, and 3, and thus represents an analog of the epistatic coefficient for three sites.
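The explicit expressions in Eqs. (4) through (7) can be checked by hand with a short sketch such as the one below. The phenotype values are invented for illustration and the helper name `pairwise_epsilon` is hypothetical. Genotypes are keyed by bit string, with the first character referring to site 1.

```python
def pairwise_epsilon(x11, x10, x01, x00):
    """Standard epistatic coefficient, Eq. (4)."""
    return (x11 - x10) - (x01 - x00)


# Hypothetical phenotype values keyed by genotype (not data from the study).
x = {"000": 1.0, "001": 2.0, "010": 1.5, "011": 4.0,
     "100": 0.5, "101": 2.5, "110": 3.0, "111": 9.0}

# Eq. (5): epistasis between sites 1 and 2, averaged over both backgrounds of site 3.
gamma_110 = (pairwise_epsilon(x["110"], x["100"], x["010"], x["000"])
             + pairwise_epsilon(x["111"], x["101"], x["011"], x["001"])) / 2

# Eq. (6): effect of a mutation at site 1, averaged over the four backgrounds.
gamma_100 = sum(x["1" + bg] - x["0" + bg] for bg in ("00", "01", "10", "11")) / 4

# Eq. (7): epistasis between sites 2 and 3 with site 1 mutated, minus the same without.
gamma_111 = (pairwise_epsilon(x["111"], x["110"], x["101"], x["100"])
             - pairwise_epsilon(x["011"], x["010"], x["001"], x["000"]))

print(gamma_110, gamma_100, gamma_111)
```

Swapping the second and third arguments of `pairwise_epsilon` leaves its value unchanged, which is the label symmetry noted for ϵ.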

  • γ000: The average of the phenotypic value across the combinatorially complete set

  • γ001: The average phenotypic effect of a mutation at the third site (L28R in E. coli and L. grayi; L30R in C. muridarum)

  • γ010: The average phenotypic effect of a mutation at the second site (A26T in E. coli and L. grayi; E28T in C. muridarum)

  • γ100: The average phenotypic effect of a mutation at the first site (P21L in E. coli and L. grayi; P23L in C. muridarum)

  • γ011: The average phenotypic effect of the pairwise (second-order) interaction between mutations at the second and third sites, averaged over the genetic background of the first site

  • γ101: The average phenotypic effect of the pairwise (second-order) interaction between mutations at the first and third sites, averaged over the genetic background of the second site

  • γ110: The average phenotypic effect of the pairwise (second-order) interaction between mutations at the first and second sites, averaged over the genetic background of the third site

  • γ111: The phenotypic effect of the third-order interaction between mutations at all three sites (no explicit genetic background is averaged over).

In the set of interactions that we measure, there is one zeroth-order effect, three first-order interactions, three second-order interactions, and one third-order interaction. The third-order interaction would formally qualify as “higher order.”

In addition, one can take the mean of these epistatic coefficients within an order, which can facilitate comparisons between orders. For a given interaction, we compute an epistatic coefficient, E, as in prior studies that have examined higher-order interactions on empirical fitness landscapes [36],

$$E_i = \frac{\gamma_i^2}{\sum_j \gamma_j^2}, \tag{8}$$

where the sum is taken over all elements of γ. Though we have used absolute values in prior examinations of epistasis (e.g., [36]), this iteration of the calculation utilizes the squares, as it is more analogous to an interpretation of signal strength in the basis of epistatic coefficients (as opposed to strengths in the phenotypic basis). This calculation translates to the intensity of the interactions corresponding to a specific order. For example, first-order effects are captured by E001+E010+E100, second-order effects are captured by E110+E101+E011, and the third-order effect is given by E111.
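The normalization in Eq. (8) and the grouping by order can be sketched as follows. The coefficient values are invented, the function names are hypothetical, and the sketch assumes that γ is stored in increasing binary order so that the order of each coefficient equals the number of 1s in its binary index.

```python
def epistatic_fractions(gamma):
    """E_i = gamma_i**2 / sum_j gamma_j**2, as in Eq. (8); needs a nonzero gamma."""
    total = sum(g * g for g in gamma)
    return [g * g / total for g in gamma]


def fractions_by_order(gamma):
    """Sum E_i over coefficients of the same order (number of 1s in the index)."""
    by_order = {}
    for i, e in enumerate(epistatic_fractions(gamma)):
        order = bin(i).count("1")
        by_order[order] = by_order.get(order, 0.0) + e
    return by_order


# Hypothetical Walsh coefficients, gamma_000 ... gamma_111 (not data from the study).
gamma = [2.0, 1.0, -1.0, 0.5, 0.0, -0.5, 1.5, 1.0]
by_order = fractions_by_order(gamma)
print(by_order)
```

Because the fractions sum to one, the per-order totals can be compared directly across traits and orthologs, regardless of the units in which each trait was measured.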

G. A brief note on the language of protein space vs adaptive or fitness landscape

Some of the concepts explored here have previously been framed in terms of protein fitness landscapes [5,43]. The concepts of protein space and the fitness landscape are at least compatible, even identical in some cases [44]. Previous work has explored the similarity between different framings of the fitness landscape and contains more elaborate discussions of its varied definitions [4,7,44,45]. For many problems in molecular evolution, both can be used. But the subtle differences are important to articulate here.

  1. The fitness or adaptive landscape analogy is most appropriate as a representation of genotype-phenotype maps when examining an evolving population. Protein space, on the other hand, is less fixated on any solution or “fitness peak,” but rather focuses on the broader notion that evolutionary possibility can be mapped across an n-dimensional space.

  2. Relatedly, the fitness or adaptive landscape concept can be encumbered by the definition of “fitness.” Protein space can describe the relationship between mutational neighbors (nodes in the space) with respect to any conceivable phenotype, whether it be adaptive or not.

III. RESULTS

This study aimed to examine the topography of biophysical protein subspaces across three orthologs of DHFR (E. coli, L. grayi, and C. muridarum). We offer that pairwise and higher-order epistasis influence topographical differences across ortholog (species of bacteria) and trait subspaces. Our findings are organized into three major categories:

  1. Comparisons between the topography of protein subspaces of DHFR across biophysical traits and orthologs.

  2. Measuring the epistasis between individual mutations of DHFR across biophysical traits and orthologs.

  3. Comparison of the higher-order epistasis that drives these differences in subspace topography.

Where relevant, we will discuss the statistical tests used, our rationale, and the conclusions from those analyses.

A. Comparing the topography of the subspaces

We first organized the data into independent subspaces. We depict the resulting subspaces in terms of how the scaled values (shown in standard deviations) change across biophysical traits (Fig. 3). We then used a Kendall rank order correlation to test the hypothesis that the topographies of the landscapes are independent (have no correlation; see Methods). In Fig. 4 we observe that many significant findings (positive or negative correlations) involve the kinetic traits Ki, kcat, and KM. For example, focusing on the E. coli subspaces: there is a strong negative correlation between Ki and kcat (p < 0.01). Also for E. coli, note the weaker but significant concordance between kcat and KM (p < 0.05). Similar results can be observed within and between the other species, reflecting both biophysical patterns (again, the kinetic traits are related), and widespread variation across the expanse of protein spaces.

FIG. 3.

Measurements of protein subspaces across orthologous bacterial enzymes (DHFR). Scaled values are shown in standard deviations. Measurements of the different subspaces for the suite of mutations associated with resistance to antifolate drugs in dihydrofolate reductase. Values for each subspace are scaled according to the mean of the value for that trait. Comparing the topography of the subspaces within and across species suggests varying patterns of similarities and differences across subspaces.

FIG. 4.

Comparison of topography across subspaces. We use a Kendall rank-order test to quantify correlations among the landscapes of five traits in three species. * p < 0.05, ** p < 0.01. Concordant and discordant subspaces tend to be focused on the kinetic traits: KM, kcat, and Ki. See Fig. 7 for a depiction of the rank orders.

Next, we computed the epistatic effects as outlined in the Methods, and ranked the interactions and effects, across traits and bacterial orthologs (Fig. 5). We then reorganized the effects from Fig. 5 into higher-order terms. That is, we squared the effects depicted in Fig. 5 and summed them according to order: zeroth, first, second (pairwise), or third (see Methods). This lens, depicted in Fig. 6, shows how mutations and their interactions can differ drastically across subspace traits and orthologs.

FIG. 5.

Epistasis meets pleiotropy: measurements of epistasis across orthologous subspaces. We compute the Walsh-Hadamard coefficient for each sort of interaction. Top panel: Individual graphs correspond to different traits, starting with the complex trait (IC50) on the left, followed by the individual biochemical and biophysical traits Ki, KM, kcat, and Tm. The x axis depicts individual mutation effects, with [1]’s corresponding to the presence of a mutation at a given location. A [1] at the first site corresponds to the presence of the P21L (E. coli and L. grayi) or P23L mutation (C. muridarum). A [1] at the second site corresponds to the A26T (E. coli and L. grayi) or E28T (C. muridarum) mutations, and a [1] at the last site corresponds to the L28R (E. coli and L. grayi) or L30R (C. muridarum) mutations. Bottom panel: The same data as in the top panel, depicted as rank orders of effects. For example, for L. grayi the 1*1 pairwise interaction between the P21L and A26T mutations is the highest ranked (has the highest magnitude) of all of the interactions.

FIG. 6.

Higher-order epistasis across traits and orthologous subspaces: As outlined in the methods, the epistatic coefficients can be reorganized to depict the effects by order. This approach aids in efforts to compare the overall presence of epistatic effects of a certain order across biophysical subspaces. We observe that different sorts of interactions govern certain biophysical traits. For example, KM is dominated by pairwise interactions in C. muridarum, third-order interactions in E. coli, and first-order and pairwise interactions in L. grayi.

For E. coli, pairwise and third-order effects predominate in KM and Ki. Tm, however, is most influenced by the combination of wild-type mutations. For L. grayi, pairwise effects dominate the kinetic traits (KM, kcat, and Ki), with third-order effects also playing a meaningful role. Recall from the comparison of the topography of landscapes (Fig. 4) that these traits were the ones with the strongest patterns of concordance or discordance. While not a rule that applies across the entire data set, this theme suggests that similar patterns of epistasis operate on traits that are biologically related, a finding that is consistent with our intuition.

Zeroth-order effects (corresponding to the combination of mutations present in the wild-type L. grayi DHFR) are especially meaningful in the Tm subspace. In C. muridarum, pairwise or third-order effects dominate every subspace except for the Tm subspace, where zeroth-order effects predominate. Indeed, while patterns differ across ortholog subspaces, one consistent observation is the relative lack of higher-order effects operating on the Tm subspace. In all three species, zeroth-order effects were a notable influence on Tm.

B. Analysis of epistasis that underlies differences in subspace topography

Using the Walsh-Hadamard transform, we then calculated the average effects of mutations, as well as the pairwise and three-way effects, on a range of traits across the IC50 and biophysical subspaces. These calculations revealed large differences in patterns of epistasis across subspace traits (Fig. 5). Note again that IC50 is depicted alongside the subspaces, for visualization purposes, so that we can see how the subspaces compare to the higher-level space (IC50).

IV. DISCUSSION

In this study, we measured epistasis across four biophysical subspaces (kcat, KM, Ki, and Tm) of three orthologs (Escherichia coli, Listeria grayi, and Chlamydia muridarum) of dihydrofolate reductase, an enzyme target of antimicrobial drugs. Our findings fortify the notion that epistatic interactions remain a major challenge in resolving phenotype from genotype, because mutations can tune the shape of different subspaces of a single protein differently. In this way, our study offers insight into the interface between two population genetics concepts, epistasis and pleiotropy, each of which is an important force in adaptive evolution [46,47].

We observe that the shape of protein space differs across orthologs of DHFR (Figs. 3–5). This finding emphasizes how even relatively minor differences in amino acid sequence (corresponding to the three species of bacteria) can have meaningful consequences for structuring protein space. Figures 3 and 4 highlight several relationships (both concordance and discordance) between the shapes of protein spaces associated with IC50, kcat, KM, and Ki. For example, there is strong discordance between kcat and Ki across the protein spaces, and relatively strong concordance between kcat and KM. These results are intuitive: kcat and KM are subspace phenotypes that are properties of the enzyme active site, with both Michaelis-Menten and mechanistic bases for an expected relationship between traits (e.g., that kcat and Ki should be discordant, because chemical inhibitors interfere with catalysis). By contrast, note that the Tm protein subspaces appear to be uncorrelated (neither concordant nor discordant) across the three species of bacteria. What explains this pattern? We can only speculate. Unlike the other phenotypes measured in this study, Tm is a global protein trait that might be relatively less influenced by properties that focus on the enzyme active site.

Measures of epistasis (Figs. 5 and 6) tell an important part of the story. Each subspace has patterns of similarity and difference according to trait and ortholog. For example, pairwise interactions between mutations appear to be an actor in many kinetic traits, but third-order interactions are relatively low in magnitude in the L. grayi ortholog of DHFR, across traits. Notably, the Tm subspace is dominated by zeroth-order effects—where mutations in the wild-type genotype had the largest interaction effect (Fig. 6). This observation complements the comparisons of concordance or discordance depicted in Fig. 3 (and discussed above). As with the lack of correlation between Tm and the other trait subspaces, the result could be related to the thermodynamics that contribute to a given protein’s Tm, which may be more reliant on global features of an amino acid sequence rather than peculiar interactions between mutations that influence resistance to a small molecule. Future studies will examine this point at a more rigorous level.

A. Study limitations

This study has several limitations. The protein spaces explored are low-dimensional, each composed of only eight nodes. This represents a very small slice of the true protein space (astronomical in size), encompassing only a set of engineered mutations corresponding to those identified in experimental populations of bacteria exposed to trimethoprim [48]. Furthermore, the conversation about epistasis in evolutionary theory has grown in sophistication in recent years, with ideas such as “global epistasis” adding a new point of intrigue. Global epistasis refers to the notion that epistatic effects follow a system-wide pattern (e.g., diminishing returns) and arise from linear relationships between the phenotypic effects of a mutation and the fitness of the genetic background [49–52]. Global epistasis, as a phenomenon of nonlinear genotype-to-phenotype mapping, is likely to compound the effects we describe in our study. The genotype-to-phenotype map is specific to each trait, so global “diminishing returns” effects are likely to affect each subspace differently. This constitutes a current area of investigation.

B. Ideas and speculation

Our results have direct implications for efforts to engineer proteins using directed evolution or other approaches. For example, evolving a thermostable enzyme would amount to selection across one of the subspaces measured in our study (Tm). Our study suggests that such directed evolution efforts should consider not only how mutations associated with increased thermostability interact epistatically but also the pleiotropic consequences of this epistasis on other traits. One might even contrive a new term that describes how epistatic interactions between mutations manifest across different traits. “Pleiotropic epistasis” or even “pleio-stasis” are natural chimeric terms that capture the essence of pleiotropy and epistasis. Neologisms can, however, be confusing, and so we did not introduce them formally earlier in the paper.

Importantly, our study differs from an influential 2005 study that examined epistasis on component biophysical traits of an enzyme (isopropyl malate dehydrogenase; IMDH) [9]. In that study, epistasis was minimal across biophysical subspace traits, but was present in higher-level protein fitness phenotypes. In our study, not only is epistasis acting on biophysical subspace traits, but patterns also differ from subspace trait to subspace trait. That different enzymes have unique patterns of epistasis is not unexpected, but it is still notable, and it highlights the importance of not overgeneralizing results from the study of a single enzyme or just a few enzymes.

Future studies can utilize newer technologies to examine protein space at a larger scale. For example, the use of deep mutational scanning has revealed substitutions in SARS-CoV-2 proteins that may be relevant for the design of vaccines and therapeutics [53–55], and revealed how host cell chaperones shape the evolution of viral pathogens [56–61]. These tools may demonstrate how epistasis and pleiotropy play out across subspaces with thousands of nodes.

Furthermore, efforts to direct the evolution of protein phenotypes might be improved with the knowledge of how forces like epistasis, pleiotropy, and genotypic context function as “knobs” that tune higher-level protein phenotypes. Even further, this lens may aid in public health efforts to resolve the effects of mutations in proteins associated with pathogen evolution. For example, the effects of mutations on larger-scale pathogen phenotypes such as transmissibility (e.g., SARS-CoV-2) might be better diagnosed and understood mechanistically by examining their effects on spike protein subspaces.

C. Conclusion

Our findings suggest that even single-locus complex traits, like the IC50 of an enzyme target of drugs, contain biophysical subspace multitudes. This take has several implications for how we consider the process of protein evolution. Rather than describing evolution as moving “up” or “across” a rugged global fitness landscape, it can be more readily described as a combination of multiple searches through different subspaces. Such an interpretation is in line with modern efforts in complexity science that seek to understand the vagaries of byzantine biological systems by disentangling the parcels that compose them. And this perspective can be animated in efforts to use evolution as a tool to engineer biomolecules for practical use.

Data and code can be found at [62].

ACKNOWLEDGMENTS

The authors thank J. Yoon, S. Scarpino, B. Kerr, J. Rodrigues, J. Diaz-Colunga, K. Kabengele, and two anonymous peer reviewers for helpful interactions on the manuscript. The authors acknowledge support from the National Institutes of Health Grants No. R35GM136354 (M.D.S.), R35GM147107 (R.F.G.), and R01AI168166 (M.D.S. and C.B.O.), and the National Science Foundation’s Division of Environmental Biology Award No. 2142720 (C.B.O.). The authors would also like to thank the Martin Luther King Jr. Visiting Professors and Scholars Program at the Massachusetts Institute of Technology for support (C.B.O.). Finally, the authors would like to thank the organizers and participants in the workshop “Reimagining the Central Dogma” at The Foundations Institute, University of California, Santa Barbara, where ideas relevant to this manuscript were discussed.

C.B.O. and R.F.G. conceived the project; C.B.O., R.F.G., and M.M.D. collected and analyzed data; C.B.O., R.F.G., E.I.S., and M.D.S. interpreted and integrated data; C.B.O., E.I.S., and M.D.S. supervised the project; and C.B.O., R.F.G., M.M.D., E.I.S., and M.D.S. wrote the paper.

APPENDIX

Here we provide Fig. 7, with rank orders of the TEM-1/TEM-50 alleles corresponding to those in Fig. 2, and Table I, with sequence identity matrix for the three species analyzed in this study (E. coli, L. grayi, and C. muridarum).

FIG. 7.

Rank orders of the TEM-1/TEM-50 alleles corresponding to those in Fig. 2. We have utilized binary notation for this representation. As outlined in the Methods section, binary notation corresponds to different alleles across the different species. For E. coli and L. grayi: 000 (PAL), 100 (LAL), 001 (PAR), 010 (PTL), 011 (PTR), 101 (LAR), 110 (LTL), 111 (LTR). For C. muridarum: 000 (PEL), 100 (LEL), 001 (PER), 010 (PTL), 011 (PTR), 101 (LER), 110 (LTL), 111 (LTR).
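The binary notation in the caption above can be expressed as a small lookup, which may help when moving between bit strings and allele names. The following Python sketch is illustrative only: the names `WILD_TYPE`, `MUTANT`, and `allele_name` are hypothetical and are not taken from the study's code repository. The residue letters come directly from the caption (a 0 at a position keeps the wild-type residue for that species, a 1 substitutes the mutant residue).

```python
from itertools import product

# Wild-type residues at the three sites, per species (from the caption).
WILD_TYPE = {
    "E. coli": "PAL",
    "L. grayi": "PAL",
    "C. muridarum": "PEL",
}

# Mutant residues at the same three sites, shared by all three species.
MUTANT = "LTR"


def allele_name(bits, species):
    """Translate a 3-bit genotype string such as "101" into an allele name."""
    wild_type = WILD_TYPE[species]
    return "".join(
        MUTANT[i] if bit == "1" else wild_type[i]
        for i, bit in enumerate(bits)
    )


# Enumerate all eight nodes of the space for each species.
for species in WILD_TYPE:
    names = [allele_name("".join(b), species) for b in product("01", repeat=3)]
    print(species, names)
```

Each species yields eight distinct allele names, one per node, and the triple mutant (111) is LTR in all three species.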

References

  • [1]. Smith JM, Natural selection and the concept of a protein space, Nature (London) 225, 563 (1970).
  • [2]. DePristo MA, Weinreich DM, and Hartl DL, Missense meanderings in sequence space: A biophysical view of protein evolution, Nat. Rev. Genet 6, 678 (2005).
  • [3]. Weinreich DM, Delaney NF, DePristo MA, and Hartl DL, Darwinian evolution can follow only very few mutational paths to fitter proteins, Science 312, 111 (2006).
  • [4]. Arnold FH, The library of Maynard-Smith: My search for meaning in the protein universe, Microbe 6, 316 (2011).
  • [5]. de Visser JAG and Krug J, Empirical fitness landscapes and the predictability of evolution, Nat. Rev. Genet 15, 480 (2014).
  • [6]. Currin A, Swainston N, Day PJ, and Kell DB, Synthetic biology for the directed evolution of protein biocatalysts: Navigating sequence space intelligently, Chem. Soc. Rev 44, 1172 (2015).
  • [7]. Arnold FH, Innovation by evolution: Bringing new chemistry to life (Nobel Lecture), Angew. Chem. Int. Ed 58, 14420 (2019).
  • [8]. Rodrigues JV, Bershtein S, Li A, Lozovsky ER, Hartl DL, and Shakhnovich EI, Biophysical principles predict fitness landscapes of drug resistance, Proc. Natl. Acad. Sci. USA 113, E1470 (2016).
  • [9]. Lunzer M, Miller SP, Felsheim R, and Dean AM, The biochemical architecture of an ancient adaptive landscape, Science 310, 499 (2005).
  • [10]. Knies J, Cai F, and Weinreich DM, Enzyme efficiency but not thermostability drives cefotaxime resistance evolution in TEM-1 β-lactamase, Mol. Biol. Evol 34, 1040 (2017).
  • [11]. Weinreich DM, Lan Y, Wylie CS, and Heckendorn RB, Should evolutionary geneticists worry about higher-order epistasis?, Curr. Opin. Genet. Dev 23, 700 (2013).
  • [12]. Bridgham JT, Ortlund EA, and Thornton JW, An epistatic ratchet constrains the direction of glucocorticoid receptor evolution, Nature (London) 461, 515 (2009).
  • [13]. Chou H-H, Chiu H-C, Delaney NF, Segrè D, and Marx CJ, Diminishing returns epistasis among beneficial mutations decelerates adaptation, Science 332, 1190 (2011).
  • [14]. Draghi JA and Plotkin JB, Selection biases the prevalence and type of epistasis along adaptive trajectories, Evolution 67, 3120 (2013).
  • [15]. Lindsey HA, Gallie J, Taylor S, and Kerr B, Evolutionary rescue from extinction is contingent on a lower rate of environmental change, Nature (London) 494, 463 (2013).
  • [16]. Greene D and Crona K, The changing geometry of a fitness landscape along an adaptive walk, PLoS Comput. Biol 10, e1003520 (2014).
  • [17]. Anderson DW, McKeown AN, and Thornton JW, Intermolecular epistasis shaped the function and evolution of an ancient transcription factor and its DNA binding sites, eLife 4, e07864 (2015).
  • [18]. Crona K and Luo M, Higher order epistasis and fitness peaks, arXiv:1708.02063.
  • [19]. Kaznatcheev A, Computational complexity as an ultimate constraint on evolution, Genetics 212, 245 (2019).
  • [20]. Lozovsky ER, Daniels RF, Heffernan GD, Jacobus DP, and Hartl DL, Relevance of higher-order epistasis in drug resistance, Mol. Biol. Evol 38, 142 (2021).
  • [21]. Bank C, Epistasis and adaptation on fitness landscapes, Annu. Rev. Ecol. Evol. Syst 53, 457 (2022).
  • [22]. Stearns FW, One hundred years of pleiotropy: A retrospective, Genetics 186, 767 (2010).
  • [23]. Duffy S, Turner PE, and Burch CL, Pleiotropic costs of niche expansion in the RNA bacteriophage ϕ6, Genetics 172, 751 (2006).
  • [24]. Lenski RE, Evolution of plague virulence, Nature (London) 334, 473 (1988).
  • [25]. Ortlund EA, Bridgham JT, Redinbo MR, and Thornton JW, Crystal structure of an ancient protein: Evolution by conformational epistasis, Science 317, 1544 (2007).
  • [26]. Starr TN and Thornton JW, Epistasis in protein evolution, Protein Sci. 25, 1204 (2016).
  • [27]. Guerrero RF, Scarpino SV, Rodrigues JV, Hartl DL, and Ogbunugafor CB, Proteostasis environment shapes higher-order epistasis operating on antibiotic resistance, Genetics 212, 565 (2019).
  • [28]. Bershtein S, Mu W, Serohijos AW, Zhou J, and Shakhnovich EI, Protein quality control acts on folding intermediates to shape the effects of mutations on organismal fitness, Mol. Cell 49, 133 (2013).
  • [29]. Mira PM, Meza JC, Nandipati A, and Barlow M, Adaptive landscapes of resistance genes change as antibiotic concentrations change, Mol. Biol. Evol 32, 2707 (2015).
  • [30]. Kuhn M, Jackson S, and Cimentada J, corrr: Correlations in R, R package version 0.4.4 (2022).
  • [31]. Ogbunugafor CB and Eppstein MJ, Genetic background modifies the topography of a fitness landscape, influencing the dynamics of adaptive evolution, IEEE Access 7, 113675 (2019).
  • [32]. Otwinowski J, Biophysical inference of epistasis and the effects of mutations on protein stability and function, Mol. Biol. Evol 35, 2345 (2018).
  • [33]. Olson CA, Wu NC, and Sun R, A comprehensive biophysical description of pairwise epistasis throughout an entire protein domain, Curr. Biol 24, 2643 (2014).
  • [34]. Poelwijk FJ, Krishna V, and Ranganathan R, The context-dependence of mutations: A linkage of formalisms, PLoS Comput. Biol 12, e1004771 (2016).
  • [35]. Weinreich DM, Lan Y, Jaffe J, and Heckendorn RB, The influence of higher-order epistasis on biological fitness landscape topography, J. Stat. Phys 172, 208 (2018).
  • [36]. Ogbunugafor CB, The mutation effect reaction norm (mu-rn) highlights environmentally dependent mutation effects and epistatic interactions, Evolution 76, 37 (2022).
  • [37]. Domingo J, Baeza-Centurion P, and Lehner B, The causes and consequences of genetic interactions (epistasis), Annu. Rev. Genom. Human Gen 20, 433 (2019).
  • [38]. Barnes JE, Miller CR, and Ytreberg FM, Searching for a mechanistic description of pairwise epistasis in protein systems, Proteins Struct. Funct. Bioinf 90, 1474 (2022).
  • [39]. Crawford L, Zeng P, Mukherjee S, and Zhou X, Detecting epistasis with the marginal epistasis test in genetic mapping studies of quantitative traits, PLoS Gen. 13, e1006869 (2017).
  • [40]. Stamp J, DenAdel A, Weinreich D, and Crawford L, Leveraging the genetic correlation between traits improves the detection of epistasis in genome-wide association studies, bioRxiv:2022.11.30.518547 (2022).
  • [41]. Doro S and Herman MA, On the Fourier transform of a quantitative trait: Implications for compressive sensing, J. Theor. Biol 540, 110985 (2022).
  • [42]. Faure AJ, Lehner B, Miró Pina V, Serrano Colome C, and Weghorn D, An extension of the Walsh-Hadamard transform to calculate and model epistasis in genetic landscapes of arbitrary shape and complexity, bioRxiv:2023.03.06.531391 (2023).
  • [43]. Romero PA and Arnold FH, Exploring protein fitness landscapes by directed evolution, Nat. Rev. Mol. Cell Biol 10, 866 (2009).
  • [44]. Gavrilets S, Fitness Landscapes and the Origin of Species (MPB-41), Monographs in Population Biology Vol. 41 (Princeton University Press, Princeton, 2018).
  • [45]. Ogbunugafor CB, A reflection on 50 years of John Maynard Smith’s “protein space”, Genetics 214, 749 (2020).
  • [46]. Østman B, Hintze A, and Adami C, Impact of epistasis and pleiotropy on evolutionary adaptation, Proc. R. Soc. B 279, 247 (2012).
  • [47]. Kinsler G, Geiler-Samerotte K, and Petrov DA, Fitness variation across subtle environmental perturbations reveals local modularity and global pleiotropy of adaptation, eLife 9, e61271 (2020).
  • [48]. Toprak E, Veres A, Michel J-B, Chait R, Hartl DL, and Kishony R, Evolutionary paths to antibiotic resistance under dynamically sustained drug selection, Nat. Genet 44, 101 (2012).
  • [49]. Kryazhimskiy S, Rice DP, Jerison ER, and Desai MM, Global epistasis makes adaptation predictable despite sequence-level stochasticity, Science 344, 1519 (2014).
  • [50]. Otwinowski J, McCandlish DM, and Plotkin JB, Inferring the shape of global epistasis, Proc. Natl. Acad. Sci. USA 115, E7550 (2018).
  • [51]. Diaz-Colunga J, Skwara A, Gowda K, Diaz-Uriarte R, Tikhonov M, Bajic D, and Sanchez A, Global epistasis on fitness landscapes, Philo. Trans. Roy. Soc. B 378, 20220053 (2023).
  • [52]. Diaz-Colunga J, Skwara A, Vila JCC, Bajic D, and Sánchez Á, Global epistasis and the emergence of ecological function, bioRxiv:2022.06.21.496987 (2022).
  • [53]. Starr TN, Greaney AJ, Hilton SK, Ellis D, Crawford KH, Dingens AS, Navarro MJ, Bowen JE, Tortorici MA, Walls AC et al., Deep mutational scanning of SARS-CoV-2 receptor binding domain reveals constraints on folding and ACE2 binding, Cell 182, 1295 (2020).
  • [54]. Starr TN, Greaney AJ, Stewart CM, Walls AC, Hannon WW, Veesler D, and Bloom JD, Deep mutational scans for ACE2 binding, RBD expression, and antibody escape in the SARS-CoV-2 omicron BA.1 and BA.2 receptor-binding domains, PLoS Pathogens 18, e1010951 (2022).
  • [55]. Leonard AC, Weinstein JJ, Steiner PJ, Erbse AH, Fleishman SJ, and Whitehead TA, Stabilization of the SARS-CoV-2 receptor binding domain by protein core redesign and deep mutational scanning, Protein Eng. Des. Sel 35, gzac002 (2022).
  • [56]. Phillips AM, Ponomarenko AI, Chen K, Ashenberg O, Miao J, McHugh SM, Butty VL, Whittaker CA, Moore CL, Bloom JD et al., Destabilized adaptive influenza variants critical for innate immune system escape are potentiated by host chaperones, PLOS Biol. 16, e3000008 (2018).
  • [57]. Phillips AM, Doud MB, Gonzalez LO, Butty VL, Lin Y-S, Bloom JD, and Shoulders MD, Enhanced ER proteostasis and temperature differentially impact the mutational tolerance of influenza hemagglutinin, eLife 7, e38795 (2018).
  • [58]. Yoon J, Nekongo EE, Patrick JE, Hui T, Phillips AM, Ponomarenko AI, Hendel SJ, Sebastian RM, Zhang YM, Butty VL et al., The endoplasmic reticulum proteostasis network profoundly shapes the protein sequence space accessible to HIV envelope, PLoS Biol. 20, e3001569 (2022).
  • [59]. Geller R, Pechmann S, Acevedo A, Andino R, and Frydman J, Hsp90 shapes protein and RNA evolution to balance trade-offs between protein stability and aggregation, Nat. Commun 9, 1781 (2018).
  • [60]. Yoon J, Patrick JE, Ogbunugafor CB, and Shoulders MD, Viral evolution shaped by host proteostasis networks, Annu. Rev. Virol 10, 77 (2023).
  • [61]. Phillips AM, Gonzalez LO, Nekongo EE, Ponomarenko AI, McHugh SM, Butty VL, Levine SS, Lin Y-S, Mirny LA, and Shoulders MD, Host proteostasis modulates influenza evolution, eLife 6, e28652 (2017).
  • [62]. https://github.com/OgPlexus/subspace1