
This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2025 Jan 3:arXiv:2301.01847v3. Originally published 2023 Jan 4. [Version 3]

Probabilistic Genotype-Phenotype Maps Reveal Mutational Robustness of RNA Folding, Spin Glasses, and Quantum Circuits

Anna Sappington 1,2,3,*, Vaibhav Mohanty 4,2,3,*
PMCID: PMC9882568  PMID: 36713233

Abstract

Recent studies of genotype-phenotype (GP) maps have reported universally enhanced phenotypic robustness to genotype mutations, a feature essential to evolution. Virtually all of these studies make a simplifying assumption that each genotype—represented as a sequence—maps deterministically to a single phenotype, such as a discrete structure. Here, we introduce probabilistic genotype-phenotype (PrGP) maps, where each genotype maps to a vector of phenotype probabilities, as a more realistic and universal language for investigating robustness in a variety of physical, biological, and computational systems. We study three model systems to show that PrGP maps offer a generalized framework which can handle uncertainty emerging from various physical sources: (1) thermal fluctuation in RNA folding, (2) external field disorder in spin glass ground state finding, and (3) superposition and entanglement in quantum circuits, which are realized experimentally on IBM quantum computers. In all three cases, we observe a novel biphasic robustness scaling which is enhanced relative to random expectation for more frequent phenotypes and approaches random expectation for less frequent phenotypes. We derive an analytical theory for the behavior of PrGP robustness, and we demonstrate that the theory is highly predictive of empirical robustness.

I. INTRODUCTION

Systems which take a sequence as input and nontrivially produce a structure, function, or behavior as output are ubiquitous throughout the sciences and engineering. In biological systems such as RNA folding [1–11], lattice protein folding [4], protein self-assembly [12, 13], and gene regulatory networks [14, 15], the relationship between genotype (stored biological information) and phenotype (observable or functional properties) can be structured as genotype-phenotype (GP) maps, which have a rich history of computational and analytical investigation [1–34]. Systems from physics and computer science have also been analyzed as GP maps, including the spin glass ground state problem [30], linear genetic programming [26], and digital circuits [31].

Despite being completely disparate systems, all of the GP maps above share a number of common structural features, most notably an enhanced robustness of the phenotypes to genotype mutations. The phenotypic robustness $\rho_n$ of a phenotype $n$ is the probability that a single-character mutation of a genotype $g$ which maps to $n$ does not change the resultant phenotype $n$, averaged over all genotypes $g$ mapping to $n$. A completely random assignment of genotypes to phenotypes predicts that $\rho_n \approx f_n$ [4], where $f_n$ is the fraction of genotypes that map to phenotype $n$. However, the systems mentioned above display a surprising and substantially enhanced robustness, exhibiting the relationship $\rho_n \approx a + b \log f_n \gg f_n$ with system-dependent constants $a$ and $b$. This means that even for rare phenotypes, small changes to the genotype do not necessarily change the phenotype. It has been shown that, in evolution, this enhanced robustness facilitates discovery of new phenotypes [11, 19, 20, 35] and is crucial for navigating fitness landscapes [5]. As a result, it is important to accurately quantify robustness and its relationship with phenotype frequency.

All GP map studies referenced above, spanning several decades of research, assume that a genotype maps deterministically to a single phenotype. However, we argue that for most of the above systems, this is a major simplification. For instance, by mapping an RNA genotype to only the ground state energy structure, previous studies [1–11] make an implicit zero temperature approximation for the ensemble of molecules, even if the Gibbs free energy of an individual molecule is itself calculated within the folding software at finite temperature. Similarly, in studies of gene regulatory networks, spin glasses, linear genetic programs, and digital circuits, the systems investigated do not interact with external networks or variables; these investigations assume that the environmental effect on the GP mapping of the subsystem of interest is static. Probabilistic mappings from genotype to phenotype certainly exist in many areas of science, such as probabilistic classification of images by neural networks (i.e., sequences of pixel intensities mapping to class probabilities). However, the literature still lacks a unifying framework to analyze the single-character mutational robustness of these maps, among other properties, comparable to the universal language that already exists for the deterministic GP (DGP) maps mentioned above.

In this article, we introduce probabilistic genotype-phenotype (PrGP) maps as a universal framework for analyzing the mutational robustness of sequence-to-discrete classifications; in contrast, we refer to the maps of the systems above as DGP maps. DGP maps thereby emerge as the limiting case of PrGP maps in which each genotype or sequence maps to a single phenotype with probability 1 and to all other phenotypes with probability 0. The definitions of phenotypic robustness and transition probabilities retain the same physical meaning in PrGP maps as in DGP maps, and we emphasize that PrGP maps can handle disorder and uncertainty emerging from a variety of sources.

To address the implicit zero temperature approximation in sequence-to-structure mappings (RNA, lattice protein folding, protein self-assembly), we study the folding of RNA primary sequences to a canonical ensemble of secondary structures corresponding to low-lying local free energy minima. To address external variable disorder with a known distribution, we study the zero temperature mapping of a spin glass bond configuration to its ground state with quenched external field disorder, building a phenotype probability vector using many replicas of the disordered field. This has implications for viral fitness landscape inference [36–40], where external fields, in part, model host immune pressure [39]. Lastly, to investigate inherent uncertainty in phenotypes, we introduce quantum circuit GP maps, where uncertainty emerges from superposition and entanglement of classically measurable basis states. Our experimental realization of these quantum circuits on a 7-qubit IBM quantum computer also introduces measurement noise, which has a clear and unique effect on robustness. The PrGP map properties of the three model systems are summarized in Table I and visually in Figure 1. We observe empirically that PrGP maps exhibit a novel biphasic scaling of robustness versus phenotype frequency which, for higher-frequency phenotypes, resembles the $\rho_n \sim \log f_n$ scaling seen in DGP maps but is suppressed, and, for lower-frequency phenotypes, settles closer to a linear relationship between $\rho_n$ and $f_n$. We then develop a set of approximations which yield an analytically solvable model of robustness that predicts empirical robustness well outside the approximation regime.

TABLE I.

Overview of the genotypes and phenotypes of each PrGP system, as well as their respective sources of uncertainty.

| System | Genotype alphabet | Alphabet size $k$ | Phenotype | Source of uncertainty |
|---|---|---|---|---|
| RNA folding | {A, U, G, C} (or {G, C}) | 4 (or 2) | Folded dot-bracket structure | Thermal fluctuation, $T > 0$ |
| Spin glass ground state | {−1, +1} | 2 | Ground state spin configuration | Disordered external field |
| Quantum circuit | {Z, X, Y, H, S, S†, T, T†} | 8 | Classical measurement of circuit output | Superposition and entanglement |

FIG. 1.


Schematic representations of the PrGP model systems studied in this work. For each system, its genotype (green), a visualization of the system, its phenotypes (pink), and its method for calculating the phenotype probability vector are shown. For RNA folding, the genotype is a primary sequence of nucleotides and the phenotype is the folded dot-bracket structure. For spin glass ground states, the genotype is a bond configuration. The spins $s_i$ (pink dots) are connected via this bond configuration $J_{ij}$ (green lines), and a disordered external field $h_i$ is applied. The phenotype is the fraction of replicas in which each ground state appears. For quantum circuits, the genotype consists of a subset of gates from a random circuit. The circuit is given a fixed input state, $|00\cdots0\rangle$, and the exact phenotype probability vector is the probability of classically measuring each basis state, $p_n(g) = |\langle n | U(g) | 00\cdots0 \rangle|^2$, where $U(g)$ is the circuit operation as a function of the genotype $g$. The experimental phenotype probability vector is computed by tallying classically measured states from 1000 experimental shots on a quantum computer.

II. THEORY

In this study, we focus on mappings of discrete genotypes, which can be written as sequences from a fixed alphabet, onto a discrete set of phenotypes (i.e. discrete-to-discrete GP maps).

Let $\Omega(g) = n$ represent the mapping of genotype $g$ to phenotype $n$, where $g$ is an element of $S_{\ell,k}$, the set of all $k^{\ell}$ sequences of length $\ell$ drawn from an alphabet of $k$ characters. A generalization of robustness is the transition probability $\phi_{mn}$, the average probability that a single-character mutation of a genotype mapping to phenotype $n$ will change the phenotype to $m$, with the average taken over all genotypes mapping to $n$. For DGP maps, $\phi_{mn}$ is given by

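$$\phi_{mn} = \frac{\sum_{\{g \mid \Omega(g) = n\}} \left|\{h \in \mathrm{nn}(g) \mid \Omega(h) = m\}\right|}{(k-1)\,\ell\,\left|\{g \mid \Omega(g) = n\}\right|}, \qquad (1)$$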

where $\mathrm{nn}(g)$ is the single-character mutational neighborhood of sequence $g$. In this formula, the numerator counts, summed over all genotypes $g$ that map to phenotype $n$, how many single-character mutational neighbors of $g$ map to phenotype $m$; the denominator is the total number of such mutations, since each genotype has $(k-1)\ell$ neighbors. This means that the robustness can be written as the special case $m = n$:

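$$\rho_n = \phi_{nn} = \frac{\sum_{\{g \mid \Omega(g) = n\}} \left|\{h \in \mathrm{nn}(g) \mid \Omega(h) = n\}\right|}{(k-1)\,\ell\,k^{\ell} f_n}. \qquad (2)$$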
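As a minimal sketch (not the authors' code), eq. (1) can be evaluated by brute force for a small deterministic map stored as a dictionary from genotype tuples to phenotype labels. The toy map below, in which the phenotype is simply the first character of a binary sequence, is a hypothetical example chosen because its robustness is easy to check by hand: every genotype has $\ell(k-1) = 4$ neighbors, three of which keep the first character, so $\rho_0 = 3/4$.

```python
from collections import Counter, defaultdict
from itertools import product


def dgp_transition_probs(gp_map, k, ell):
    """Eq. (1): phi[(m, n)] for a deterministic map {genotype tuple: phenotype}."""
    sizes = Counter(gp_map.values())  # |{g : Omega(g) = n}|
    counts = defaultdict(int)         # ordered pairs (g -> n, h in nn(g) -> m)
    for g, n in gp_map.items():
        for i in range(ell):
            for c in range(k):
                if c == g[i]:
                    continue
                h = g[:i] + (c,) + g[i + 1:]  # single-character mutant of g
                counts[(gp_map[h], n)] += 1
    return {(m, n): v / ((k - 1) * ell * sizes[n]) for (m, n), v in counts.items()}


# Hypothetical toy map: binary sequences of length 4, phenotype = first character.
k, ell = 2, 4
toy = {g: g[0] for g in product(range(k), repeat=ell)}
phi = dgp_transition_probs(toy, k, ell)
print(phi[(0, 0)])  # robustness rho_0 = phi_00 -> 0.75
```

The same value equals the DGP maximum $1 + \log f_n/(\ell \log k)$ at $f_n = 1/2$, since this toy phenotype has exactly one constrained site.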

For PrGP maps, we show in Appendix A that the transition probability formula becomes a modified version of eq. (1) in which we take a weighted sum in the numerator. In particular, we have

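$$\phi_{mn} = \frac{\sum_{\{g,h\} \in \Delta_{\ell,k}} \left[\mathbf{p}(g)\,\mathbf{p}(h)^{\mathsf{T}} + \left(\mathbf{p}(g)\,\mathbf{p}(h)^{\mathsf{T}}\right)^{\mathsf{T}}\right]_{mn}}{(k-1)\,\ell\,k^{\ell} f_n}, \qquad (3)$$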

where $\mathbf{p}(g) = (p_0(g), p_1(g), \ldots)$ with $p_n(g) = \mathbb{P}[\Omega(g) = n]$, the probability that genotype $g$ maps to phenotype $n$. Again, the robustness is $\rho_n = \phi_{nn}$, the diagonal of the transition probability matrix. In the above equation, $\Delta_{\ell,k}$ is the set of all $\ell k^{\ell}(k-1)/2$ unordered pairs of sequences in $S_{\ell,k}$ which differ by exactly one character. The phenotype probability vector obeys the normalization conditions $k^{\ell}\mathbf{f} = \sum_{g \in S_{\ell,k}} \mathbf{p}(g)$ and $1 = \sum_{n \in \{\text{phenotypes}\}} p_n(g)$ for all $g \in S_{\ell,k}$. We are also interested in the phenotype entropy $S(g) = -\sum_{n \in \{\text{phenotypes}\}} p_n(g) \log p_n(g)$, which quantifies the spread of a genotype's mappings onto multiple phenotypes, and the genotype entropy

$$S_n^{\gamma} = -\sum_{g \in \{\text{genotypes}\}} \frac{p_n(g)}{f_n k^{\ell}} \log \frac{p_n(g)}{f_n k^{\ell}}, \qquad (4)$$

which quantifies the spread of a phenotype across all genotypes. In particular, we will show that the genotype entropy can be useful for estimating robustness.
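The sketch below (an illustration, not the authors' implementation) evaluates eq. (3) and the two entropies for a small PrGP map stored as a dictionary from genotype tuples to NumPy probability vectors. Iterating only over mutations to a larger character index visits each unordered pair in $\Delta_{\ell,k}$ exactly once. The usage lines take one-hot vectors for a hypothetical toy map (phenotype = first character of a binary sequence) to show that eq. (3) reduces to the DGP value in that limit, and that the genotype entropy of a DGP phenotype equals $\ell \log k + \log f_n$.

```python
from itertools import product

import numpy as np


def prgp_transition_probs(p, k, ell):
    """Eq. (3): phi[m, n] for a PrGP map {genotype tuple: phenotype probability vector}.

    Assumes every phenotype has nonzero frequency f_n.
    """
    genotypes = list(product(range(k), repeat=ell))
    f = sum(p[g] for g in genotypes) / k**ell   # normalization: k^ell f = sum_g p(g)
    num = np.zeros((len(f), len(f)))
    for g in genotypes:
        for i in range(ell):
            for c in range(g[i] + 1, k):       # visit each unordered pair {g, h} once
                h = g[:i] + (c,) + g[i + 1:]
                outer = np.outer(p[g], p[h])   # [p(g) p(h)^T]_{mn} = p_m(g) p_n(h)
                num += outer + outer.T
    return num / ((k - 1) * ell * k**ell * f[np.newaxis, :]), f


def phenotype_entropy(p_g):
    """S(g) = -sum_n p_n(g) log p_n(g), with 0 log 0 = 0."""
    q = p_g[p_g > 0]
    return float(-np.sum(q * np.log(q)))


def genotype_entropy(pn_values):
    """Eq. (4): pn_values holds p_n(g) for every genotype g; their sum is f_n k^ell."""
    w = np.asarray(pn_values, dtype=float)
    w = w[w > 0] / w.sum()
    return float(-np.sum(w * np.log(w)))


# DGP limit: one-hot vectors for the hypothetical "first character" toy map.
k, ell = 2, 4
onehot = {g: np.eye(2)[g[0]] for g in product(range(k), repeat=ell)}
phi, f = prgp_transition_probs(onehot, k, ell)
print(phi[0, 0])  # 0.75, the same value eq. (1) gives
print(genotype_entropy([v[0] for v in onehot.values()]))  # ell*log(k) + log(f_0)
```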

In DGP maps, a random null model [4] for robustness can be built by randomly assigning genotype-phenotype pairings while keeping the frequencies $\mathbf{f}$ constant. As a result, the probability of a single-character mutation leading to a change from phenotype $n$ to phenotype $m$ is approximately $\phi_{mn} \approx f_m$ for all $m$. For PrGP maps, a naive expectation can be built by letting all phenotype probability vectors equal the frequency vector, $\mathbf{p}(g) = \mathbf{f}$ for all genotypes $g$. From eq. (3), one finds that $\phi_{mn} = f_m$; thus, the two random expectations are the same, even though they physically represent different scenarios.
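Substituting $\mathbf{p}(g) = \mathbf{f}$ into eq. (3) makes the PrGP claim explicit: each of the $\ell k^{\ell}(k-1)/2$ pairs in $\Delta_{\ell,k}$ contributes the same amount, $2 f_m f_n$, to the numerator, so

$$\phi_{mn} = \frac{\sum_{\{g,h\} \in \Delta_{\ell,k}} \left(f_m f_n + f_n f_m\right)}{(k-1)\,\ell\,k^{\ell} f_n} = \frac{\tfrac{1}{2}\,\ell k^{\ell}(k-1) \cdot 2 f_m f_n}{(k-1)\,\ell\,k^{\ell} f_n} = f_m .$$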

A fundamental difference between PrGP maps and DGP maps is that DGP maps can have no frequencies lower than $k^{-\ell}$, whereas PrGP phenotypes could in principle have arbitrarily small frequencies. This suggests that the PrGP robustness curve has a tail, representing very rare phenotypes, that is not necessarily predictable from existing DGP robustness theory [4, 12, 34]. In this work, we show that under two approximations, the robustness becomes analytically solvable in terms of the phenotype frequency $f_n$ and genotype entropy $S_n^{\gamma}$. Although these approximations make specific assumptions about the shape and distribution of the phenotype's probability over genotypes which do not necessarily coincide with empirical distributions, we demonstrate in the Results section that the resulting robustness formula below is highly predictive well outside of these approximations in all 3 systems studied empirically.

The two key approximations are as follows: (1) a phenotype $n$ with frequency $f_n$ has its probability mass evenly spread across a fixed number of genotypes, and (2) those genotypes form a robustness-maximizing set in the DGP sense (i.e., maximizing eq. (1)). Two central results of this paper which follow from these assumptions (see Appendix B for the derivation) are the approximate PrGP robustness as a function of the phenotype frequency $f_n$ and the genotype entropy $S_n^{\gamma}$:

$$\rho_n(f_n, S_n^{\gamma}) \approx \frac{k^{\ell} f_n S_n^{\gamma}}{\ell\, e^{S_n^{\gamma}} \log k}, \qquad (5)$$

and approximate upper bounds on the PrGP robustness given by the piecewise continuous function

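$$\rho_n^{\text{PrGP upper}}(f_n) \approx \begin{cases} \dfrac{f_n k^{\ell-1}}{\ell}, & f_n \le k^{1-\ell}, \\[6pt] 1 + \dfrac{\log f_n}{\ell \log k}, & f_n > k^{1-\ell}. \end{cases} \qquad (6)$$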
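A condensed sketch of how eq. (5) can follow from the two approximations (our reading; the full derivation is in Appendix B). Under approximation (1), phenotype $n$ occupies $N_n$ genotypes, each with $p_n(g) = f_n k^{\ell}/N_n$, so eq. (4) gives $S_n^{\gamma} = \log N_n$. Under approximation (2), a robustness-maximizing DGP set of $N_n$ genotypes has $\rho^{\text{DGP}} \approx \log_k N_n/\ell$ (the DGP maximum $1 + \log f/(\ell \log k)$ evaluated at $f = N_n k^{-\ell}$), so it contains about $N_n(k-1)\log_k N_n$ ordered neighbor pairs. Since the diagonal numerator of eq. (3) is the sum of $p_n(g)\,p_n(h)$ over ordered neighbor pairs,

$$\rho_n \approx \frac{N_n (k-1)\log_k N_n \left(f_n k^{\ell}/N_n\right)^2}{(k-1)\,\ell\,k^{\ell} f_n} = \frac{k^{\ell} f_n \log N_n}{\ell\, N_n \log k} = \frac{k^{\ell} f_n S_n^{\gamma}}{\ell\, e^{S_n^{\gamma}} \log k}.$$

In the DGP limit, $N_n = f_n k^{\ell}$ and this reduces to $1 + \log f_n/(\ell \log k)$.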

The upper bound illustrates two distinct scaling laws: a DGP-like $\rho_n \approx a + b \log f_n$ scaling for sufficiently large frequencies, and a null-model-like linear scaling $\rho_n \propto f_n$ for small frequencies. Since empirical DGP robustness often scales like a "suppressed" version of the DGP maximum $\rho_n^{\text{DGP max}} \approx 1 + \frac{\log f_n}{\ell \log k}$, the biphasic scaling of the PrGP upper bound suggests that empirical PrGP robustness may also appear biphasic and suppressed relative to the upper bound.
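For quick numerical checks, eqs. (5) and (6) translate directly into two small functions (a sketch; the function names are our own). The usage lines check two properties that follow from the formulas: the two branches of eq. (6) meet continuously at $f_n = k^{1-\ell}$, where both equal $1/\ell$, and eq. (5) reduces to the DGP maximum $1 + \log f_n/(\ell \log k)$ when the genotype entropy takes its DGP value $S_n^{\gamma} = \ell \log k + \log f_n$.

```python
import math


def rho_theory(f_n, s_gamma, k, ell):
    """Eq. (5): approximate PrGP robustness from frequency and genotype entropy."""
    return k**ell * f_n * s_gamma / (ell * math.exp(s_gamma) * math.log(k))


def rho_upper(f_n, k, ell):
    """Eq. (6): approximate piecewise upper bound on PrGP robustness."""
    if f_n <= k ** (1 - ell):
        return f_n * k ** (ell - 1) / ell               # linear, null-model-like tail
    return 1 + math.log(f_n) / (ell * math.log(k))      # DGP-like logarithmic branch


k, ell = 2, 20
f_star = k ** (1 - ell)                                  # boundary between the branches
print(rho_upper(f_star, k, ell), 1 + math.log(f_star) / (ell * math.log(k)))  # both ~ 1/ell

f_n = 0.01
s_dgp = ell * math.log(k) + math.log(f_n)               # DGP genotype entropy
print(rho_theory(f_n, s_dgp, k, ell), 1 + math.log(f_n) / (ell * math.log(k)))
```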

The upper bounds here are approximate because we rely on the genotypes forming a robustness-maximizing set (in the DGP sense, meaning the genotypes tend to be clustered in sequence space) before we optimize or approximate the spread of the phenotype's probability mass over those genotypes. Although this may well be an exact and/or tight bound, we do not prove its tightness here. However, we discuss specific cases in which each of these upper bounds can be achieved by real phenotypes. For rare phenotypes in the "tail" of the robustness upper bound ($f_n \le k^{1-\ell}$), we find that the phenotypes which maximize robustness are spread evenly over exactly $k$ genotypes whose sequences all differ at exactly one character, with each of those genotypes having probability $p_n(g) = f_n k^{\ell-1}$ of mapping to the phenotype. For more common phenotypes with $f_n > k^{1-\ell}$, an instructive example appears when the phenotype frequency is $f_n = k^{-r}$ for some integer $1 \le r < \ell$. We consider $r$ of the sites in the sequence to be "constrained" (using terminology from ref. [12]), meaning that mutating any of those sites will lead to a change in phenotype. The remaining $\ell - r$ sites are "unconstrained," meaning that mutations at those sites will not lead to any change in phenotype. If the phenotype probability is $p_n(g) = 1$ at all $k^{\ell-r}$ of those genotypes, the robustness is exactly equal to the DGP robustness, namely the probability of mutating an unconstrained site, $\rho_n = (\ell - r)/\ell$, which attains the upper bound in eq. (6). For frequencies $f_n = k^{-r}$ with non-integer $r$, finding a configuration in which robustness is maximized is nontrivial; for DGP maps, this problem was solved in ref. [34], where the maximal robustness was found to follow a fractal curve that asymptotically behaves like eq. (6) with small corrections. The exact upper bound for PrGP maps remains an open problem.
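The two configurations described above can be checked by brute force. The sketch below (ours, with a hypothetical choice of $k = 3$ and $\ell = 4$) computes $\rho_n = \phi_{nn}$ from eq. (3) using only $p_n(g)$, because the diagonal numerator reduces to a sum of $p_n(g)\,p_n(h)$ over ordered neighbor pairs. For a "star" of $k$ genotypes that differ only at the first site, each with $p_n(g) = 1/2$ (so $f_n \le k^{1-\ell}$), the result matches $f_n k^{\ell-1}/\ell$; for a phenotype with one constrained site and $p_n(g) = 1$ on all $k^{\ell-1}$ genotypes sharing it, the result matches $(\ell - r)/\ell$ with $r = 1$.

```python
from itertools import product


def robustness_single(pn, k, ell):
    """rho_n = phi_nn from eq. (3), given p_n(g) for every genotype g (dict)."""
    total = sum(pn.values())                 # equals f_n * k**ell
    acc = 0.0
    for g, pg in pn.items():
        if pg == 0:
            continue
        for i in range(ell):
            for c in range(k):
                if c != g[i]:
                    h = g[:i] + (c,) + g[i + 1:]
                    acc += pg * pn[h]        # ordered pairs: each unordered pair twice
    return acc / ((k - 1) * ell * total)


k, ell = 3, 4
genotypes = list(product(range(k), repeat=ell))

# Rare phenotype: k genotypes differing only at site 0, each with p_n(g) = 0.5.
star = {g: (0.5 if g[1:] == (0,) * (ell - 1) else 0.0) for g in genotypes}
f_rare = sum(star.values()) / k**ell
print(robustness_single(star, k, ell), f_rare * k ** (ell - 1) / ell)  # both approx. 0.125

# Common phenotype: site 0 constrained (r = 1), p_n(g) = 1 on all k**(ell - 1) genotypes.
cube = {g: (1.0 if g[0] == 0 else 0.0) for g in genotypes}
print(robustness_single(cube, k, ell), (ell - 1) / ell)                # both approx. 0.75
```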

In the Results section, we show that eq. (5), which is highly successful at recapitulating empirical robustness in 3 systems (RNA, spin glasses, quantum circuits), is amenable to further analytical approximation given system-specific information about the genotype entropy $S_n^{\gamma}$, yielding such biphasic scaling in different frequency regimes.

III. NUMERICAL METHODS

A. RNA

In RNA folding DGP map studies [1–11], the global free energy minimum secondary structure (reported as a "dot-bracket" string indicating polymer connectivity) was calculated for every RNA sequence of fixed length drawn from the alphabet of the four canonical nucleotides {A, C, G, U} (alphabet size $k = 4$). Here, we are interested not only in the global free energy minimum structures but also in the low-lying local minima, and we additionally investigate the temperature-dependent behavior of the robustness. We use the RNAsubopt program from the ViennaRNA package (version 2.4.17) [41] to calculate the secondary structures and associated Gibbs free energies of the local free energy minima within 6 kcal/mol of the global free energy minimum (or all the nonpositive free energy local minima, if the global minimum is greater than −6 kcal/mol). Because of the increased computational time required to discover all the local minima within an energy range, we use a reduced alphabet of {C, G} for our main simulations with sequence length $\ell = 20$. A validation study with $\ell = 12$ and the full $k = 4$ alphabet is reported in the Supplementary Material (SM) [42]. Simulations for the $\ell = 20$, $k = 2$ trials were conducted at 20 °C, 37 °C (human body temperature), and 70 °C. We take the low-lying local free energy minimum structures to comprise a canonical ensemble at the simulation temperature, so the probability of RNA sequence $g$ mapping to secondary structure $n$ is $p_n(g) = e^{-\Delta G_n/(RT)}/Z$, where $Z = \sum_m e^{-\Delta G_m/(RT)}$ normalizes the vector.
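Once RNAsubopt has produced the low-lying structures and their free energies, the canonical-ensemble weighting takes only a few lines. The sketch below assumes the energies are already available as a Python dictionary mapping dot-bracket strings to $\Delta G$ in kcal/mol (it does not call the ViennaRNA API), and the structures and energies in the usage example are invented for illustration. Subtracting the minimum energy before exponentiating avoids underflow and leaves the normalized probabilities unchanged.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def boltzmann_probs(delta_g, temp_celsius):
    """p_n(g) = exp(-dG_n / RT) / Z over a set of structures (dG in kcal/mol)."""
    rt = R_KCAL * (temp_celsius + 273.15)
    g_min = min(delta_g.values())  # shifting all energies by g_min cancels in Z
    w = {s: math.exp(-(dg - g_min) / rt) for s, dg in delta_g.items()}
    z = sum(w.values())
    return {s: v / z for s, v in w.items()}


# Hypothetical local minima for one 12-nt sequence (structures and energies invented).
minima = {"((((....))))": -3.2, "(((.....))).": -2.5, "............": 0.0}
for t in (20, 37, 70):
    print(t, boltzmann_probs(minima, t))
```

Raising the temperature flattens the distribution, which is the mechanism behind the temperature-dependent suppression of robustness reported below.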

B. Spin Glasses

In a previous spin glass [43, 44] DGP map study [30], a zero-temperature $\pm J$ spin glass on a random graph $\mathcal{G}(V, E)$ with Hamiltonian $H(\mathbf{s}; \mathbf{J}) = -\sum_{\{i,j\} \in E} J_{ij} s_i s_j - \sum_{i \in V} h_i s_i$ was considered. The genotype is the bond configuration, where each $J_{ij} \in \{-1, +1\}$, and the phenotype is the ground state configuration, where each $s_i \in \{-1, +1\}$. Degeneracies of the ground state were broken by uniformly drawn, i.i.d. random external fields $h_i \in [-10^{-4}, 10^{-4}]$ which were fixed for each simulation. In our spin glass PrGP map, we use a similar setup, but we are interested in the effect of external field disorder on robustness. We therefore incorporate Gaussian-distributed external fields $h_i \sim \mathcal{N}(h_{0,i}, \sigma_h^2)$, where the uniformly distributed means $h_{0,i} \in [-0.1, 0.1]$ are fixed across all realizations of the disorder for each simulation. To obtain accurate robustness measurements, we calculate every ground state exactly for spin glasses with $|V| = 9$ and $|E| = 15$ by exhaustive enumeration. We examine the effect of external field disorder by simulating 450 replicas of $\{h_i\}$ with variances $\sigma_h^2 = 0.001$, 0.01, and 0.1 and fixed means $\{h_{0,i}\}$. Phenotype probability vectors for each genotype $g \equiv \mathbf{J}$ were constructed by tallying and normalizing the number of appearances of each ground state across the replicas. The graph topology $\mathcal{G}(V, E)$ corresponding to the data presented here, as well as validation trial data, are in the SM [42].
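A minimal sketch of the replica procedure, assuming bonds are given as a dictionary over graph edges: each replica draws Gaussian fields around fixed means, finds the ground state by exhaustive enumeration ($2^{|V|}$ states, i.e. 512 for $|V| = 9$), and the tallies are normalized into $\mathbf{p}(g)$. The 4-spin ring, its bond signs, and the field means below are hypothetical and much smaller than the graphs used in the study. As $\sigma_h^2 \to 0$ the map becomes deterministic; for these particular means the unique ground state has all spins up.

```python
from itertools import product

import numpy as np


def ground_state(J, h):
    """Exhaustive argmin over s in {-1,+1}^|V| of H(s; J) = -sum J_ij s_i s_j - sum h_i s_i."""
    best, best_e = None, np.inf
    for s in product((-1, 1), repeat=len(h)):
        e = -sum(Jij * s[i] * s[j] for (i, j), Jij in J.items()) - float(np.dot(h, s))
        if e < best_e:
            best, best_e = s, e
    return best


def phenotype_vector(J, h0, sigma2, n_replicas, rng):
    """Fraction of field replicas h_i ~ N(h0_i, sigma2) in which each ground state appears."""
    counts = {}
    for _ in range(n_replicas):
        h = rng.normal(h0, np.sqrt(sigma2))
        s = ground_state(J, h)
        counts[s] = counts.get(s, 0) + 1
    return {s: c / n_replicas for s, c in counts.items()}


# Hypothetical frustrated 4-spin ring (genotype = bond signs) and fixed field means.
J = {(0, 1): 1, (1, 2): 1, (2, 3): -1, (3, 0): 1}
h0 = np.array([0.05, -0.03, 0.08, 0.01])
rng = np.random.default_rng(0)
print(phenotype_vector(J, h0, 0.01, 450, rng))
```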

C. Quantum Circuits

Although methods to evolve quantum circuits have been suggested [45], to our knowledge this work is the first to analyze the structural properties of quantum circuit GP maps. We perform 7 trials in which we generate random quantum circuits (see SM for the algorithm) with 7 qubits and 4 layers of gates; we also conduct an additional trial with 11 qubits and 4 layers of gates. Circuits are randomly seeded with CNOT gates, which cannot participate in the genotype, and the remaining spaces are filled with single-qubit gates drawn from the alphabet {Z, X, Y, H, S, S†, T, T†}. We choose $\ell = 4$ ($\ell = 5$ for the 11-qubit trial) of these gates to be variable gates which comprise the genotype. The input to the circuit is the prepared state $|00\cdots0\rangle = |0\rangle \otimes \cdots \otimes |0\rangle$, and the exact probability of classically measuring the basis state $|n\rangle = \bigotimes_i |q_i\rangle$, with $|q_i\rangle \in \{|0\rangle, |1\rangle\}$, is $p_n(g) = |\langle n | U(g) | 00\cdots0 \rangle|^2$, where $|q_i\rangle$ is the state of the $i$-th qubit and $U(g)$ is the total circuit operation. We realize these quantum circuits on the ibm_lagos v1.2.0 quantum computer [42], one of the 7-qubit IBM Quantum Falcon r5.11H processors. Experimental phenotype probability vectors are constructed by tallying classical measurements from 1000 shots for each genotype. The 11-qubit trial is conducted on a Qiskit Aer simulator instead of an experimental quantum computer, using the ibm_brisbane noise profile to simulate noise. The circuits from our experimental trials are depicted in the SM [42].
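A small state-vector sketch of how $p_n(g)$ can be computed for a circuit whose variable slots are filled by the genotype. It uses plain NumPy rather than Qiskit. The circuit encoding (a list of tuples with "VAR" placeholders), the gate labels "Sdg"/"Tdg" for S† and T†, and the 2-qubit example are illustrative assumptions of ours. Qubit 0 is the most significant bit here, which is the reverse of Qiskit's little-endian convention. The multinomial draw mimics a finite number of shots only; it does not model hardware noise.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
GATES = {  # single-qubit alphabet; "Sdg" and "Tdg" denote S-dagger and T-dagger
    "Z": np.diag([1, -1]).astype(complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]]),
    "H": np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2),
    "S": np.diag([1, 1j]),
    "Sdg": np.diag([1, -1j]),
    "T": np.diag([1, np.exp(1j * np.pi / 4)]),
    "Tdg": np.diag([1, np.exp(-1j * np.pi / 4)]),
}


def single(gate, q, n):
    """Embed a one-qubit gate on qubit q; qubit 0 is the leftmost tensor factor."""
    out = np.array([[1.0 + 0j]])
    for i in range(n):
        out = np.kron(out, GATES[gate] if i == q else I2)
    return out


def cnot(control, target, n):
    """CNOT as a permutation matrix on the 2**n computational basis states."""
    U = np.zeros((2**n, 2**n), dtype=complex)
    for b in range(2**n):
        flip = (b >> (n - 1 - control)) & 1
        U[b ^ (flip << (n - 1 - target)), b] = 1
    return U


def phenotype_probs(circuit, genotype, n):
    """p_n(g) = |<n|U(g)|0...0>|^2; "VAR" slots are filled in order by the genotype."""
    psi = np.zeros(2**n, dtype=complex)
    psi[0] = 1.0
    variable = iter(genotype)
    for op in circuit:
        if op[0] == "CNOT":
            psi = cnot(op[1], op[2], n) @ psi
        else:
            gate = next(variable) if op[0] == "VAR" else op[0]
            psi = single(gate, op[1], n) @ psi
    return np.abs(psi) ** 2


# Hypothetical 2-qubit circuit with two variable gates (genotype length ell = 2).
circuit = [("VAR", 0), ("CNOT", 0, 1), ("VAR", 1)]
p = phenotype_probs(circuit, ("H", "Z"), 2)        # basis order |00>, |01>, |10>, |11>
rng = np.random.default_rng(0)
shots = rng.multinomial(1000, p / p.sum()) / 1000  # idealized, noise-free sampling
print(p, shots)
```

With genotype ("H", "Z") the circuit prepares an entangled state, so the phenotype probability vector splits evenly between $|00\rangle$ and $|11\rangle$ even without any noise.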

IV. RESULTS

After running simulations to obtain the raw PrGP map data from the RNA, spin glass, and quantum circuit numerical experiments, we computed robustness, genotype entropy distributions, and phenotype entropy distributions, which we plot in Figure 2, Appendix C, and Appendix E, respectively. Transition probabilities between different phenotypes and the RNA $k = 4$, $\ell = 12$ genotype entropy distribution are plotted in the SM [42]. As noted previously, validation trial data for spin glasses on a different random graph, as well as data from multiple experimental quantum circuit trials, are also provided in the SM [42]. The SM also contains a table listing the frequencies of the RNA "unfolded" phenotype; notably, in the $k = 2$, $\ell = 20$ cases, the unfolded phenotype frequency is less than 3%, while for the $k = 4$, $\ell = 12$ case, it is more than 80% and there are far fewer phenotypes, as expected from the RNA12 DGP study [3].

FIG. 2.


Plots of (a, d, g) robustness versus frequency, (b, e, h) robustness versus log10(frequency), and (c, f, i) log10(robustness) versus log10(frequency) for (a, b, c) RNA folding, (d, e, f) spin glass ground state, and (g, h, i) quantum circuit PrGP maps. The dashed line is the random null expectation $\rho_n = f_n$.

In Figure 2 we plot robustness versus frequency, robustness versus log frequency, and log robustness versus log frequency for each of the 3 main systems studied (additional RNA, spin glass, and quantum circuit trials are in the SM [42]). Notable common features across all systems include robustness much higher than predicted by the null model for sufficiently large frequencies and convergence toward null model behavior for sufficiently small frequencies. The RNA PrGP maps all show suppressed robustness relative to their DGP counterparts, and this scaling is further suppressed as temperature increases.

Similarly, in spin glasses, the DGP robustness is highest and closest to the linear-log relationship; the PrGP maps show increasingly suppressed scaling as the disorder variance is tuned higher. In quantum circuit PrGP maps, the trials with experimental or simulated noise show a long tail of many new low-frequency phenotypes, which suppresses the robustness of the high-frequency phenotypes while the approximate $\log f_n$ scaling is maintained.

From the phenotype entropy distributions in Appendix E, we see that as disorder parameters are increased (temperature, field variance, measurement noise), phenotype entropy distributions widen, meaning a genotype is more likely to have a broader distribution of phenotypes to which it maps.

We now make predictions of robustness by directly plugging measurements of $S_n^{\gamma}$ and $f_n$ into eq. (5). We show an example plot of the theoretical robustness, empirical robustness, null model, and upper bound for spin glasses with $\sigma_h^2 = 0.001$ in Figure 3(a). Not only does the theoretical robustness, given only $S_n^{\gamma}$ and $f_n$, recapitulate the salient scaling behavior of the empirical robustness, but, as shown in Figure 3(b), the Pearson correlation between the predicted and empirical robustness is $r = 0.990$. In Table II, we show that the Pearson correlations for robustness obtained from eq. (5) range from 0.947 to 0.99994 across all tabulated systems and exceed those of the null model and DGP maximum robustness formulas in every case, illustrating the success of eq. (5). While the Pearson correlations are high, the predictions from eq. (5) differ from the empirical values by additive or multiplicative constant factors, likely due to violation of one or both assumptions described in the Theory section. As disorder parameters increase, these violations become more prominent and the null model's performance approaches that of eq. (5) (see Table II), meaning that biphasic scaling starts to fade in favor of null-model-like linear scaling when there is too much disorder. Nonetheless, in all cases our analytical theory performs best and remains highly predictive.

FIG. 3.


(a) Plot of log10(frequency) versus robustness $\rho_n$, where $\rho_n$ has been computed either empirically from the experimental data or theoretically from eq. (5) for the spin glass system ($\sigma_h^2 = 0.001$). The upper bounds from eq. (6) and the null model are included. (b) Scatter plot of theoretical $\rho_n$ versus empirical $\rho_n$ for the spin glass system ($\sigma_h^2 = 0.001$), with Pearson $r = 0.990$.

TABLE II.

Pearson correlation coefficient r between robustness predicted from eq. (5) versus empirically measured robustness. In general, the theory outperforms the null model and the DGP maximum approximations, and overall Pearson correlations are very high, close to 1, highlighting the success of eq. (5). Bold indicates the best-performing model.

| System | Details | Theory vs. real (Pearson r) | Null vs. real (Pearson r) | DGP max vs. real (Pearson r) |
|---|---|---|---|---|
| Spin glass | $\sigma_h^2 = 0.001$ | **0.990** | 0.766 | 0.940 |
| Spin glass | $\sigma_h^2 = 0.01$ | **0.994** | 0.874 | 0.924 |
| Spin glass | $\sigma_h^2 = 0.1$ | **0.995** | 0.976 | 0.954 |
| RNA GC20 | 20 °C | **0.947** | 0.811 | 0.797 |
| RNA GC20 | 37 °C | **0.951** | 0.854 | 0.766 |
| RNA GC20 | 70 °C | **0.965** | 0.856 | 0.665 |
| RNA12 | 37 °C | **0.958** | 0.832 | 0.882 |
| Quantum circuit | 11 qubit (exact) | **0.99994** | 0.901 | 0.997 |
| Quantum circuit | 11 qubit (simulation) | **0.9991** | 0.865 | 0.576 |
| Quantum circuit | 7 qubit, trial 1 (exact) | **0.99994** | 0.926 | 0.998 |
| Quantum circuit | 7 qubit, trial 1 (exp.) | **0.9996** | 0.912 | 0.712 |

We now combine empirical results for $S_n^{\gamma}$ versus $f_n$ with eq. (5) to develop a semi-empirical theory of how robustness $\rho_n$ scales with $f_n$. We observe in Appendix C that the genotype entropy, which is exactly $S_n^{\gamma} = \ell \log k + \log f_n$ for DGP maps, empirically maintains a similar scaling

$$S_n^{\gamma} \approx \alpha + \eta \log f_n \qquad (7)$$

for some $\alpha$ and $\eta$ in PrGP maps, but is generally increased with respect to the DGP genotype entropy, generally with $0 \le \eta \le 1$. This means that a phenotype with fixed frequency is likely to be spread over more genotypes in the PrGP case than in the DGP case, as expected. This relationship tends to hold over many orders of magnitude for all 3 systems, though with slightly differing behavior. For instance, in some cases $\eta$ depends on $f_n$ but is constant over large stretches of frequencies. One example is spin glasses with $\sigma_h^2 = 0.001$, where $\eta \approx 1$ for the most common phenotypes and then suddenly transitions to $\eta \approx 0$ for sufficiently small frequencies. Regardless, eq. (7) can still be combined with eq. (5) to understand how $\rho_n$ scales with $f_n$, and different values of $\eta$ can be used in the limits of small or large $f_n$.
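Estimating $\alpha$ and $\eta$ from measured $(f_n, S_n^{\gamma})$ pairs amounts to a linear regression in $\log f_n$. The sketch below uses NumPy's `polyfit`; because $\eta$ can change between frequency ranges (as in the spin glass with $\sigma_h^2 = 0.001$), one would in practice fit each range of interest separately, which is left to the caller here. The sanity check uses the exact DGP relation $S_n^{\gamma} = \ell \log k + \log f_n$, for which $\alpha = \ell \log k$ and $\eta = 1$.

```python
import numpy as np


def fit_alpha_eta(f, s_gamma):
    """Least-squares fit of eq. (7), S_n^gamma ~ alpha + eta * log f_n."""
    eta, alpha = np.polyfit(np.log(f), s_gamma, 1)  # highest-degree coefficient first
    return alpha, eta


# Sanity check on DGP entropies, where alpha = ell*log(k) and eta = 1 exactly.
k, ell = 2, 20
f = np.logspace(-5, -1, 25)
alpha, eta = fit_alpha_eta(f, ell * np.log(k) + np.log(f))
print(alpha, eta)
```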

Substituting eq. (7) into eq. (5), we obtain the robustness expression

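$$\rho_n \approx \frac{k^{\ell}}{\ell\, e^{\alpha} \log k}\left[f_n^{1-\eta}\left(\alpha + \eta \log f_n\right)\right]. \qquad (8)$$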

Notably, when $\eta = 0$ (e.g., for sufficiently small frequencies in the spin glass with $\sigma_h^2 = 0.001$), eq. (8) becomes $\rho_n \propto f_n$. In RNA, quantum circuits, and the spin glass $\sigma_h^2 = 0.1$ case, a slope $0 < \eta < 1$ is observed. In these cases, the formula above would predict an unphysical negative robustness as $f_n \to 0$, since $\alpha + \eta \log f_n < 0$ for $f_n < e^{-\alpha/\eta}$. However, the appropriate limit should take into account the fact that the smallest phenotype frequency, $f_{\min}$, is finite. We show in Appendix D that when $f_n \ge f_{\min} \gg e^{-\alpha/\eta}$, which is the case for the empirical systems in which $0 < \eta < 1$, a power law relationship $\log \rho_n \approx C + (1 - \eta) \log f_n$ is expected over many orders of magnitude of frequency. Only at frequencies so small that $f_{\min} \gg e^{-\alpha/\eta}$ is violated would a sharp divergence occur, but the empirical frequencies observed in this study do not reach this regime. In Figure 2, we indeed observe a clear power law relationship for sufficiently small frequencies, which of course simplifies to the aforementioned linear relationship when $\eta = 0$. When $\eta$ is small but nonzero, it may be difficult to distinguish a power law from a linear relationship in Figure 2.
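The power-law regime can be explored numerically. Differentiating eq. (8) gives the local log-log slope $d\log\rho_n/d\log f_n = (1-\eta) + \eta/(\alpha + \eta\log f_n)$, which stays close to $1 - \eta$ as long as $\alpha + \eta \log f_n \gg \eta$, i.e., for frequencies well above $e^{-\alpha/\eta}$. The sketch compares this analytic slope with a finite-difference estimate; the values of $\alpha$ and $\eta$ are hypothetical and not fitted to any system in this study.

```python
import numpy as np


def rho_eq8(f, alpha, eta, k, ell):
    """Eq. (8): robustness implied by combining eqs. (5) and (7)."""
    return k**ell / (ell * np.exp(alpha) * np.log(k)) * f ** (1 - eta) * (alpha + eta * np.log(f))


def loglog_slope(f, alpha, eta, k, ell, eps=1e-4):
    """Central-difference estimate of d log(rho) / d log(f) at frequency f."""
    up = rho_eq8(f * np.exp(eps), alpha, eta, k, ell)
    down = rho_eq8(f * np.exp(-eps), alpha, eta, k, ell)
    return (np.log(up) - np.log(down)) / (2 * eps)


# Hypothetical parameters for a k = 2, ell = 20 system; here e^{-alpha/eta} ~ 7e-13.
k, ell, alpha, eta = 2, 20, 14.0, 0.5
for f in (1e-2, 1e-4, 1e-6):
    exact = (1 - eta) + eta / (alpha + eta * np.log(f))   # analytic slope
    print(f, loglog_slope(f, alpha, eta, k, ell), exact)
```

Below $e^{-\alpha/\eta}$ the function `rho_eq8` turns negative, which is the breakdown discussed above.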

Lastly, for sufficiently large frequencies with $0 < \eta < 1$, we generally have a complex behavior $\rho_n \propto f_n^{1-\eta}(\alpha + \eta \log f_n)$, which can be expanded to leading order in $f_n$ or in $\log f_n$, depending on the choice of variable. For example, substituting $x_n = \log f_n$ into eq. (8), the leading-order behavior for small $|x_n|$ becomes $\rho_n \approx a + b \log f_n$, which is the expected large-frequency behavior. It is important to note that, for examples such as the aforementioned spin glasses with $\sigma_h^2 = 0.001$ and even $\sigma_h^2 = 0.01$, $\eta \approx 1$ yields the "robust" logarithmic scaling seen in robust DGP maps and in the PrGP upper bound computed here. Moreover, in the RNA systems, for sufficiently high frequencies the $S_n^{\gamma}$ versus $f_n$ data points reach the exact DGP $S_n^{\gamma}$ curve, which has $\eta = 1$. In quantum circuits, $S_n^{\gamma}$ generally lies parallel or almost parallel to the DGP $S_n^{\gamma}$ curve, which also suggests $\eta \approx 1$ or slightly less than 1. However, there are clusters of phenotypes with $0 < \eta < 1$, each of which would have a different estimated $\alpha$. This leads to fragmented genotype entropy and robustness curves which may not be entirely explained by the monotonic behavior predicted by eq. (8), though combining empirical measurements in eq. (5) still yields excellent predictions. Nonetheless, the different regimes of eq. (8) help account for the complex behavior seen in PrGP map robustness.
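To make the leading-order statement concrete, write $C = k^{\ell}/(\ell e^{\alpha} \log k)$ and expand eq. (8) for small $|x_n|$ (that is, $f_n$ close to 1):

$$\rho_n \approx C\, e^{(1-\eta)x_n}\left(\alpha + \eta x_n\right) = C\alpha + C\left[\eta + \alpha(1-\eta)\right]x_n + O\!\left(x_n^2\right),$$

so $a \approx C\alpha$ and $b \approx C[\eta + \alpha(1-\eta)]$. For $\eta = 1$ the expression is exactly linear in $\log f_n$, $\rho_n \approx C(\alpha + \log f_n)$, with no higher-order terms.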

V. DISCUSSION

Compared to existing DGP maps, PrGP maps not only allow for the inclusion of realistic, physical sources of disorder like thermal fluctuation and external variables, but also permit the analysis of new systems like quantum circuits with inherent uncertainty. We emphasize the broad applicability of this framework to the analysis of robustness and stability in a vast array of systems across biology, physics, computer science, and other disciplines. The analytical theory introduced here, which functions well outside of the approximation regimes used to derive it, links a phenotype's frequency $f_n$, its genotype entropy $S_n^{\gamma}$, and its robustness $\rho_n$. Given the empirical observation of a logarithmic relationship between $S_n^{\gamma}$ and $f_n$, we show that for high frequencies a complex robustness relationship $\rho_n \propto f_n^{1-\eta}(\alpha + \eta \log f_n)$ is obtained, which becomes linear-log (DGP-like) robustness when $\eta = 1$, while for small frequencies a linear or power-law relationship is expected, depending on system-specific information. Moreover, as disorder in a system is increased, phenotypes spread over a larger number of genotypes, leading to increasingly suppressed robustness and more null-model-like behavior. Most notably, our theory in eq. (5) is highly successful, as measured by Pearson correlation, in predicting empirical robustness across all systems.

The scaling we observe empirically and justify theoretically in this article is observed in all three studied systems, despite being disparate, hinting at its universality. How this robustness trend affects navigability of (probabilistic) fitness landscapes is an important direction for further investigation. We suggest that evolutionary dynamics on fitness landscapes where the genotype-to-phenotype mapping is probabilistic may display unique phenomena which are not present on fitness landscapes with purely deterministic GP mapping.

We also suggest that mapping genotypes to probability vectors instead of discrete phenotypes may facilitate taking gradients of, for instance, a negative log-likelihood loss function when learning PrGP or even DGP maps with statistical learning methods. Specifically, one might model a GP map using a graph neural network [46] and predict the phenotype or related properties of neighboring nodes. Such a model may ultimately aid in inferring fitness landscapes from limited initial GP data [47–49].

Supplementary Material

Supplement 1

VI. ACKNOWLEDGEMENTS

We acknowledge helpful discussions with Nora Martin, comments from anonymous referees, and the use of IBM Quantum services and the MIT Engaging Cluster for this work. This work was supported by awards T32GM007753 and T32GM144273 from the National Institute of General Medical Sciences, Hertz Foundation Fellowships (VM; AS), and a PD Soros Fellowship (VM). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of General Medical Sciences, the National Institutes of Health, IBM, or the IBM Quantum Team. The authors declare no known conflict of interest.

References

  • [1].Weiß M. and Ahnert S. E., Neutral components show a hierarchical community structure in the genotype–phenotype map of RNA secondary structure, Journal of The Royal Society Interface 17, 20200608 (2020).
  • [2].Weiß M. and Ahnert S. E., Using small samples to estimate neutral component size and robustness in the genotype–phenotype map of RNA secondary structure, Journal of The Royal Society Interface 17, 20190784 (2020).
  • [3].Aguirre J., Buldú J. M., Stich M., and Manrubia S. C., Topological Structure of the Space of Phenotypes: The Case of RNA Neutral Networks, PLoS ONE 6, e26324 (2011).
  • [4].Greenbury S. F., Schaper S., Ahnert S. E., and Louis A. A., Genetic Correlations Greatly Increase Mutational Robustness and Can Both Reduce and Enhance Evolvability, PLOS Computational Biology 12, e1004773 (2016).
  • [5].Greenbury S. F., Louis A. A., and Ahnert S. E., The structure of genotype-phenotype maps makes fitness landscapes navigable, Nature Ecology & Evolution 6, 1742 (2022).
  • [6].Dingle K., Schaper S., and Louis A. A., The structure of the genotype–phenotype map strongly constrains the evolution of non-coding RNA, Interface Focus 5, 20150053 (2015).
  • [7].Dingle K., Camargo C. Q., and Louis A. A., Input–output maps are strongly biased towards simple outputs, Nature Communications 9, 761 (2018).
  • [8].Dingle K., Ghaddar F., Šulc P., and Louis A. A., Phenotype Bias Determines How Natural RNA Structures Occupy the Morphospace of All Possible Shapes, Molecular Biology and Evolution 39, msab280 (2022).
  • [9].Dingle K., Pérez G. V., and Louis A. A., Generic predictions of output probability based on complexities of inputs and outputs, Scientific Reports 10, 4415 (2020).
  • [10].Jörg T., Martin O. C., and Wagner A., Neutral network sizes of biological RNA molecules can be computed and are not atypically small, BMC Bioinformatics 9, 464 (2008).
  • [11].Wagner A., Robustness and evolvability: a paradox resolved, Proceedings of the Royal Society B: Biological Sciences 275, 91 (2008).
  • [12].Greenbury S. F., Johnston I. G., Louis A. A., and Ahnert S. E., A tractable genotype–phenotype map modelling the self-assembly of protein quaternary structure, Journal of The Royal Society Interface 11, 20140249 (2014).
  • [13].Tesoro S. and Ahnert S. E., Non-deterministic genotype-phenotype maps of biological self-assembly, EPL (Europhysics Letters) 123, 38002 (2018).
  • [14].Camargo C. Q. and Louis A. A., Boolean Threshold Networks as Models of Genotype-Phenotype Maps, Complex Networks XI, 143 (2020).
  • [15].Kauffman S., Homeostasis and Differentiation in Random Genetic Control Networks, Nature 224, 177 (1969).
  • [16].Wagner A., Distributed robustness versus redundancy as causes of mutational robustness, BioEssays 27, 176 (2005).
  • [17].Wagner A., Robustness and evolvability in living systems, 3rd ed., Princeton studies in complexity (Princeton Univ. Press, Princeton, NJ, 2007).
  • [18].Payne J. L. and Wagner A., Constraint and Contingency in Multifunctional Gene Regulatory Circuits, PLoS Computational Biology 9, e1003071 (2013).
  • [19].Payne J. L., Moore J. H., and Wagner A., Robustness, evolvability, and the logic of genetic regulation, Artificial Life 20, 111 (2014).
  • [20].Payne J. L. and Wagner A., The Robustness and Evolvability of Transcription Factor Binding Sites, Science 343, 875 (2014).
  • [21].Schaper S. and Louis A. A., The Arrival of the Frequent: How Bias in Genotype-Phenotype Maps Can Steer Populations to Local Optima, PLoS ONE 9, e86635 (2014).
  • [22].Greenbury S. F. and Ahnert S. E., The organization of biological sequences into constrained and unconstrained parts determines fundamental properties of genotype–phenotype maps, Journal of The Royal Society Interface 12, 20150724 (2015).
  • [23].Ahnert S. E., Structural properties of genotype–phenotype maps, Journal of The Royal Society Interface 14, 20170275 (2017).
  • [24].Weiß M. and Ahnert S. E., Phenotypes can be robust and evolvable if mutations have non-local effects on sequence constraints, Journal of The Royal Society Interface 15, 20170618 (2018).
  • [25].Nichol D., Robertson-Tessi M., Anderson A. R. A., and Jeavons P., Model genotype–phenotype mappings and the algorithmic structure of evolution, Journal of The Royal Society Interface 16, 20190332 (2019).
  • [26].Hu T., Tomassini M., and Banzhaf W., A network perspective on genotype–phenotype mapping in genetic programming, Genetic Programming and Evolvable Machines 10.1007/s10710-020-09379-0 (2020).
  • [27].Manrubia S., Cuesta J. A., Aguirre J., Ahnert S. E., Altenberg L., Cano A. V., Catalán P., Diaz-Uriarte R., Elena S. F., García-Martín J. A., Hogeweg P., Khatri B. S., Krug J., Louis A. A., Martin N. S., Payne J. L., Tarnowski M. J., and Weiß M., From genotypes to organisms: State-of-the-art and perspectives of a cornerstone in evolutionary dynamics, Physics of Life Reviews 38, 55 (2021).
  • [28].Payne J. L. and Wagner A., The causes of evolvability and their evolution, Nature Reviews Genetics 20, 24 (2019).
  • [29].Schaper S., Johnston I. G., and Louis A. A., Epistasis can lead to fragmented neutral spaces and contingency in evolution, Proceedings of the Royal Society B: Biological Sciences 279, 1777 (2012).
  • [30].Mohanty V. and Louis A. A., Robustness and stability of spin-glass ground states to perturbed interactions, Physical Review E 107, 014126 (2023).
  • [31].Wright A. H. and Laue C. L., Evolving Complexity is Hard (2022), arXiv:2209.13013 [cs].
  • [32].Johnston I. G., Dingle K., Greenbury S. F., Camargo C. Q., Doye J. P. K., Ahnert S. E., and Louis A. A., Symmetry and simplicity spontaneously emerge from the algorithmic nature of evolution, Proceedings of the National Academy of Sciences 119, e2113883119 (2022).
  • [33].Mohanty V., Robustness of evolutionary and glassy systems, Ph.D. thesis, University of Oxford (2021).
  • [34].Mohanty V., Greenbury S. F., Sarkany T., Narayanan S., Dingle K., Ahnert S. E., and Louis A. A., Maximum mutational robustness in genotype–phenotype maps follows a self-similar blancmange-like curve, Journal of The Royal Society Interface 20, 20230169 (2023).
  • [35].Draghi J. A., Parsons T. L., Wagner G. P., and Plotkin J. B., Mutational robustness can facilitate adaptation, Nature 463, 353 (2010).
  • [36].Louie R. H. Y., Kaczorowski K. J., Barton J. P., Chakraborty A. K., and McKay M. R., Fitness landscape of the human immunodeficiency virus envelope protein that is targeted by antibodies, Proceedings of the National Academy of Sciences 115, E564 (2018).
  • [37].Butler T. C., Barton J. P., Kardar M., and Chakraborty A. K., Identification of drug resistance mutations in HIV from constraints on natural evolution, Physical Review E 93, 022412 (2016).
  • [38].Barton J. P., Goonetilleke N., Butler T. C., Walker B. D., McMichael A. J., and Chakraborty A. K., Relative rate and location of intra-host HIV evolution to evade cellular immunity are predictable, Nature Communications 7, 11660 (2016).
  • [39].Shekhar K., Ruberman C. F., Ferguson A. L., Barton J. P., Kardar M., and Chakraborty A. K., Spin models inferred from patient-derived viral sequence data faithfully describe HIV fitness landscapes, Physical Review E 88, 062705 (2013).
  • [40].Hopf T. A., Ingraham J. B., Poelwijk F. J., Schärfe C. P. I., Springer M., Sander C., and Marks D. S., Mutation effects predicted from sequence co-variation, Nature Biotechnology 35, 128 (2017).
  • [41].Lorenz R., Bernhart S. H., Höner zu Siederdissen C., Tafer H., Flamm C., Stadler P. F., and Hofacker I. L., ViennaRNA Package 2.0, Algorithms for Molecular Biology 6, 26 (2011).
  • [42]. [See Supplemental Material for this work.]
  • [43].Edwards S. F. and Anderson P. W., Theory of spin glasses, Journal of Physics F: Metal Physics 5, 965 (1975).
  • [44].Sherrington D. and Kirkpatrick S., Solvable Model of a Spin-Glass, Physical Review Letters 35, 1792 (1975).
  • [45].Tandeitnik D. and Guerreiro T., Evolving Quantum Circuits (2022), arXiv:2210.05058 [quant-ph].
  • [46].Kipf T. N. and Welling M., Semi-Supervised Classification with Graph Convolutional Networks (2017), arXiv:1609.02907 [cs, stat].
  • [47].Shaw R. G. and Geyer C. J., Inferring Fitness Landscapes, Evolution 64, 2510 (2010).
  • [48].Nozoe T., Kussell E., and Wakamoto Y., Inferring fitness landscapes and selection on phenotypic states from single-cell genealogical data, PLOS Genetics 13, e1006653 (2017).
  • [49].D’Costa S., Hinds E. C., Freschlin C. R., Song H., and Romero P. A., Inferring protein fitness landscapes from laboratory evolution experiments, PLOS Computational Biology 19, e1010956 (2023).
  • [50].Galkin O. E. and Galkina S. Y., Global extrema of the Delange function, bounds for digital sums and concave functions, Sbornik: Mathematics 211, 336 (2020).


Articles from ArXiv are provided here courtesy of arXiv
