Abstract
Understanding how viral proteins adapt under immune pressure while preserving structural viability is crucial for anticipating the emergence of antibody-resistant variants. Here, we present a probabilistic framework that predicts the evolutionary trajectories of viral escape, revealing that immune evasion is funneled through a remarkably small number of viable paths compared to the total mutational space. These escape funnels arise from the combined constraints of protein viability and escape from antibodies, which we model using a generative model trained on structural homologs and deep mutational scanning data. We derive a mean-field approximation of evolutionary path ensembles, enabling us to quantify both the fitness and entropy of escape routes. Applied to the SARS-CoV-2 receptor binding domain, our framework reveals convergent evolution patterns, accurately predicts mutation sites in emerged variants of concern, and explains the differential effectiveness of antibody cocktails. In particular, we show that combinations of antibodies with de-correlated escape profiles slow viral adaptation by increasing the mutational effort and viability cost required for escape.
Keywords: Viral adaptation, Mutational pathways, Restricted Boltzmann machines, SARS-CoV-2, Antibody escape, Protein evolution
INTRODUCTION
Anticipating the evolutionary trajectories of SARS-CoV-2 has become pivotal to pandemic preparedness, driven by the virus’s continuous adaptation under immune pressure1,2. Central to these adaptations is the receptor-binding domain (RBD) of the viral spike protein, critical for host cell entry and a primary target of neutralizing antibodies3. Immune responses elicited by prior infections and widespread vaccinations have imposed selective pressures that significantly shape viral evolution, resulting in the emergence of variants characterized by mutations that enhance transmissibility and enable antibody evasion4–6.
Several studies7–9 documented striking cases of convergent evolution in SARS-CoV-2, showing that distinct viral lineages can independently acquire similar mutations in the RBD. These convergent mutations, which cluster at key residues, help the virus escape neutralizing antibodies while maintaining ACE2 binding. Cao et al.7 further demonstrated that this pattern is especially evident in recent Omicron subvariants, where immune imprinting from earlier exposures narrows the diversity of neutralizing antibodies, intensifying selective pressures and driving parallel evolutionary pathways.
Significant progress has been made in demonstrating the effects of individual viral mutations, largely through powerful experimental approaches such as deep mutational scanning (DMS)7,10–18. These techniques provide invaluable prospective data on how single amino acid changes in the SARS-CoV-2 spike protein or its RBD influence key viral traits, including receptor-binding affinity, viral entry, and antibody neutralization. Such insights have proven critical for viral surveillance and have even enabled the forecasting of evolutionary success for specific viral clades19–22. Yet, quantifying how these mutations combine and interact across entire evolutionary trajectories remains a substantially more complex and less explored challenge. While many computational approaches can now predict the fitness effects of individual mutations23–25 or low-order combinations26, they typically overlook the collective, high-dimensional epistatic constraints that ultimately govern which mutational paths are viable. In realistic protein landscapes, epistatic interactions emerge progressively as mutations accumulate, dynamically reshaping the accessibility of subsequent mutations and giving rise to long, emergent evolutionary timescales27.
Recent theoretical advances, including transition path sampling28,29 and the study of navigability in high-dimensional genotype–phenotype maps30, have begun to reveal how epistasis and evolutionary constraint jointly shape accessible evolutionary paths. Yet, a comprehensive, quantitative framework for predicting how functional and immune selection co-constrain the ensemble of escape trajectories, validated against real-world data, remains lacking.
Here, we introduce a probabilistic framework to characterize immune escape as a constrained dynamical process through sequence space. By modeling both protein viability and antibody evasion, we show that viral adaptation is funneled through a small number of viable mutational trajectories, which we term escape funnels. To uncover and quantify these escape funnels, we define escape trajectories as sequences of single-residue mutations and model their probability under joint structural and immune constraints. Structural viability is captured by Restricted Boltzmann Machines (RBMs) trained on homologous sequences, while immune escape is modeled using antibody-specific binding scores derived from deep mutational scanning. We benchmark our approach on a solvable lattice protein model and then apply it to SARS-CoV-2 RBD using experimental DMS data. Crucially, we derive a mean-field approximation of the path ensemble, enabling tractable computation of the free energy, entropy, and continuity of escape paths. This approach reveals a sharply reduced set of viable escape paths, explains convergent evolution in real-world variants, and quantifies how antibody combinations reshape the mutational landscape.
RESULTS
Escape paths
To investigate how proteins can escape immune pressure while maintaining viability, we developed a framework to sample evolutionary paths28,29 under immune pressure (Figure 1). Each path is defined as a sequence of amino acid variants $(\mathbf{v}_1, \dots, \mathbf{v}_T)$, starting from the wildtype $\mathbf{v}_{\rm wt}$, and constructed through single-residue mutations per step, such that the Hamming distance between consecutive variants satisfies $d_H(\mathbf{v}_t, \mathbf{v}_{t+1}) \le 1$. The overall path probability is factorized as
$$P(\mathbf{v}_1, \dots, \mathbf{v}_T) \propto \prod_{t=1}^{T} P(\mathbf{v}_t), \tag{1}$$
where each $P(\mathbf{v}_t)$ balances two competing pressures: the requirement to maintain protein viability (which includes stability and the capacity to bind to the ACE2 receptor) and the need to escape antibody binding.
Fig. 1: Evolutionary paths toward antibody escape.
Schematic representation of plausible evolutionary trajectories from the wildtype to antibody-escaping variants. Paths progress through a constrained sequence space, funneling through a small set of viable intermediates that gradually accumulate immune escape mutations.
To model protein viability, we learn the probability of each protein sequence using a Restricted Boltzmann Machine (RBM)31,32. The RBM is trained on a multiple sequence alignment of protein variants and serves as a generative model for viable sequences. It captures both local constraints on individual sites (through site-specific fields $g_i$) and higher-order epistatic interactions (through a hidden layer). In this model, each visible unit $v_i$ represents the amino-acid identity at site $i$ in the sequence, while each hidden (latent) unit $h_\mu$ summarizes a collective interaction pattern across sites via the input $I_\mu(\mathbf{v}) = \sum_i w_{i\mu}(v_i)$, where $w_{i\mu}(v_i)$ encodes the contribution of residue $v_i$ at site $i$ to hidden unit $\mu$. The probability of observing a sequence $\mathbf{v}$ is then:
$$P_{\rm RBM}(\mathbf{v}) = \frac{1}{Z} \exp\Big( \sum_i g_i(v_i) + \sum_\mu \Gamma_\mu\big(I_\mu(\mathbf{v})\big) \Big), \tag{2}$$
where $Z$ is the normalization constant and the non-linear function $\Gamma_\mu$ is the cumulant generating function associated with hidden unit $\mu$.
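A minimal sketch of how the unnormalized log-probability of Eq. (2) can be evaluated for one sequence is given below. It assumes quadratic hidden-unit potentials, for which the cumulant generating function is Γ(I) = I²/2; the potentials actually used for the trained model are not specified here. The names `rbm_log_weight`, `fields` and `weights`, and the toy numbers, are hypothetical.

```python
def rbm_log_weight(seq, fields, weights):
    """Unnormalized RBM log-probability: field term plus hidden-unit term.

    seq[i]            : integer amino-acid index at site i
    fields[i][a]      : site-specific field for amino acid a at site i
    weights[mu][i][a] : contribution of amino acid a at site i to hidden unit mu
    """
    field_term = sum(fields[i][a] for i, a in enumerate(seq))
    hidden_term = 0.0
    for w_mu in weights:
        # Input received by this hidden unit from the whole sequence
        hidden_input = sum(w_mu[i][a] for i, a in enumerate(seq))
        # ASSUMPTION: Gaussian hidden unit, Gamma(I) = I^2 / 2
        hidden_term += 0.5 * hidden_input * hidden_input
    return field_term + hidden_term


# Toy model: 3 sites, 2 amino-acid states, 1 hidden unit (made-up numbers)
fields = [[0.0, 0.5], [0.2, 0.0], [0.0, 0.0]]
weights = [[[1.0, -1.0], [1.0, -1.0], [0.0, 0.0]]]
print(rbm_log_weight((0, 0, 0), fields, weights))
print(rbm_log_weight((1, 0, 0), fields, weights))
```

In the toy model the hidden unit rewards sequences whose first two sites agree, which is the kind of pairwise-and-beyond coupling a site-independent model cannot express.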
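To model immune escape, we define
$$P_{\rm esc}(\mathbf{v}) = \prod_{a} \big( 1 - P^{a}_{\rm bind}(\mathbf{v}) \big), \tag{3}$$
where $P^{a}_{\rm bind}(\mathbf{v})$ denotes the binding probability of sequence $\mathbf{v}$ to antibody $a$, and the product runs over a pool of antibodies.
The full fitness model
$$P(\mathbf{v}) \propto P_{\rm RBM}(\mathbf{v}) \, P_{\rm esc}(\mathbf{v}) \tag{4}$$
combines these two contributions and determines the probability of visiting each sequence along the evolutionary path.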
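The way the viability and escape terms combine can be sketched in log space. The example assumes that the escape probability is the product, over antibodies, of the probability of not being bound, and that the two contributions simply multiply; any relative weighting of the terms is not modeled. Function names and numbers are hypothetical.

```python
import math


def log_escape(bind_probs):
    """Log escape probability for a pool of antibodies.

    ASSUMPTION: escape = product over antibodies of (1 - binding probability).
    Each binding probability must be strictly below 1, otherwise the log diverges.
    """
    return sum(math.log(1.0 - p) for p in bind_probs)


def log_fitness(log_p_viability, bind_probs):
    """Unnormalized log-fitness: viability term plus escape term."""
    return log_p_viability + log_escape(bind_probs)


# Toy comparison (made-up values): a slightly less viable variant that
# escapes both antibodies can still have the higher overall fitness.
wildtype_like = log_fitness(-10.0, [0.99, 0.99])
escape_variant = log_fitness(-12.0, [0.20, 0.30])
print(wildtype_like, escape_variant)
```

The divergence of the logarithm at a binding probability of exactly 1 is why a regularization of the escape term is needed for unmutated epitopes.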
To explore such trajectories, we extended a previously introduced Markov chain Monte Carlo (MCMC) algorithm for sampling transition paths28 (see Methods). Each path is constructed under adjacency constraints (single-residue mutations per step), ensuring evolutionary continuity while respecting the underlying sequence probability in Eq. (4).
Lattice protein model
We first benchmark this approach in a controlled setting by applying it to lattice proteins (LP)33,34, a simplified yet physically grounded model of protein folding (Figure 2a). LP sequences consist of 27 amino acids folding into compact, self-avoiding walks on a 3 × 3 × 3 cubic lattice, which defines the set of possible structures. In this model, the ground-truth viability of a sequence is the probability that it folds into the native structure rather than into one of the many competing structures (see Methods). To mimic the situation in which the ground-truth fitness is unknown, we use an RBM trained on a set of LP sequences with high folding probability as a surrogate model for this viability35.
Fig. 2: Folding constraints shape convergent escape routes under immune pressure on lattice protein model.
(a) Native fold structure, with the antibody-targeted epitope (upper face) highlighted in red. During escape, the protein must accumulate epitope mutations while preserving high fold probability. (b) Evolution of folding probability along sampled escape paths. The blue line indicates the mean across paths; grey lines show individual trajectories. (c) Accumulation of epitope mutations along escape paths, measured as Hamming distance from the wildtype. Each grey line represents a single path. Mutations accumulate steadily across the epitope, indicating progressive immune escape. (d) Wild-type epitope and three representative epitope variants at the terminal step of MCMC-sampled escape paths; amino-acid colors denote biochemical classes. Escape routes consistently converge on a small set of mutations with similar biochemical properties. (e) Site-wise probability of being mutated at the final step of the mean-field trajectory.
To represent the immune pressure exerted by an abstract antibody, we introduce an escape potential that penalizes similarity to the wildtype within a targeted epitope region:
| (5) |
where the distance entering the potential is the Hamming distance between the candidate sequence and the wildtype, restricted to epitope sites, and an additional offset parameter keeps the log escape probability finite (see Eq. (3)) even when the epitope is not mutated.
We then sample escape paths of length $T$ using our MCMC algorithm, starting from an initial sequence with high folding probability (Figure S1). The chosen path length allows mutating up to ∼ 20% of the protein, roughly twice the mutation fraction observed in the SARS-CoV-2 RBD and comparable to the cumulative sequence divergence of influenza hemagglutinin since 196836. Despite immune pressure, the sampled paths maintain high folding probabilities throughout the trajectory (Figure 2b): while individual paths exhibit some variability, the mean folding probability remains as high as the wildtype folding probability, demonstrating that the RBM constraint successfully preserves structural integrity under immune-driven evolution.
To assess mutation dynamics, we compare the per-site probability of being non-wildtype along escape paths. When sampling paths, viability is approximated either with the RBM or with the ground-truth LP model, which serves as a reference. At each path step, we compute for every site the probability of carrying a mutation relative to the wildtype. As shown in Figure S1, the dynamics obtained from RBM-constrained paths closely track those of the LP model, whereas paths generated under a site-independent model are less predictive of which sites will mutate. This improved agreement highlights the RBM’s ability to capture epistatic interactions that underlie structural dependencies between sites.
Simultaneously, escape progresses along these paths (Figure 2c): the Hamming distance from the wildtype increases steadily on the epitope, indicating that mutations accumulate progressively in the epitope region under antibody pressure. Interestingly, although many mutational routes are theoretically possible, given that 9 epitope residues are mutable, the final escape variants converge onto a small number of distinct sequence clusters. Finally, Figure 2d shows that different paths lead to similar functional outcomes in the epitope region, shaped by the joint pressure to evade antibodies while maintaining protein stability.
Mean field for path sampling
A key advantage of using an RBM for sequence modeling is that its latent-unit structure admits a tractable mean-field (MF) approximation of the path ensemble, providing theoretical insight and scaling beyond sampled trajectories28. In the MF framework of statistical physics, each path is described via three sets of order parameters: (1) the sequence-dependent inputs to each of the RBM hidden units, (2) the binding energy per antibody, and (3) a continuity parameter along the path encoding the overlap between successive sequences (see Methods).
These parameters summarize the trajectory at each time step , and the probability of sampling paths with such parameters is approximated as a Boltzmann weight
| (6) |
where the path free energy can be exactly computed (see Methods). Here, plays the role of an inverse temperature, allowing us to select paths of variable quality.
To validate the framework, we first apply the MF formulation to our lattice model, with . Optimizing the MF path under this drive recovers the key mutated epitope sites (Figure 2e). Moreover, projecting MCMC trajectories and the MF path onto the order-parameter space (Figure S2) shows close agreement: MF reproduces the geometry and progression of individual trajectories.
Application to SARS-CoV-2 receptor binding domain
We next apply our framework to a more realistic setting by modeling SARS-CoV-2 receptor binding domain (RBD) sequences under functional and immune constraints. Viability and binding to ACE2 are modeled using an RBM trained on 1000 synthetic homologs generated with ESM-Inverse Folding, conditioned on the RBD–ACE2 complex structure. Antibody escape is incorporated using experimental deep mutational scanning (DMS) data, inspired by Greaney et al.37, for a fixed set of 29 antibodies identified early in the pandemic (see Methods)16. Although additional experimental DMS datasets became available later7, this panel spans all four antibody classes and offers high-quality coverage with few missing single mutants. The objective is to determine whether an escape path exists: a rapid route with few mutational steps that circumvents the specified antibody set while maintaining protein viability. Unlike previous approaches20,38, we do not explicitly model antibody concentration and apply a uniform immune pressure to all antibodies (see Methods).
We first validate the RBM’s learned landscape by comparing its single-site mutational effect against experimental phenotypes. For each site $i$, we computed the RBM log-likelihood score for single-site mutations in the wild-type background:
$$\Delta_i = \Big\langle \log P_{\rm RBM}\big(\mathbf{v}^{\rm wt}_{i \to a}\big) - \log P_{\rm RBM}\big(\mathbf{v}^{\rm wt}\big) \Big\rangle_{a}, \tag{7}$$
where $\mathbf{v}^{\rm wt}_{i \to a}$ is the wild-type sequence carrying amino acid $a$ at site $i$ and $\langle \cdot \rangle_a$ denotes the average over all amino-acid substitutions at position $i$. This score reflects the expected effect of mutations at site $i$ on protein viability, with positive values indicating mutations more favorable than the wild-type residue39. As shown in Figure S3, these RBM-derived scores correlate well with DMS measurements of both protein expression and ACE2 binding, indicating that the model accurately captures the viability landscape of the RBD. The model achieves a Spearman correlation of 0.60 for expression data, comparable to the DCA model of Rodriguez-Rivas et al.26 and the RBM of Huot et al.38 (both trained on SARS-CoV homologous sequences). It further reaches a Spearman correlation of 0.69 for ACE2 binding, a performance not reported in these earlier studies. This increase in predictive power is likely due to the use of synthetic homologs generated with ESM-Inverse Folding, which are structurally conditioned and more likely to retain ACE2 binding, in contrast to distant SARS-CoV-2 homologs, which were likely not subject to the ACE2 binding constraint. The RBM retains most of the predictive power for expression, compared to the original ESM-Inverse Folding model, while remaining equally effective at predicting binding.
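The site score described above can be sketched as an average log-likelihood difference over all non-wildtype substitutions at each site. The sketch takes any callable `log_p` that returns a sequence log-probability (up to a constant); the site-independent toy model used to exercise it is a stand-in, not the trained RBM, and all names are hypothetical.

```python
def site_scores(wildtype, log_p, n_states):
    """Mean log-likelihood change over single substitutions, for each site."""
    wt_log_p = log_p(wildtype)
    scores = []
    for i, wt_state in enumerate(wildtype):
        diffs = []
        for a in range(n_states):
            if a == wt_state:
                continue  # only true substitutions enter the average
            mutant = list(wildtype)
            mutant[i] = a
            diffs.append(log_p(mutant) - wt_log_p)
        scores.append(sum(diffs) / len(diffs))
    return scores


# Toy site-independent model with 2 sites and 3 states (made-up fields)
toy_fields = [[0.0, 1.0, 2.0], [0.0, -1.0, -1.0]]


def toy_log_p(seq):
    return sum(toy_fields[i][a] for i, a in enumerate(seq))


scores = site_scores([0, 0], toy_log_p, n_states=3)
print(scores)
```

A positive score marks a site where substitutions are on average favored over the wild-type residue, a negative score a site where they are penalized.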
We then consider escape paths of length , corresponding to at most 20 mutations along a trajectory, consistent with the mutational load observed in circulating variants. A key advantage of MF is that it enables the quantification of the number of viable immune escape paths through the computation of a path entropy (see Methods). This entropy reflects the diversity of trajectories consistent with both structural and immune constraints, and is reduced not only by limiting the number of mutations per step, but also by the functional constraints encoded in the RBM, which restrict the set of viable trajectories. In practice, the number of escape paths shrinks from in the unconstrained model to . Notably, the continuity constraint of one mutation per step reduces entropy far more strongly than the RBM fields alone (site-independent model).
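To give a sense of scale for how strongly the one-mutation-per-step continuity constraint alone restricts the path ensemble, the following back-of-the-envelope sketch counts paths combinatorially. It is not the MF path entropy computed in this work: it assumes 20 amino acids per site, the 178-residue RBD segment described in Methods, and 20 steps (inferred from the stated maximum of 20 mutations), and it ignores the RBM and antibody terms entirely.

```python
import math

N_AMINO = 20   # ASSUMPTION: 20 states per site, no gap symbol
LENGTH = 178   # modeled RBD segment length, as stated in Methods
N_STEPS = 20   # ASSUMPTION: one step per allowed mutation

# Unconstrained: every step can be any of N_AMINO**LENGTH sequences.
log10_unconstrained = N_STEPS * LENGTH * math.log10(N_AMINO)

# Continuity only: each step either keeps the previous sequence or changes
# one of LENGTH sites to one of the N_AMINO - 1 other amino acids.
moves_per_step = 1 + LENGTH * (N_AMINO - 1)
log10_continuous = N_STEPS * math.log10(moves_per_step)

print(f"unconstrained paths: about 10^{log10_unconstrained:.0f}")
print(f"continuous paths:    about 10^{log10_continuous:.0f}")
```

These counts are upper bounds on the number of paths in each ensemble; the viability and immune constraints reduce the continuous count further.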
A small number of viable escape paths suggests that only a limited subset of mutations can enable immune evasion without compromising protein foldability. To explore the size of this subset of mutations, we project MF-derived site entropies (see Methods) onto the 3D structure of the RBD–ACE2 complex (Figure 3a), revealing that highly mutable positions (in red) are surface-exposed and often overlap with residues mutated in known variants of concern (VOCs). Notably, top predictions include known escape hotspots such as T376, N440, N460 and Q498, which are mutated in most VOCs, further supporting the biological relevance of the model.
Fig. 3: Escape paths capture mutational and antigenic features of SARS-CoV-2 variants.
(a) 3D structure of the RBD (light blue) bound to ACE2 (dark blue). Residues in red are in the top 10% of predicted entropy in the final step of the mean field trajectory. Pink spheres mark the top 10% predicted sites that were also mutated in variants of concern (VOC), labeled with the VOC-observed mutation. (b) MCMC trajectories projected onto the top two principal components of antibody binding coefficients (antigenic space). SARS-CoV-2 variants observed during the pandemic are plotted and colored by their date of appearance. The wildtype (WT) and key variants of concern (VOCs) such as BA.5, BQ.1.1, and XBB are annotated. (c) Comparison of antibody binding scores (across 29 antibodies, colored by class) between MCMC-sampled paths and the mean-field (MF) trajectory. The MF trajectory progressively aligns with MCMC results as escape mutations accumulate, with class 1 and 2 antibodies losing binding earliest. (d) ROC curves showing the predictive performance of the mean-field trajectory in identifying antibodies escaped by each of three major SARS-CoV-2 variants of concern: BA.5, BQ.1.1, and XBB. Predictions are based on the final-step values of antibody binding order parameters computed along the MF path. Experimental escape data, measured as IC50 values for 438 antibodies from Cao et al., serve as the ground truth. (e) MF trajectories starting from WT and BA.1 variants, projected onto the top two principal components of antibody binding coefficients (antigenic space). SARS-CoV-2 variants observed during the pandemic are plotted and colored by their date of appearance.
Altogether, these results demonstrate that our MF approximation identifies key mutational targets and offers an interpretable and efficient way to characterize dominant escape trajectories. We then projected the MCMC-sampled trajectories into antigenic space, defined by the top two principal components (PCs) of the antibody binding scores of sequences observed during the pandemic (Figure 3b). Observed SARS-CoV-2 variants were colored by date of emergence. This allows us to directly compare model-predicted antigenic drift with real-world viral evolution. Notably, MCMC trajectories align with the direction of antigenic evolution observed in circulating variants, progressing from the wildtype toward major variants of concern (VOCs) such as BA.5, BQ.1.1, and XBB. Later variants appear further along the model-predicted escape path, highlighting the model’s ability to predict the temporal and directional structure of antigenic drift.
We then compare computed antibody binding scores across all 29 antibodies at the first and last steps along the trajectories (Figure 3c and Figure S5). At each time point, the MF predictions show strong correlation with those from MCMC, with agreement improving as escape progresses. This indicates that the MF approximation successfully captures the global escape dynamics. In particular, the MF is highly consistent with stochastic sampling at later steps, where immune pressure drives more pronounced divergence from the wildtype. Notably, class 1 and 2 antibodies tend to lose binding earliest, reflecting their greater vulnerability to initial escape mutations.
To test whether these predicted trajectories correspond to actual antibody escape, we computed amino acid probabilities at the final step to deduce antibody binding scores for 438 antibodies from Cao et al.7 (Figure 3d) and compared them with experimental IC50 measurements. The model identifies ineffective antibodies with area under the ROC curve (AUC) values of 0.58 for BA.5, 0.67 for BQ.1.1, and 0.73 for XBB, indicating predictive power that is modest for BA.5 and increases for the more immune-evasive variants. These results show that our framework not only reproduces past antigenic evolution but also has the potential to prospectively forecast antibody failure, without requiring any a priori knowledge of which mutations will arise. Finally, our framework is not limited to escape paths originating from the wildtype. When initialized from BA.1, the MF trajectory also reaches an antigenic profile consistent with XBB, showing that convergent escape routes can also emerge from already mutated variants (Figure 3e).
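For reference, an AUC of this kind can be computed with a generic rank-based estimator: the probability that a randomly chosen escaped antibody receives a higher predicted escape score than a randomly chosen non-escaped one. The sketch below is such an estimator, not the analysis code behind Figure 3d; how IC50 values are thresholded into escaped and non-escaped labels is not specified here, so `escaped` is a hypothetical boolean list and the scores are made up.

```python
def roc_auc(scores, escaped):
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 1/2."""
    positives = [s for s, label in zip(scores, escaped) if label]
    negatives = [s for s, label in zip(scores, escaped) if not label]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up predicted escape scores and hypothetical experimental labels
predicted_escape = [0.9, 0.8, 0.3, 0.1]
escaped = [True, False, True, False]
print(roc_auc(predicted_escape, escaped))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives the scale against which the three reported values can be read.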
Effect of cocktails
Our MF framework enables us to quantify how different antibody combinations shape viral escape dynamics and to understand why some cocktails offer greater short-term protection (evaluated here over a trajectory of T = 9 steps). We examined the evolution of the escape probability along the MF trajectory under four different conditions: selection by individual antibodies (COV2–2196 or REGN10987) in Figures 4a–b and selection by cocktails pairing each with REGN10933 in Figures 4c–d. For each case, we identified the first step along the trajectory at which at least 50% escape is reached, providing a quantitative measure of escape speed.
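The escape-speed measure can be sketched as a simple threshold-crossing search over the escape level at each step. The escape curves below are made-up numbers used only to illustrate the bookkeeping, and the function name is hypothetical.

```python
def first_escape_step(escape_by_step, threshold=0.5):
    """Index of the first step whose escape level reaches the threshold, else None."""
    for step, level in enumerate(escape_by_step):
        if level >= threshold:
            return step
    return None


# Made-up escape curves for a single antibody and for a cocktail
single_antibody = [0.1, 0.6, 0.9]
cocktail = [0.0, 0.2, 0.4, 0.7]

single_step = first_escape_step(single_antibody)
cocktail_step = first_escape_step(cocktail)
additional_steps = cocktail_step - single_step
print(single_step, cocktail_step, additional_steps)
```

Returning `None` when the threshold is never reached keeps "no escape within the trajectory" distinct from "escape at step 0".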
Fig. 4: Mean field trajectories reveal resistance of antibody cocktails to viral escape.
(a–d) Mean field trajectories showing antibody escape under pressure from individual antibodies (COV2–2196, REGN10987) and cocktails with REGN10933. The vertical dotted line indicates the step at which at least 50% escape is achieved. (e) Relationship between antibody binding covariance with REGN10933 and additional steps required to reach 50% escape when the antibody is used in a cocktail with REGN10933. (f) Additional RBM energy, compared to evolution without immune pressure, to reach 50% escape level averaged over mean field path, for different antibody combinations. Additional RBM energy can be interpreted as protein loss of viability.
The results show that the REGN10933 + REGN10987 cocktail substantially delays escape compared to REGN10987 alone, while the COV2–2196 + REGN10933 cocktail provides only marginal additional protection over COV2–2196. This suggests that not all cocktails are equally effective at slowing escape.
To understand these differences, we examined the correlation between the escape profile of REGN10933 and those of other antibodies discovered early in the pandemic16, measured as the covariance of their site-wise binding effects weighted by covariation38. This weighting restricts the analysis to mutations that preserve protein viability. Antibody pairs with positively correlated escape profiles tend to be evaded by the same mutations, whereas uncorrelated profiles require distinct and often compensatory mutations. As shown in Figure 4e, the number of additional steps required to reach 50% escape when REGN10933 is added to a cocktail increases as the covariance between the two antibodies decreases (Pearson r = −0.75). Combining multiple correlated antibodies in a cocktail only modestly increases the number of steps required for escape (Figure S6). These results generalize beyond cocktails involving REGN10933, suggesting broader principles for effective antibody combinations: fast escape primarily occurs when combining Class 1 and Class 2 antibodies (Figure S7). These findings agree with a previous study38 that demonstrated how decorrelated cocktails lead to a lower number of viable escape mutants. Our work extends this by showing that these decorrelated cocktails are harder to escape because they force the virus to follow longer, more constrained evolutionary paths through sequence space.
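A weighted covariance between two escape profiles can be sketched as follows. The exact weighting of ref. 38 is not reproduced here: `weights` is a hypothetical per-mutation weight standing in for how viable each mutation is, and the profiles are made-up numbers.

```python
def weighted_covariance(profile_a, profile_b, weights):
    """Covariance of two per-mutation escape profiles under normalized weights."""
    total = sum(weights)
    mean_a = sum(w * a for w, a in zip(weights, profile_a)) / total
    mean_b = sum(w * b for w, b in zip(weights, profile_b)) / total
    return sum(
        w * (a - mean_a) * (b - mean_b)
        for w, a, b in zip(weights, profile_a, profile_b)
    ) / total


# Made-up binding effects of three mutations on three antibodies
antibody_x = [1.0, 2.0, 3.0]
antibody_y = [2.0, 4.0, 6.0]   # escaped by the same mutations as x
antibody_z = [6.0, 4.0, 2.0]   # escaped by different mutations than x
uniform = [1.0, 1.0, 1.0]      # ASSUMPTION: all mutations equally viable

print(weighted_covariance(antibody_x, antibody_y, uniform))
print(weighted_covariance(antibody_x, antibody_z, uniform))
```

Setting a weight to zero removes the corresponding mutation from the comparison, which is how a viability weighting can restrict the covariance to mutations the protein tolerates.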
We next translate this increased “trajectory length” into explicit fitness costs, as evaluated by the RBM score. Figure 4f shows the additional energy along each trajectory. Notably, the individual antibodies REGN10933, COV2–2196 and REGN10987 alone incur little additional loss of viability because they target variable sites where naturally occurring mutations are often sufficient to reduce binding by 50%. In contrast, antibody combinations require each component antibody to reach a higher escape level before the RBD escapes the cocktail, leading to a mutational cost greater than the sum of the costs for the individual antibodies. This effect is especially pronounced for decorrelated antibodies, where about two full additional mutations (relative to natural evolution) are needed to achieve 50% escape. Together, these results illustrate how decorrelated cocktails not only extend the evolutionary path but also force the virus to accumulate changes that compromise structural stability.
DISCUSSION
Understanding how viruses adapt under immune pressure while preserving protein function is critical for anticipating antigenic evolution and designing therapies that remain effective over time. Here, we present a framework that captures the constrained nature of viral escape from neutralizing antibodies, by modeling evolution as a path through sequence space shaped by both viability constraints and immune selection. Compared to the astronomical combinatorial size of protein sequence space, we find that viable escape is funneled through a remarkably narrow set of mutational trajectories that we term “escape funnels”. These trajectories are not only constrained by viability, but also exhibit convergence consistent with real-world viral evolution, such as that observed in SARS-CoV-2 variants of concern7.
A core advance of our approach lies in combining generative sequence modeling with a statistical physics treatment of evolutionary paths28. We leverage RBMs to capture complex epistatic patterns learned from protein sequences, including both local and higher-order dependencies that are critical for maintaining structural integrity. The RBM accurately recovers folding properties in benchmark lattice models35 and enables distillation of sequence ensembles generated by powerful models like ESM-Inverse Folding40 into a simpler, tractable representation. We then apply a mean-field approximation to the ensemble of evolutionary trajectories, allowing us to analytically characterize dominant paths while quantifying key properties such as path entropy and fitness cost.
This framework yields several key biological insights. First, we show that viability and immune constraints together lead to strongly convergent evolution, where escape trajectories rapidly collapse onto a limited number of mutational solutions. This finding is validated in both lattice protein models and the SARS-CoV-2 RBD, where we observe that mean-field generated trajectories recapitulate mutational patterns found in natural sequences. Notably, sites predicted to exhibit high mutational entropy under immune pressure correspond closely to those that have repeatedly mutated in real-world variants such as BA.5 and XBB.
Second, our model enables prospective forecasting of antibody escape from any variant and, crucially, incorporates evolutionary time, unlike previous approaches22,41 that focused on generating viral variants and tested antibody effectiveness on mutants far from the wildtype. By projecting trajectories into antigenic space, we find that model-predicted paths align with the direction of observed antigenic drift in SARS-CoV-2 evolution. Furthermore, the predicted antibody binding loss correlates with experimental measurements of antibody resistance across a panel of 438 antibodies, demonstrating the utility of the model for identifying at-risk therapeutics in advance.
Third, we provide a mechanistic explanation for the effectiveness of antibody cocktails. We show that cocktails composed of antibodies with decorrelated escape profiles force the virus to traverse longer and more structurally costly paths to achieve immune evasion. These findings are not only consistent with previous empirical results42–44, but also yield a quantitative framework for optimizing cocktail combinations. In particular, we find that the REGN10933 + REGN10987 cocktail imposes both the greatest mutational burden and the largest stability penalty, providing a biophysical explanation for its superior durability relative to other combinations.
While our framework captures structural and immune constraints that define escape funnels, it deliberately omits the temporal dynamics of immune responses and treats the antibody landscape as fixed. This abstraction is efficient for forecasting which mutations enable the fastest viable escape, given a known set of antibodies, but it cannot account for shifting repertoires or the delayed emergence of certain antibody classes after boosting45.
More broadly, our framework is not virus-specific. The use of generative models like RBMs and the statistical mechanics of constrained path ensembles make this approach generalizable to other rapidly evolving systems. As a matter of fact, RBMs and similar probabilistic models have already been used to successfully capture epistatic interactions in HIV46 and influenza hemagglutinin47 evolution.
METHODS
Fitness model of lattice sequences
The folding probability of a sequence $\mathbf{v}$ into a given structure $S$ is determined by the energetic contributions of residue-residue interactions within the folded configuration33,48. These interactions occur between amino acid pairs that are spatially adjacent in the structure but not sequentially contiguous. The geometry of the structure is encoded by a contact map $c^{(S)}$, where $c^{(S)}_{ij} = 1$ indicates that residues $i$ and $j$ are in contact in structure $S$, and $c^{(S)}_{ij} = 0$ otherwise.
The total energy of a sequence in a particular structure is computed as:
$$E(\mathbf{v} \mid S) = \sum_{i<j} c^{(S)}_{ij} \, E_{\rm MJ}(v_i, v_j), \tag{8}$$
where $E_{\rm MJ}(v_i, v_j)$ denotes the Miyazawa-Jernigan interaction energy49 between amino acids $v_i$ and $v_j$, reflecting their physico-chemical compatibility.
The probability that sequence $\mathbf{v}$ adopts structure $S$ is then given by a Boltzmann distribution:
$$P_{\rm nat}(S \mid \mathbf{v}) = \frac{e^{-E(\mathbf{v} \mid S)}}{\sum_{S'} e^{-E(\mathbf{v} \mid S')}}, \tag{9}$$
where the denominator sums over all possible compact, self-avoiding conformations $S'$ on the lattice. In this formulation, $P_{\rm nat}(S \mid \mathbf{v})$ represents the ground-truth thermodynamic probability that sequence $\mathbf{v}$ adopts a given structure, based on its folding energy $E(\mathbf{v} \mid S)$. However, computing this distribution exactly requires summing over all compact, self-avoiding conformations, a combinatorially expensive task for realistic protein spaces.
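The energy and folding-probability computation can be sketched on a toy example. The sketch assumes unit temperature, uses a made-up two-letter interaction table in place of the Miyazawa-Jernigan matrix, and enumerates only two hypothetical competing structures instead of all compact lattice conformations.

```python
import math


def structure_energy(seq, contacts, pair_energy):
    """Sum of pairwise interaction energies over the contacts of one structure."""
    return sum(pair_energy[seq[i]][seq[j]] for i, j in contacts)


def fold_probability(seq, native_index, structures, pair_energy):
    """Boltzmann probability of the native structure among all listed structures."""
    energies = [structure_energy(seq, c, pair_energy) for c in structures]
    lowest = min(energies)  # shift energies for numerical stability
    boltzmann = [math.exp(-(e - lowest)) for e in energies]
    return boltzmann[native_index] / sum(boltzmann)


# Toy model: hydrophobic (H) and polar (P) residues, H-H contacts are favorable
pair_energy = {"H": {"H": -1.0, "P": 0.0}, "P": {"H": 0.0, "P": 0.0}}
structures = [
    [(0, 3)],  # hypothetical native structure: residues 0 and 3 in contact
    [(1, 3)],  # hypothetical competing structure
]
p_native = fold_probability("HPPH", 0, structures, pair_energy)
print(p_native)
```

A mutation that weakens the native contact, or strengthens a contact of a competing structure, lowers the folding probability, which is how the model couples sites that are far apart in sequence.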
In our work, we therefore approximate this folding probability with the probability assigned by an RBM trained on sequences known to fold into the desired structure, following the procedure of Jacquin et al.35. Precisely, these sequences were sampled from a low temperature MC sampling using with as effective energy.
To account for antibody escape, we penalize low Hamming distance to the wildtype over a fixed epitope subset:
To combine structural stability with immune escape, we define the fitness of a sequence as the product of its native folding probability and the antibody escape factor:
ESMIF distillation
For the SARS-CoV-2 RBD, we generated an artificial multiple sequence alignment (MSA) of 1,000 sequences using ESM-Inverse Folding at temperature 1, conditioned on the full RBD–ACE2 complex structure from PDB entry 6M0J. Conditioning on ACE2 binding ensures that the sampled sequences are compatible with receptor interaction, in contrast to prepandemic coronaviridae or animal SARS-related sequences that may diverge functionally. The RBM was then trained using the persistent contrastive divergence algorithm to approximate this distribution, thereby capturing structural constraints relevant for ACE2 recognition.
Antibody binding
To model the effect of single mutations on antibody binding to the RBD, we use deep mutational scanning (DMS) data from the Bloom lab repository, which integrates data from several previous studies10–16, as well as data from Cao et al.7.
We then define the binding energy contribution of amino acid at site i to antibody , where
We assume that all wildtype (wt) amino acids, as well as single mutants missing from the DMS, do not provide escape and are assigned an escape ratio of 0.001. This value was chosen because it is close to the minimum measurable escape ratio among all single variants in the DMS. Note that since , binding energy contributions .
Following Greaney et al.37, we then define
In other words, the antibody binds to the RBD only if every mutation maintains binding.
In previous studies20,38, the binding probability to an antibody with concentration $c$ and dissociation constant $K_d$ was typically modeled as $p_b = c/(c + K_d)$. In the low-concentration regime $c \ll K_d$, $p_b \approx c/K_d$, making the binding energy $-\log p_b$ equal to the $\log K_d$ used in previous approaches (up to an additive constant).
Pandemic sequences
All 15,371,428 spike sequences available on GISAID50 as of 14 April 2023 were downloaded and processed following the approach of Starr et al.15. Sequences from non-human hosts or with lengths outside [1260, 1276] were removed. The remaining sequences were aligned with MAFFT51, and sequences containing Unicode errors, gaps, or ambiguous characters were removed. To avoid regional bias, we kept only sequences observed in the USA. Overall, we retained 3,712,556 submissions, represented by 10,453 unique RBD sequences. RBD amino-acid mutations were enumerated relative to the reference Wuhan-Hu-1 SARS-CoV-2 RBD sequence (GenBank MN908947, residues N331-T531).
To align with the MSA data, the model was finally applied to sequences of length 178, spanning residues S349 to G526, leading to 8,167 unique RBD sequences.
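The filtering steps described above can be sketched as follows. This is a simplified illustration, not the pipeline used in the study: the record layout `(aligned_spike, host, country)`, the metadata values `"Human"` and `"USA"`, and all function names are hypothetical, and the slicing assumes that alignment columns follow reference numbering.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def passes_prefilter(seq, host, country, min_len=1260, max_len=1276):
    """Pre-alignment filter: human host, USA origin, plausible spike length."""
    return host == "Human" and country == "USA" and min_len <= len(seq) <= max_len

def extract_rbd(aligned_spike, start=349, end=526):
    """Slice residues start..end (1-based, inclusive); reject gaps or ambiguous characters.

    Assumes alignment columns follow reference numbering (a simplification).
    """
    rbd = aligned_spike[start - 1:end]
    if len(rbd) != end - start + 1 or any(c not in AMINO_ACIDS for c in rbd):
        return None
    return rbd

def unique_rbds(records):
    """Collect unique RBD strings from (aligned_spike, host, country) tuples."""
    kept = set()
    for seq, host, country in records:
        # The length filter applies to the ungapped sequence.
        if passes_prefilter(seq.replace("-", ""), host, country):
            rbd = extract_rbd(seq)
            if rbd is not None:
                kept.add(rbd)
    return kept
```

The default window S349 to G526 gives 526 − 349 + 1 = 178 residues, matching the sequence length quoted above.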
Path sampling with MCMC
We sample mutation paths under the target path distribution. Paths start from a fixed wildtype and are updated by single-mutation Metropolis-Hastings moves. Proposed updates depend on the Hamming distance between the two neighboring states of the sequence being updated: if the neighbors coincide, a new mutation is drawn randomly; if they differ by one residue, the update is restricted to that site; and if they differ by two residues, the update flips to the only other compatible intermediate. Each proposal is accepted with the Metropolis-Hastings acceptance probability, which ensures detailed balance, so that the algorithm converges to the target distribution (see SI, MCMC algorithm for path sampling).
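The three update rules can be sketched in a few lines. This is an illustrative reading of the description above, not the algorithm given in the SI. It assumes that the path weight factorizes into a per-sequence term `log_weight` (a hypothetical callable), that consecutive sequences differ by at most one residue, that proposals are uniform over the states allowed by the neighbors (so the acceptance rule reduces to a ratio of weights), and that the free final state is treated like the coinciding-neighbors case.

```python
import numpy as np

def propose(prev, nxt, current, q, rng):
    """Uniform proposal over the states compatible with the neighbours of `current`."""
    new = prev.copy()
    diff = np.flatnonzero(prev != nxt) if nxt is not None else np.array([], dtype=int)
    if diff.size == 0:
        # Neighbours coincide (or free end): prev itself or any single mutant of it.
        k = rng.integers(0, 1 + prev.size * (q - 1))
        if k > 0:
            site, shift = divmod(k - 1, q - 1)
            new[site] = (prev[site] + 1 + shift) % q
    elif diff.size == 1:
        # Neighbours differ at one site: only that site may change.
        new[diff[0]] = rng.integers(0, q)
    else:
        # Neighbours differ at two sites: switch to the other intermediate.
        i, j = diff
        new[i] = nxt[i]
        if np.array_equal(new, current):
            new = prev.copy()
            new[j] = nxt[j]
    return new

def sample_path(wildtype, n_steps, log_weight, q, n_sweeps, rng):
    """Metropolis-Hastings sweeps over a path whose first state stays the wildtype."""
    path = np.tile(wildtype, (n_steps + 1, 1))
    for _ in range(n_sweeps):
        for t in range(1, n_steps + 1):
            nxt = path[t + 1] if t < n_steps else None
            new = propose(path[t - 1], nxt, path[t], q, rng)
            # Symmetric proposal: accept with min(1, w(new) / w(current)).
            if np.log(rng.random()) < log_weight(new) - log_weight(path[t]):
                path[t] = new
    return path

# Toy usage with an independent-site weight (placeholder for the full model).
rng = np.random.default_rng(1)
q, L, T = 4, 6, 5
h = rng.normal(size=(L, q))
wildtype = np.zeros(L, dtype=int)
path = sample_path(wildtype, T, lambda v: h[np.arange(L), v].sum(), q, 200, rng)
```

Because each update keeps the sequence within one mutation of both neighbors, the path remains continuous after every move.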
Mean field
We extend the framework of Mauri et al.28,29, which formulated mean-field dynamics of mutational paths under folding constraints, in two key ways. First, we incorporate antibody escape by introducing binding-score order parameters that act in competition with folding pressure, thereby capturing the trade-off between structural stability and immune evasion. Second, we free the end of the trajectories: unlike transition-path formulations that assume a fixed evolutionary destination, our paths are open-ended and evolve solely under the joint action of folding and immune constraints.
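The sequence distribution is given by the RBM with fields $g_i$, weights $w_{i\mu}$, and hidden-unit cumulant generating functions $\Gamma_\mu$ (see SI, RBM-based sequence modeling):
$$P_{\mathrm{RBM}}(\mathbf{v}) = \frac{1}{Z} \exp\left( \sum_i g_i(v_i) + \sum_\mu \Gamma_\mu\Big( \sum_i w_{i\mu}(v_i) \Big) \right) \tag{10}$$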
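As an illustration of how such an RBM scores a sequence, the sketch below evaluates the unnormalized log-probability as a sum of field terms and hidden-unit cumulant terms. The array layout (`g` of shape `(L, q)`, `w` of shape `(M, L, q)`) and the Gaussian hidden units used in the example, for which the cumulant generating function is $I^2/2$, are assumptions made here for simplicity; the trained model may use a different hidden-unit potential.

```python
import numpy as np

def rbm_log_weight(v, g, w, gamma):
    """Unnormalised log-probability: field terms plus hidden-unit cumulant terms."""
    sites = np.arange(v.size)
    inputs = w[:, sites, v].sum(axis=1)      # I_mu = sum_i w[mu, i, v_i]
    return g[sites, v].sum() + gamma(inputs).sum()

def gaussian_gamma(inputs):
    """Cumulant generating function of a unit-variance Gaussian hidden unit."""
    return 0.5 * inputs ** 2

rng = np.random.default_rng(0)
L, q, M = 5, 4, 3                            # toy sizes, not those of the RBD model
g = rng.normal(size=(L, q))
w = rng.normal(size=(M, L, q))
v = rng.integers(0, q, size=L)
log_w = rbm_log_weight(v, g, w, gaussian_gamma)
```

Setting all weights to zero reduces the score to the independent-site field term, which is the site-independent limit discussed for the path entropy.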
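Antibody escape is given by the product, over antibodies $m$, of the probability $P^{(m)}_{\mathrm{esc}}$ to escape each antibody:
$$P_{\mathrm{esc}}(\mathbf{v}) = \prod_m P^{(m)}_{\mathrm{esc}}(\mathbf{v}) \tag{11}$$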
Sequence probability is a combination of stability and antibody escape:
| (12) |
with the potential
| (13) |
The potential should forbid large jumps along the paths. We thus consider a hard-wall repulsive potential,
| (14) |
The location of the hard wall, , allows the path to explore at most mutations in steps.
The free energy is given by:
| (15) |
The entropy term captures the diversity of sequence realizations with the same order parameters. The mean-field trajectory is obtained by minimizing the free energy with respect to the order parameters, subject to initial conditions fixed at one given sequence, such as the wildtype (see SI, Mean field).
Site entropy
To quantify the mutational variability at each site $i$ of the RBD, we compute the Shannon entropy:
$$S_i = -\sum_a p_i(a) \log p_i(a) \tag{16}$$
where $p_i(a)$ is the marginal probability of observing amino acid $a$ at position $i$. In the mean-field approximation, these marginals follow from the derivative of the logarithm of the partition function with respect to the local fields (see SI, Amino acid probability in mean field), yielding
| (17) |
High entropy indicates that the site tolerates diverse mutations, while low entropy reflects evolutionary constraints or functional importance.
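A minimal sketch of the site-entropy computation is given below. It estimates the marginals empirically from a set of integer-encoded sequences rather than from the mean-field expression, which is an assumption made here for illustration; the function names are hypothetical.

```python
import numpy as np

def empirical_marginals(seqs, q):
    """Frequencies p[i, a] of amino acid a at site i from an (N, L) integer array."""
    seqs = np.asarray(seqs)
    return np.stack([(seqs == a).mean(axis=0) for a in range(q)], axis=1)

def site_entropy(p):
    """Shannon entropy (in nats) of each row of an (L, q) array of marginals."""
    p = np.asarray(p, dtype=float)
    terms = np.zeros_like(p)
    mask = p > 0                       # convention: 0 * log 0 = 0
    terms[mask] = p[mask] * np.log(p[mask])
    return -terms.sum(axis=1)

# Toy alignment: site 0 is conserved, site 1 is split evenly between two residues.
seqs = np.array([[0, 1], [0, 2], [0, 1], [0, 2]])
entropies = site_entropy(empirical_marginals(seqs, q=4))
```

In this toy alignment the conserved site has zero entropy and the evenly split site has entropy $\log 2$, the two extremes of the interpretation given above.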
Path Entropy
To build intuition, we consider limiting cases for the path entropy normalized by the protein length L. When paths are constrained to a single mutation per step and no external fields are applied, the entropy is given by . In the case where continuity is removed and no field is present, the entropy becomes . If the field is site-independent and continuity is still removed, the entropy takes the form , where denotes the average entropy across sites induced by the field. Using the full model, path entropy (see SI, Path Entropy) can be computed as:
| (18) |
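The first two limiting cases can be checked with simple counting. The sketch below uses $L = 178$, the RBD length used in this work, and takes $q = 20$ amino acids and $T = 20$ steps as illustrative assumptions; neither value is stated in this section, and the single-mutation count ignores the option of leaving the sequence unchanged at a step.

```python
import math

L = 178   # sequence length used in this work
q = 20    # number of amino acids (assumption)
T = 20    # number of steps along the path (assumption)

# No continuity, no field: every step is any of the q**L sequences.
log10_unconstrained = T * L * math.log10(q)

# One mutation per step, no field: each step picks one of L sites and q - 1 new residues.
log10_single_mutation = T * math.log10(L * (q - 1))

# Corresponding entropies per site, in nats.
entropy_unconstrained = T * math.log(q)
entropy_single_mutation = T * math.log(L * (q - 1)) / L
```

Under these assumptions the two counts come out near $10^{4632}$ and $10^{71}$, the same orders of magnitude as the "none" and "1 mutation per step" entries of Table 1 when those entries are read as powers of ten.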
Correlation of antibody escape profiles
To quantify the similarity between antibody escape profiles, we computed the covariance between the binding scores of antibodies identified early in the pandemic16. For a given antibody , the binding score, defined as
| (19) |
was evaluated over sequences sampled from the RBM distribution $P_{\mathrm{RBM}}$, thereby biasing the analysis toward structurally viable variants.
Following ref. 38, we quantify antibody synergy through the covariance between the dissociation constants associated with antibodies 1 and 2:
| (20) |
where Cov represents the covariance of amino-acid occurrences when sampling sequences from the RBM distribution $P_{\mathrm{RBM}}$. Sampling from this distribution ensures that escape-profile correlations capture mutational accessibility under stability constraints, rather than relying solely on epitope comparison as in previous studies14,37.
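The covariance computation can be sketched as follows, under the assumption that each antibody's score is additive over sites, i.e. a sum of per-site, per-residue contributions stored in an `(L, q)` array. The contribution arrays `e1` and `e2` and the random sequences standing in for RBM samples are placeholders, not data from the study.

```python
import numpy as np

def one_hot(seqs, q):
    """Encode an (N, L) integer array as an (N, L*q) indicator matrix."""
    return np.eye(q)[seqs].reshape(len(seqs), -1)

def score_covariance(seqs, e1, e2, q):
    """Covariance of two additive scores via the covariance of amino-acid occurrences."""
    x = one_hot(seqs, q)
    occupancy_cov = np.cov(x, rowvar=False)          # (L*q, L*q)
    return e1.ravel() @ occupancy_cov @ e2.ravel()

rng = np.random.default_rng(0)
N, L, q = 500, 6, 4                                  # toy sizes
seqs = rng.integers(0, q, size=(N, L))               # placeholder for sampled sequences
e1 = rng.normal(size=(L, q))                         # hypothetical contributions, antibody 1
e2 = rng.normal(size=(L, q))                         # hypothetical contributions, antibody 2
cov_12 = score_covariance(seqs, e1, e2, q)

# Equivalent direct computation from the per-sequence scores.
s1 = one_hot(seqs, q) @ e1.ravel()
s2 = one_hot(seqs, q) @ e2.ravel()
direct = np.cov(s1, s2)[0, 1]
```

Both routes give the same number by bilinearity of the covariance; the occurrence-based form makes explicit that the result depends on the sampling distribution through the amino-acid covariances.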
Supplementary Material
Table 1:
Number of paths under different model constraints.
| Constraints | None | Site-independent field | 1 mutation per step | Full model |
|---|---|---|---|---|
| Number of paths | $10^{4631}$ | $10^{1985}$ | $10^{71}$ | $10^{21}$ |
SIGNIFICANCE.
Viruses must continually mutate to evade our immune defenses, yet they cannot mutate freely. Like navigating a minefield, each step toward immune escape comes at the potential cost of structural stability. This study shows that despite the astronomical number of potential mutations, viruses are funneled into narrow, predictable evolutionary paths to escape antibodies while preserving structural integrity. Using a statistical-physics-inspired model grounded in experimental antibody and SARS-CoV-2 epidemiology data, we model these “escape funnels” and show how they predict the convergent evolution observed during the pandemic. We also reveal why certain antibody cocktails are more resistant to viral escape: they constrain evolution to longer, less viable mutational routes.
ACKNOWLEDGMENTS
S.C. is grateful to G. Parisi for having suggested the application of transition paths to SARS-CoV-2 evolution. We acknowledge funding from the Agence Nationale de la Recherche (ANR ProDiGen, AAP2024 CE45 to S.C. and R.M.). This work is supported by NIH R35GM139571. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
We gratefully acknowledge all data contributors, i.e., the Authors and their Originating laboratories responsible for obtaining the specimens, and their Submitting laboratories for generating the genetic sequence and metadata and sharing via the GISAID Initiative, on which this research is based.
Footnotes
DECLARATION OF INTERESTS
The authors declare no competing interests.
RESOURCE AVAILABILITY
Data and code availability
The code used in this study will be available at https://github.com/m-huot/ESCAPE_PATHS after manuscript acceptance.
References
- 1. Raharinirina N.A., Gubela N., Börnigen D., Smith M.R., Oh D.Y., Budt M., Mache C., Schillings C., Fuchs S., Dürrwald R., Wolff T., Hölzer M., Paraskevopoulou S., and Von Kleist M. (2025). SARS-CoV-2 evolution on a dynamic immune landscape. Nature 639, 196–204. URL: https://www.nature.com/articles/s41586-024-08477-8. doi: 10.1038/s41586-024-08477-8.
- 2. Meijers M., Ruchnewitz D., Eberhardt J., Łuksza M., and Lässig M. (2023). Population immunity predicts evolutionary trajectories of SARS-CoV-2. Cell 186, 5151–5164.e13. URL: https://linkinghub.elsevier.com/retrieve/pii/S0092867423010760. doi: 10.1016/j.cell.2023.09.022.
- 3. Jackson C.B., Farzan M., Chen B., and Choe H. (2022). Mechanisms of SARS-CoV-2 entry into cells. Nature Reviews Molecular Cell Biology 23, 3–20. URL: https://www.nature.com/articles/s41580-021-00418-x. doi: 10.1038/s41580-021-00418-x.
- 4. Carabelli A.M., Peacock T.P., Thorne L.G., Harvey W.T., Hughes J., COVID-19 Genomics UK Consortium, De Silva T.I., Peacock S.J., Barclay W.S., De Silva T.I., Towers G.J., and Robertson D.L. (2023). SARS-CoV-2 variant biology: immune escape, transmission and fitness. Nature Reviews Microbiology. URL: https://www.nature.com/articles/s41579-022-00841-7. doi: 10.1038/s41579-022-00841-7.
- 5. Nabel K.G., Clark S.A., Shankar S., Pan J., Clark L.E., Yang P., Coscia A., McKay L.G.A., Varnum H.H., Brusic V., Tolan N.V., Zhou G., Desjardins M., Turbett S.E., Kanjilal S., Sherman A.C., Dighe A., LaRocque R.C., Ryan E.T., Tylek C., Cohen-Solal J.F., Darcy A.T., Tavella D., Clabbers A., Fan Y., Griffiths A., Correia I.R., Seagal J., Baden L.R., Charles R.C., and Abraham J. (2022). Structural basis for continued antibody evasion by the SARS-CoV-2 receptor binding domain. Science 375, eabl6251. URL: https://www.science.org/doi/10.1126/science.abl6251. doi: 10.1126/science.abl6251.
- 6. Wang Q., Iketani S., Li Z., Liu L., Guo Y., Huang Y., Bowen A.D., Liu M., Wang M., Yu J., Valdez R., Lauring A.S., Sheng Z., Wang H.H., Gordon A., Liu L., and Ho D.D. (2023). Alarming antibody evasion properties of rising SARS-CoV-2 BQ and XBB subvariants. Cell 186, 279–286.e8. URL: https://linkinghub.elsevier.com/retrieve/pii/S0092867422015318. doi: 10.1016/j.cell.2022.12.018.
- 7. Cao Y., Jian F., Wang J., Yu Y., Song W., Yisimayi A., Wang J., An R., Chen X., Zhang N., Wang Y., Wang P., Zhao L., Sun H., Yu L., Yang S., Niu X., Xiao T., Gu Q., Shao F., Hao X., Xu Y., Jin R., Shen Z., Wang Y., and Xie X.S. (2022). Imprinted SARS-CoV-2 humoral immunity induces convergent Omicron RBD evolution. Nature. URL: https://www.nature.com/articles/s41586-022-05644-7. doi: 10.1038/s41586-022-05644-7.
- 8. Feng S., Reid G.E., Clark N.M., Harrington A., Uprichard S.L., and Baker S.C. (2024). Evidence of SARS-CoV-2 convergent evolution in immunosuppressed patients treated with antiviral therapies. Virology Journal 21, 105. URL: https://virologyj.biomedcentral.com/articles/10.1186/s12985-024-02378-y. doi: 10.1186/s12985-024-02378-y.
- 9. Jian F., Feng L., Yang S., Yu Y., Wang L., Song W., Yisimayi A., Chen X., Xu Y., Wang P., Yu L., Wang J., Liu L., Niu X., Wang J., Xiao T., An R., Wang Y., Gu Q., Shao F., Jin R., Shen Z., Wang Y., Wang X., and Cao Y. (2023). Convergent evolution of SARS-CoV-2 XBB lineages on receptor-binding domain 455–456 synergistically enhances antibody evasion and ACE2 binding. PLOS Pathogens 19, e1011868. URL: https://dx.plos.org/10.1371/journal.ppat.1011868. doi: 10.1371/journal.ppat.1011868.
- 10. Greaney A.J., Starr T.N., Gilchuk P., Zost S.J., Binshtein E., Loes A.N., Hilton S.K., Huddleston J., Eguia R., Crawford K.H., Dingens A.S., Nargi R.S., Sutton R.E., Suryadevara N., Rothlauf P.W., Liu Z., Whelan S.P., Carnahan R.H., Crowe J.E., and Bloom J.D. (2021). Complete Mapping of Mutations to the SARS-CoV-2 Spike Receptor-Binding Domain that Escape Antibody Recognition. Cell Host & Microbe 29, 44–57.e9. URL: https://linkinghub.elsevier.com/retrieve/pii/S1931312820306247. doi: 10.1016/j.chom.2020.11.007.
- 11. Greaney A.J., Starr T.N., Barnes C.O., Weisblum Y., Schmidt F., Caskey M., Gaebler C., Cho A., Agudelo M., Finkin S., Wang Z., Poston D., Muecksch F., Hatziioannou T., Bieniasz P.D., Robbiani D.F., Nussenzweig M.C., Bjorkman P.J., and Bloom J.D. (2021). Mapping mutations to the SARS-CoV-2 RBD that escape binding by different classes of antibodies. Nature Communications 12, 4196. URL: https://www.nature.com/articles/s41467-021-24435-8. doi: 10.1038/s41467-021-24435-8.
- 12. Greaney A.J., Loes A.N., Crawford K.H., Starr T.N., Malone K.D., Chu H.Y., and Bloom J.D. (2021). Comprehensive mapping of mutations in the SARS-CoV-2 receptor-binding domain that affect recognition by polyclonal human plasma antibodies. Cell Host & Microbe 29, 463–476.e6. URL: https://linkinghub.elsevier.com/retrieve/pii/S1931312821000822. doi: 10.1016/j.chom.2021.02.003.
- 13. Tortorici M.A., Czudnochowski N., Starr T.N., Marzi R., Walls A.C., Zatta F., Bowen J.E., Jaconi S., Di Iulio J., Wang Z., De Marco A., Zepeda S.K., Pinto D., Liu Z., Beltramello M., Bartha I., Housley M.P., Lempp F.A., Rosen L.E., Dellota E., Kaiser H., Montiel-Ruiz M., Zhou J., Addetia A., Guarino B., Culap K., Sprugasci N., Saliba C., Vetti E., Giacchetto-Sasselli I., Fregni C.S., Abdelnabi R., Foo S.Y.C., Havenar-Daughton C., Schmid M.A., Benigni F., Cameroni E., Neyts J., Telenti A., Virgin H.W., Whelan S.P.J., Snell G., Bloom J.D., Corti D., Veesler D., and Pizzuto M.S. (2021). Broad sarbecovirus neutralization by a human monoclonal antibody. Nature 597, 103–108. URL: https://www.nature.com/articles/s41586-021-03817-4. doi: 10.1038/s41586-021-03817-4.
- 14. Starr T.N., Greaney A.J., Dingens A.S., and Bloom J.D. (2021). Complete map of SARS-CoV-2 RBD mutations that escape the monoclonal antibody LY-CoV555 and its cocktail with LY-CoV016. Cell Reports Medicine 2, 100255. URL: https://linkinghub.elsevier.com/retrieve/pii/S2666379121000719. doi: 10.1016/j.xcrm.2021.100255.
- 15. Starr T.N., Greaney A.J., Addetia A., Hannon W.W., Choudhary M.C., Dingens A.S., Li J.Z., and Bloom J.D. (2021). Prospective mapping of viral mutations that escape antibodies used to treat COVID-19. Science 371, 850–854. URL: https://www.science.org/doi/10.1126/science.abf9302. doi: 10.1126/science.abf9302.
- 16. Starr T.N., Czudnochowski N., Liu Z., Zatta F., Park Y.J., Addetia A., Pinto D., Beltramello M., Hernandez P., Greaney A.J., Marzi R., Glass W.G., Zhang I., Dingens A.S., Bowen J.E., Tortorici M.A., Walls A.C., Wojcechowskyj J.A., De Marco A., Rosen L.E., Zhou J., Montiel-Ruiz M., Kaiser H., Dillen J.R., Tucker H., Bassi J., Silacci-Fregni C., Housley M.P., Di Iulio J., Lombardo G., Agostini M., Sprugasci N., Culap K., Jaconi S., Meury M., Dellota E. Jr, Abdelnabi R., Foo S.Y.C., Cameroni E., Stumpf S., Croll T.I., Nix J.C., Havenar-Daughton C., Piccoli L., Benigni F., Neyts J., Telenti A., Lempp F.A., Pizzuto M.S., Chodera J.D., Hebner C.M., Virgin H.W., Whelan S.P.J., Veesler D., Corti D., Bloom J.D., and Snell G. (2021). SARS-CoV-2 RBD antibodies that maximize breadth and resistance to escape. Nature 597, 97–102. URL: https://www.nature.com/articles/s41586-021-03807-6. doi: 10.1038/s41586-021-03807-6.
- 17. Dadonaite B., Brown J., McMahon T.E., Farrell A.G., Figgins M.D., Asarnow D., Stewart C., Lee J., Logue J., Bedford T., Murrell B., Chu H.Y., Veesler D., and Bloom J.D. (2024). Spike deep mutational scanning helps predict success of SARS-CoV-2 clades. Nature 631, 617–626. URL: https://www.nature.com/articles/s41586-024-07636-1. doi: 10.1038/s41586-024-07636-1.
- 18. Yisimayi A., Song W., Wang J., Jian F., Yu Y., Chen X., Xu Y., Yang S., Niu X., Xiao T., Wang J., Zhao L., Sun H., An R., Zhang N., Wang Y., Wang P., Yu L., Lv Z., Gu Q., Shao F., Jin R., Shen Z., Xie X.S., Wang Y., and Cao Y. (2024). Repeated Omicron exposures override ancestral SARS-CoV-2 immune imprinting. Nature 625, 148–156. URL: https://www.nature.com/articles/s41586-023-06753-7. doi: 10.1038/s41586-023-06753-7.
- 19. Ito J., Strange A., Liu W., Joas G., Lytras S., The Genotype to Phenotype Japan (G2P-Japan) Consortium, Matsuno K., Nao N., Sawa H., Mizuma K., Kojima I., Li J., Tsubo T., Tanaka S., Tsuda M., Wang L., Oda Y., Ferdous Z., Shishido K., Fukuhara T., Tamura T., Suzuki R., Suzuki S., Tsujino S., Ito H., Kaku Y., Misawa N., Plianchaisuk A., Guo Z., Hinay A.A., Usui K., Saikruang W., Uriu K., Kosugi Y., Fujita S., Tolentino M., J.E., Chen L., Pan L., Li W., Suganami M., Chiba M., Yoshimura R., Yasuda K., Iida K., Ohsumi N., Tanaka S., Okumura K., Yoshimura K., Sadamas K., Nagashima M., Asakura H., Yoshida I., Nakagawa S., Takaori-Kondo A., Shirakawa K., Nagata K., Nomura R., Horisawa Y., Tashiro Y., Kawai Y., Takayama K., Hashimoto R., Deguchi S., Watanabe Y., Nakata Y., Futatsusako H., Sakamoto A., Yasuhara N., Hashiguchi T., Suzuki T., Kimura K., Sasaki J., Nakajima Y., Yajima H., Irie T., Kawabata R., Sasaki-Tabata K., Ikeda T., Nasse H., Shimizu R., Begum M.M., Jonathan M., Mugita Y., Leong S., Takahashi O., Ichihara K., Ueno T., Motozono C., Toyoda M., Saito A., Shofa M., Shibatani Y., Nishiuchi T., Zahradni J., Andrikopoulos P., Padilla-Blanco M., Konar A., and Sato K. (2025). A protein language model for exploring viral fitness landscapes. Nature Communications 16, 4236. URL: https://www.nature.com/articles/s41467-025-59422-w. doi: 10.1038/s41467-025-59422-w.
- 20. Wang D., Huot M., Mohanty V., and Shakhnovich E.I. (2024). Biophysical principles predict fitness of SARS-CoV-2 variants. Proceedings of the National Academy of Sciences 121, e2314518121. URL: https://www.pnas.org/doi/abs/10.1073/pnas.2314518121. doi: 10.1073/pnas.2314518121.
- 21. Maher M.C., Bartha I., Weaver S., Di Iulio J., Ferri E., Soriaga L., Lempp F.A., Hie B.L., Bryson B., Berger B., Robertson D.L., Snell G., Corti D., Virgin H.W., Kosakovsky Pond S.L., and Telenti A. (2022). Predicting the mutational drivers of future SARS-CoV-2 variants of concern. Science Translational Medicine 14, eabk3445. URL: https://www.science.org/doi/10.1126/scitranslmed.abk3445. doi: 10.1126/scitranslmed.abk3445.
- 22. Huot M., Wang D., Liu J., and Shakhnovich E.I. (2025). Predicting high-fitness viral protein variants with Bayesian active learning and biophysics. Proceedings of the National Academy of Sciences 122, e2503742122. URL: https://www.pnas.org/doi/abs/10.1073/pnas.2503742122. doi: 10.1073/pnas.2503742122.
- 23. Thadani N.N., Gurev S., Notin P., Youssef N., Rollins N.J., Ritter D., Sander C., Gal Y., and Marks D.S. (2023). Learning from prepandemic data to forecast viral escape. Nature 622, 818–825. URL: https://www.nature.com/articles/s41586-023-06617-0. doi: 10.1038/s41586-023-06617-0.
- 24. Wang G., Liu X., Wang K., Gao Y., Li G., Baptista-Hon D.T., Yang X.H., Xue K., Tai W.H., Jiang Z., Cheng L., Fok M., Lau J.Y.N., Yang S., Lu L., Zhang P., and Zhang K. (2023). Deep-learning-enabled protein–protein interaction analysis for prediction of SARS-CoV-2 infectivity and variant evolution. Nature Medicine 29, 2007–2018. URL: https://www.nature.com/articles/s41591-023-02483-5. doi: 10.1038/s41591-023-02483-5.
- 25. Wang D., Huot M., Zhang Z., Jiang K., Shakhnovich E.I., and Esvelt K.M. (2025). Without Safeguards, AI-Biology Integration Risks Accelerating Future Pandemics. URL: https://rgdoi.net/10.13140/RG.2.2.29765.15849. doi: 10.13140/RG.2.2.29765.15849. Publisher: Unpublished.
- 26. Rodriguez-Rivas J., Croce G., Muscat M., and Weigt M. (2022). Epistatic models predict mutable sites in SARS-CoV-2 proteins and epitopes. Proceedings of the National Academy of Sciences 119, e2113118119. URL: https://pnas.org/doi/full/10.1073/pnas.2113118119. doi: 10.1073/pnas.2113118119.
- 27. Bari L.D., Bisardi M., Cotogno S., Weigt M., and Zamponi F. (2024). Emergent time scales of epistasis in protein evolution. Proceedings of the National Academy of Sciences 121, e2406807121. URL: https://www.pnas.org/doi/abs/10.1073/pnas.2406807121. doi: 10.1073/pnas.2406807121.
- 28. Mauri E., Cocco S., and Monasson R. (2023). Mutational Paths with Sequence-Based Models of Proteins: From Sampling to Mean-Field Characterization. Physical Review Letters 130, 158402. URL: https://link.aps.org/doi/10.1103/PhysRevLett.130.158402. doi: 10.1103/PhysRevLett.130.158402.
- 29. Mauri E., Cocco S., and Monasson R. (2023). Transition paths in Potts-like energy landscapes: General properties and application to protein sequence models. Physical Review E 108, 024141. URL: https://link.aps.org/doi/10.1103/PhysRevE.108.024141. doi: 10.1103/PhysRevE.108.024141.
- 30. Greenbury S.F., Louis A.A., and Ahnert S.E. (2022). The structure of genotype-phenotype maps makes fitness landscapes navigable. Nature Ecology & Evolution 6, 1742–1752. URL: https://www.nature.com/articles/s41559-022-01867-z. doi: 10.1038/s41559-022-01867-z.
- 31. Fischer A., and Igel C. (2014). Training restricted Boltzmann machines: An introduction. Pattern Recognition 47, 25–39. URL: https://linkinghub.elsevier.com/retrieve/pii/S0031320313002495. doi: 10.1016/j.patcog.2013.05.025.
- 32. Tubiana J., Cocco S., and Monasson R. (2019). Learning protein constitutive motifs from sequence data. eLife 8, e39397. URL: https://elifesciences.org/articles/39397. doi: 10.7554/eLife.39397.
- 33. Lau K.F., and Dill K.A. (1989). A lattice statistical mechanics model of the conformational and sequence spaces of proteins. Macromolecules 22, 3986–3997. URL: https://pubs.acs.org/doi/abs/10.1021/ma00200a030. doi: 10.1021/ma00200a030.
- 34. Loffredo E., Vesconi E., Razban R., Peleg O., Shakhnovich E., Cocco S., and Monasson R. (2023). Evolutionary dynamics of a lattice dimer: a toy model for stability vs. affinity trade-offs in proteins. Journal of Physics A: Mathematical and Theoretical 56, 455002. URL: https://iopscience.iop.org/article/10.1088/1751-8121/acfddc. doi: 10.1088/1751-8121/acfddc.
- 35. Jacquin H., Gilson A., Shakhnovich E., Cocco S., and Monasson R. (2016). Benchmarking Inverse Statistical Approaches for Protein Structure and Design with Exactly Solvable Models. PLOS Computational Biology 12, e1004889. URL: https://dx.plos.org/10.1371/journal.pcbi.1004889. doi: 10.1371/journal.pcbi.1004889.
- 36. Yang H., Carney P.J., Chang J.C., Guo Z., Villanueva J.M., and Stevens J. (2015). Structure and receptor binding preferences of recombinant human A(H3N2) virus hemagglutinins. Virology 477, 18–31. URL: https://linkinghub.elsevier.com/retrieve/pii/S004268221400573X. doi: 10.1016/j.virol.2014.12.024.
- 37. Greaney A.J., Starr T.N., and Bloom J.D. (2022). An antibody-escape estimator for mutations to the SARS-CoV-2 receptor-binding domain. Virus Evolution 8, veac021. URL: https://doi.org/10.1093/ve/veac021. doi: 10.1093/ve/veac021.
- 38. Huot M., Rosenbaum P., Planchais C., Mouquet H., Monasson R., and Cocco S. (2025). Generative model of SARS-CoV-2 variants under functional and immune pressure unveils viral escape potential and antibody resilience. bioRxiv. URL: https://www.biorxiv.org/content/early/2025/05/13/2025.05.12.653592. doi: 10.1101/2025.05.12.653592.
- 39. Cocco S., Feinauer C., Figliuzzi M., Monasson R., and Weigt M. (2018). Inverse statistical physics of protein sequences: a key issues review. Reports on Progress in Physics 81, 032601. URL: https://iopscience.iop.org/article/10.1088/1361-6633/aa9965. doi: 10.1088/1361-6633/aa9965.
- 40. Hsu C., Verkuil R., Liu J., Lin Z., Hie B., Sercu T., Lerer A., and Rives A. (2022). Learning inverse folding from millions of predicted structures. In Chaudhuri K., Jegelka S., Song L., Szepesvari C., Niu G., and Sabato S., eds. Proceedings of the 39th International Conference on Machine Learning vol. 162 of Proceedings of Machine Learning Research. PMLR; pp. 8946–8970. URL: https://proceedings.mlr.press/v162/hsu22a.html.
- 41. Youssef N., Gurev S., Ghantous F., Brock K.P., Jaimes J.A., Thadani N.N., Dauphin A., Sherman A.C., Yurkovetskiy L., Soto D., Estanboulieh R., Kotzen B., Notin P., Kollasch A.W., Cohen A.A., Dross S.E., Erasmus J., Fuller D.H., Bjorkman P.J., Lemieux J.E., Luban J., Seaman M.S., and Marks D.S. (2025). Computationally designed proteins mimic antibody immune evasion in viral evolution. Immunity 58, 1411–1421.e6. URL: https://linkinghub.elsevier.com/retrieve/pii/S1074761325001785. doi: 10.1016/j.immuni.2025.04.015.
- 42. Baum A., Fulton B.O., Wloga E., Copin R., Pascal K.E., Russo V., Giordano S., Lanza K., Negron N., Ni M., Wei Y., Atwal G.S., Murphy A.J., Stahl N., Yancopoulos G.D., and Kyratsous C.A. (2020). Antibody cocktail to SARS-CoV-2 spike protein prevents rapid mutational escape seen with individual antibodies. Science 369, 1014–1018. URL: https://www.science.org/doi/abs/10.1126/science.abd0831. doi: 10.1126/science.abd0831.
- 43. Ku Z., Xie X., Davidson E., Ye X., Su H., Menachery V.D., Li Y., Yuan Z., Zhang X., Muruato A.E., I Escuer A.G., Tyrell B., Doolan K., Doranz B.J., Wrapp D., Bates P.F., McLellan J.S., Weiss S.R., Zhang N., Shi P.Y., and An Z. (2021). Molecular determinants and mechanism for antibody cocktail preventing SARS-CoV-2 escape. Nature Communications 12, 469. URL: https://www.nature.com/articles/s41467-020-20789-7. doi: 10.1038/s41467-020-20789-7.
- 44. Yu T.C., Thornton Z.T., Hannon W.W., DeWitt W.S., Radford C.E., Matsen F.A., and Bloom J.D. (2022). A biophysical model of viral escape from polyclonal antibodies. Virus Evolution 8, veac110. URL: https://academic.oup.com/ve/article/doi/10.1093/ve/veac110/6889254. doi: 10.1093/ve/veac110.
- 45. Andreano E., Paciello I., Pierleoni G., Piccini G., Abbiento V., Antonelli G., Pileri P., Manganaro N., Pantano E., Maccari G., Marchese S., Donnici L., Benincasa L., Giglioli G., Leonardi M., De Santi C., Fabbiani M., Rancan I., Tumbarello M., Montagnani F., Sala C., Medini D., De Francesco R., Montomoli E., and Rappuoli R. (2023). B cell analyses after SARS-CoV-2 mRNA third vaccination reveals a hybrid immunity like antibody response. Nature Communications 14, 53. URL: https://www.nature.com/articles/s41467-022-35781-6. doi: 10.1038/s41467-022-35781-6.
- 46. Shekhar K., Ruberman C.F., Ferguson A.L., Barton J.P., Kardar M., and Chakraborty A.K. (2013). Spin models inferred from patient-derived viral sequence data faithfully describe HIV fitness landscapes. Physical Review E 88, 062705. URL: https://link.aps.org/doi/10.1103/PhysRevE.88.062705. doi: 10.1103/PhysRevE.88.062705.
- 47. Doelger J., Kardar M., and Chakraborty A.K. (2022). Inferring the intrinsic mutational fitness landscape of influenzalike evolving antigens from temporally ordered sequence data. Physical Review E 105, 024401. URL: https://link.aps.org/doi/10.1103/PhysRevE.105.024401. doi: 10.1103/PhysRevE.105.024401.
- 48. Shakhnovich E.I., and Gutin A.M. (1993). Engineering of stable and fast-folding sequences of model proteins. Proceedings of the National Academy of Sciences 90, 7195–7199. URL: https://pnas.org/doi/full/10.1073/pnas.90.15.7195. doi: 10.1073/pnas.90.15.7195.
- 49. Miyazawa S., and Jernigan R.L. (1996). Residue – Residue Potentials with a Favorable Contact Pair Term and an Unfavorable High Packing Density Term, for Simulation and Threading. Journal of Molecular Biology 256, 623–644. URL: https://linkinghub.elsevier.com/retrieve/pii/S002228369690114X. doi: 10.1006/jmbi.1996.0114.
- 50. Elbe S., and Buckland-Merrett G. (2017). Data, disease and diplomacy: GISAID’s innovative contribution to global health: Data, Disease and Diplomacy. Global Challenges 1, 33–46. URL: https://onlinelibrary.wiley.com/doi/10.1002/gch2.1018. doi: 10.1002/gch2.1018.
- 51. Katoh K., and Standley D.M. (2013). MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Molecular Biology and Evolution 30, 772–780. URL: https://academic.oup.com/mbe/article-lookup/doi/10.1093/molbev/mst010. doi: 10.1093/molbev/mst010.