eLife. 2025 Sep 24;14:RP106365. doi: 10.7554/eLife.106365

Forecasting protein evolution by integrating birth-death population models with structurally constrained substitution models

David Ferreiro 1,2, Luis Daniel González-Vázquez 1,2, Ana Prado-Comesaña 1, Miguel Arenas 1,2
Editors: Anne-Florence Bitbol3, Aleksandra M Walczak4
PMCID: PMC12459951  PMID: 40991332

Abstract

Evolutionary studies in population genetics and ecology have mainly focused on predicting and understanding past evolutionary events. Recently, however, a growing trend explores the prediction of future evolutionary trajectories, motivated by its wide variety of applications. In this context, we introduce a method for forecasting protein evolution that integrates birth-death population models with substitution models that consider selection on protein folding stability. Traditional population genetics methods usually make the unrealistic assumption that molecular evolution can be simulated separately from the evolutionary history. In contrast, the present method combines both processes to simultaneously model forward-in-time birth-death evolutionary trajectories and protein evolution under structurally constrained substitution models, which have outperformed traditional empirical substitution models. We implemented the method in a freely available computer framework. We evaluated the accuracy of the predictions with several monitored viral proteins of broad interest. Overall, the method showed acceptable errors in predicting the folding stability of the forecasted protein variants. As expected, the errors were larger when predicting the corresponding sequences. We conclude that forecasting protein evolution is feasible in certain evolutionary scenarios, and we provide suggestions to enhance its accuracy by improving the underlying models of evolution.

Research organism: Viruses

Introduction

Molecular evolution is traditionally investigated through inferences about past evolutionary events, such as phylogenetic tree and ancestral sequence reconstructions. Predictions about the future were long considered inaccessible because they can be affected by complex processes such as environmental change. Nevertheless, a variety of biological systems display a Darwinian evolutionary process in which selection operates toward a limited set of adapted variants. These variants, and by extension the evolutionary trajectories that reach them, would be positively selected and could present a certain degree of predictability (Lässig et al., 2017; Wortel et al., 2023). Two factors motivated a variety of investigations on forecasting evolution: progress in developing more accurate models of evolution (Arenas, 2015b) and the benefits of predicting the outcome of evolution (e.g. to understand the course of evolution or to prepare for the future; Wortel et al., 2023). These investigations span diverse fields including medicine, agriculture, biotechnology, and conservation biology, among others (e.g. Barton et al., 2016; Bull and Molineux, 2008; de Visser et al., 2018; Diaz-Uriarte and Vasallo, 2019; Fischer et al., 2015; Gerrish and Sniegowski, 2012; Lässig and Łuksza, 2014; Lind et al., 2019; Luksza and Lässig, 2014; Morris et al., 2018; Munck et al., 2014; Neher et al., 2014; Wortel et al., 2023). Unfortunately, forecasting evolution is not always achievable. Under neutral evolution, all molecular variants are equally likely to be present in the population. This results in a lack of repeatability and precludes accurate prediction of future variants. Thus, forecasting evolution requires a system with measurable selection pressures, in which certain positively selected variants can produce more descendants than other variants and expand in the population (Desai and Fisher, 2007; Goyal et al., 2012; Neher and Hallatschek, 2013; Neher et al., 2014).
Indeed, a rougher fitness landscape resulting from selection can lead to greater accuracy in evolutionary predictions (Papkou et al., 2023; Rubin et al., 2023; Van Cleve and Weissman, 2015). Overall, an evolutionary process can be predictable to some extent, depending on the strength of selection driving evolution and the heterogeneity in fitness among variants. Prediction errors are inevitable with any method and in any evolutionary scenario.

Here, we focus on forecasting protein evolution because it involves molecular evolutionary processes driven by selection pressures, in which the fitness of each variant can be parameterized and predicted (Carneiro and Hartl, 2010; Gilson et al., 2017). Traditionally, evolutionary histories and ancestral sequences of proteins are inferred using probabilistic methods based on advanced substitution models of protein evolution (e.g. Arenas, 2015b; Arenas and Bastolla, 2020; Ferreiro et al., 2022; Malcolm et al., 1990; Moreira et al., 2023; Stackhouse et al., 1990; Thornton et al., 2003; Ugalde et al., 2004). The accuracy of these inferences depends on the accuracy of the applied substitution model. Substitution models that better fit the study data usually produce more accurate phylogenetic trees and ancestral sequences (Arenas and Bastolla, 2020; Del Amparo and Arenas, 2022a; Lemmon and Moriarty, 2004). These findings suggest that accurate substitution models are also desirable for forecasting protein evolution. In this regard, a variety of studies showed that structurally constrained substitution (SCS) models of protein evolution provide more accurate evolutionary inferences than the traditional empirical substitution models. This holds for phylogenetic likelihood, the distribution of amino acid frequencies among protein sites, rates of molecular evolution, and the folding stability of reconstructed proteins, among other aspects (e.g. Arenas and Bastolla, 2020; Arenas et al., 2016a; Bastolla et al., 2006; Bordner and Mittelmann, 2014; Del Amparo et al., 2023; Echave and Wilke, 2017; Ferreiro et al., 2024a; Ferreiro et al., 2024b; Fornasari et al., 2002; Parisi and Echave, 2001; Pascual-García et al., 2019; Rodrigue et al., 2005). However, SCS models usually demand more computational resources than substitution models that only include information from the protein sequence.
Notice that the protein structure provides information about the location and molecular interactions of amino acids at different protein sites. Such sites can be far from each other in the sequence but close in the three-dimensional structure, where they interact and affect each other's evolution (Morcos et al., 2011; Ruiz-González and Fares, 2013). Indeed, selection on protein folding stability is relevant in the evolution of multiple proteins, including those in microbial and viral systems (e.g. Ferreiro et al., 2022; Gong et al., 2013; Jacquier et al., 2013; Rodrigues et al., 2016; Wylie and Shakhnovich, 2011; Zeldovich et al., 2007). Therefore, we believe that it should be taken into account when forecasting protein evolution in such systems.

Predictions about future evolutionary events can be performed with simulation-based methods (e.g. Eccleston et al., 2023; Neher et al., 2014; Yoshida et al., 2023). To simulate molecular evolution, traditional population genetics methods apply two separate steps (Arenas, 2012; Hoban et al., 2012). First, the evolutionary history (i.e. a phylogenetic tree) is simulated using approaches such as the coalescent and birth-death population processes (Gernhard, 2008; Hudson, 1990; Kingman, 1982; Stadler, 2010). Afterward, molecular evolution is simulated forward in time, from the root node to the tip nodes, upon the previously simulated evolutionary history (Yang, 2006). This methodology was implemented in a variety of population genetics frameworks that simulate molecular evolution (Arenas, 2012; Hoban et al., 2012). However, for technical and computational simplicity, it assumes that the simulation of the evolutionary history is independent of the simulation of molecular evolution, which can produce biological inconsistencies. For example, the evolutionary history is usually simulated under neutral evolution, while molecular evolution is usually simulated with substitution models that consider selection. To enhance the realism of this modeling, here we merged both processes into a single one in which the evolutionary history influences molecular evolution and vice versa.
In particular, we adopted a birth-death population genetics method, already used for forecasting evolution (Lässig and Łuksza, 2014; Neher et al., 2014), to simulate the forward-in-time evolutionary history. At each node, the fitness of the molecular variant determines its subsequent birth or death event. This fitness is derived from evolutionary constraints on protein folding stability (Bastolla and Demetrius, 2005; Gong et al., 2013; Liberles et al., 2012; Zeldovich et al., 2007). We integrated this process with SCS models to model protein evolution along the derived phylogenetic branches. The method is detailed below. We implemented it in a new version of our computer framework ProteinEvolver (Arenas et al., 2013), which is freely available from https://github.com/MiguelArenas/proteinevolver (Arenas, 2025). ProteinEvolver2 includes detailed documentation and a variety of ready-to-use examples. Forecasting evolution has potential applications in designing vaccines and therapies against pathogens. With this in mind, we evaluated and applied the method to forecast protein evolution in several real datasets of viral proteins monitored over time.

Methods

A method for forecasting protein evolution by combining birth-death population genetics with structurally constrained substitution models of protein evolution

Following previous methods for forecasting evolution based on simulations (Lässig and Łuksza, 2014; Neher et al., 2014), we developed a method to simulate the forward-in-time evolutionary history of a protein sample. The method uses a birth-death process that considers the fitness of the protein variant (based on folding stability) at every temporal node. The birth and death rates of a protein variant are derived from its fitness. A high fitness results in a high birth rate and a low death rate, which can lead to a large number of descendants. A low fitness leads to few or no descendants (extinction). Thus, the fitness of the molecular variant at every node drives its corresponding forward-in-time birth-death evolutionary history. The details of this simulation process are outlined below.

First, similarly to common simulators of molecular evolution (Arenas, 2012; Hoban et al., 2012; Yang, 2006), a given protein sequence and structure (hereafter, protein variant) is assigned to the root node. The fitness (f) of the protein variant (A) is calculated from its folding stability (free energy, ΔG) following the Boltzmann distribution (Goldstein, 2013) (Equation 1, which takes values from 0 to 1),

$$ f(A) = \frac{1}{1 + e^{\Delta G / kT}} \qquad (1) $$
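Equation 1 is a logistic function of ΔG/kT, where k is the Boltzmann constant and T is the absolute temperature. The Python sketch below evaluates it. The function name `fitness_from_stability` is a hypothetical illustration. The default kT of 0.592 kcal/mol (about 298 K) is an assumed value, not a setting taken from ProteinEvolver2. The two branches compute the same quantity; they only avoid overflow in `math.exp` when |ΔG/kT| is large.

```python
import math


def fitness_from_stability(delta_g: float, kT: float = 0.592) -> float:
    """Fitness in [0, 1] from folding free energy, as in Equation 1.

    delta_g: folding free energy; negative values mean a stable fold.
    kT: thermal energy in the same units as delta_g (0.592 kcal/mol at
        about 298 K is an assumed default, not a ProteinEvolver2 setting).
    """
    x = delta_g / kT
    # 1 / (1 + e^x), written in two equivalent forms so exp() never overflows
    if x >= 0:
        z = math.exp(-x)
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(x))


if __name__ == "__main__":
    for dg in (-5.0, 0.0, 5.0):
        print(dg, round(fitness_from_stability(dg), 4))
```

A stable fold (ΔG = -5) gets a fitness close to 1. ΔG = 0 gives exactly 0.5, and destabilized variants approach 0.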

Protein folding stability constrains protein evolution and is commonly used to obtain protein fitness (Bastolla et al., 2007; Goldstein, 2013; Liberles et al., 2012; Lobkovsky et al., 2010; Mendez et al., 2010; Sella and Hirsh, 2005; Zeldovich et al., 2007). The user can choose how the fitness of the modeled protein variant is determined. It can be based solely on its folding stability, or on its similarity to the stability of a real protein variant (i.e. a protein structure from the Protein Data Bank, PDB). We believe the latter can be more realistic. In nature, high folding stability does not necessarily indicate high fitness. However, a stability that closely resembles that of a real protein may indicate high fitness, because the real stability results from a selection process (which also incorporates negative design). If the fitness is derived only from the folding stability of the protein variant, the birth rate (b) is set equal to the fitness. If instead the fitness is based on the similarity in folding stability between the modeled and real variants, the birth rate is set to 1 minus the root mean square deviation (RMSD) in folding stability. The RMSD minimizes the influence of small deviations while amplifying larger differences, which enhances the detection of remarkable molecular changes. Notice that the smaller this difference, the higher the birth rate. In both cases, the death rate (d) is set to 1 - b, so that the global (birth-death) rate is constant. In this model, fitness influences reproductive success. Protein variants with higher fitness have higher birth rates and therefore more birth events. Those with lower fitness have higher death rates and therefore more extinction events.
This parameterization is meaningful in the context of protein evolution because the fitness of a protein variant can affect its survival (birth or extinction) without necessarily altering its rate of evolution. Although a higher growth rate can sometimes correlate with higher fitness, a variant with high fitness does not necessarily accumulate substitutions more rapidly.
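The two options for deriving the birth rate can be sketched as follows. This is a minimal illustration, not ProteinEvolver2's code. The text above does not specify how the stability values entering the RMSD are scaled. The sketch therefore assumes they are pre-scaled so that the RMSD falls in [0, 1], and it clips b to that range to keep both rates valid. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def rmsd(a: Sequence[float], b: Sequence[float]) -> float:
    """Root mean square deviation between two equal-length value lists."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def rates_from_fitness(fitness: float) -> Tuple[float, float]:
    """Option 1: b equals the stability-based fitness; d = 1 - b."""
    b = fitness
    return b, 1.0 - b


def rates_from_similarity(model_stab: Sequence[float],
                          real_stab: Sequence[float]) -> Tuple[float, float]:
    """Option 2: b = 1 - RMSD(modeled, real stability); d = 1 - b.

    Assumption: stabilities are pre-scaled so the RMSD lies in [0, 1];
    the clip below only guards against values outside that range.
    """
    b = min(1.0, max(0.0, 1.0 - rmsd(model_stab, real_stab)))
    return b, 1.0 - b


if __name__ == "__main__":
    print(rates_from_fitness(0.8))               # high fitness: b > d
    print(rates_from_similarity([0.4], [0.1]))   # RMSD 0.3 -> b close to 0.7
```

In both options b + d = 1. Fitness therefore changes only the odds of a birth versus a death, not the overall event rate.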

Additionally, we incorporated another birth-death model that follows the proposal by Neher et al., 2014, in which the death rate is fixed at 1 and the birth rate is modeled as 1+fitness. In this model, fitness not only affects reproductive success but also influences the global birth-death rate, which can vary among lineages.
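The sketch below shows how the two parameterizations differ for a given fitness f. It computes the per-node total event rate b + d and the birth probability b/(b + d) under two models. The constant-total model uses b = f and d = 1 - f. The Neher et al., 2014 model uses b = 1 + f and d = 1. This is an illustrative calculation with hypothetical function names, not code from either implementation.

```python
from typing import Tuple


def constant_total(f: float) -> Tuple[float, float]:
    """b = f and d = 1 - f, so b + d = 1 for every lineage."""
    return f, 1.0 - f


def neher_style(f: float) -> Tuple[float, float]:
    """d fixed at 1 and b = 1 + f, so b + d = 2 + f varies with fitness."""
    return 1.0 + f, 1.0


def summarize(rates: Tuple[float, float]) -> Tuple[float, float]:
    """Return (total event rate b + d, birth probability b / (b + d))."""
    b, d = rates
    return b + d, b / (b + d)


if __name__ == "__main__":
    for f in (0.1, 0.5, 0.9):
        print(f, summarize(constant_total(f)), summarize(neher_style(f)))
```

In the constant-total model, fitness shifts only the birth probability (from 0.1 to 0.9 over this range), while the event rate stays at 1. In the Neher-style model, fitter lineages also experience events faster (rate 2.1 to 2.9). Their birth probability, however, spans a narrower range (about 0.52 to 0.66).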

The birth-death process is simulated forward in time deciding whether every next event is a birth or a death event (Harmon, 2019; Stadler, 2010; Stadler, 2011) according to the fitness of the corresponding molecular variant, and it ends when a user-specified criterion, such as a particular sample size (considering or ignoring extinction nodes) or a certain evolutionary time (te), is reached. Starting from an ‘active’ node (i.e. the root node) at current time tc, the time to the next event (birth to produce two descendants or death to produce the extinction of the node) can be calculated (details below). In contrast to standard birth-death processes where birth and death rates are constant over time and among lineages, the present method considers heterogeneity where each protein variant at a node has specific birth and death rates according to its corresponding fitness. The method is described below and summarized in Figure 1.

Figure 1. Illustrative example of forward in time simulation of protein evolution integrating a birth-death population evolutionary process with fitness from the protein folding stability and the modeling of protein evolution with a structurally constrained substitution model.


Given a protein variant assigned to a node at time t (blue node), its fitness is calculated from its protein folding stability. The fitness then determines the birth and death rates for that variant. These rates provide the time to the next birth or death event (horizontal dashed line), which corresponds to the forward-in-time branch length. Next, the variant is evolved forward in time toward each descendant along that branch length, under an SCS model of protein evolution. The process is repeated forward in time, starting at each new variant. If a death event occurs, the variant of the extinct node (pink node) is obtained, but it does not have descendants. The process finishes when a particular sample size or simulation time is reached (e.g. t+n).

  • (1) The process starts at the root node, to which a user-specified protein sequence and its protein structure (e.g. obtained from the PDB) are assigned. For every protein variant assigned to an active node, the birth and death rates are calculated as described above.

  • (2) Calculation of the time to the next birth or death event. Following common birth-death methods, the time to the next event tn is calculated through an exponential distribution with rate based on the number of active nodes (s) and the sum of the birth and death rates (Harmon, 2019; Equation 2),

$$ t_n \sim \mathrm{Exp}\big(s\,(b+d)\big) \qquad (2) $$
  • One of the birth-death models that we implemented assumes b + d = 1 at each node, following Harmon, 2019. This allows reproductive success to vary among nodes while the distribution of tn remains the same for all of them. In contrast, the other birth-death model we implemented allows b + d to vary among nodes (Neher et al., 2014). Both reproductive success and tn can then vary among nodes.

  • (3) Evaluate whether the simulation ends before the next event occurs (at time tc + tn).

  • (4) If it does (i.e. tc + tn is greater than te), the simulation of the evolutionary history finishes.

  • (5) If it does not conclude, a random active node is selected, and its protein variant is analyzed to determine its type of next event (birth or death). The probability of a birth event is Pb = b/(b+d) and the probability of a death event is Pd = d/(b+d). A random sample from those probabilities is taken to determine the type of evolutionary event.

  • (6) If a birth event is selected, two descendant active nodes of the study node are added with branch lengths tn, and the study node becomes inactive. Next, molecular evolution is simulated from the original node to each descendant node under the specified SCS model of protein evolution (Arenas et al., 2013). SCS models evolve protein variants along the given branch lengths following standard approaches of molecular evolution in population genetics. In these approaches, the branch length informs the number of substitution events and the substitution model informs their type (Arenas, 2012; Carvajal-Rodríguez, 2010; Hoban et al., 2012; Yang, 2006). The process thus yields a protein variant for every descendant node. Finally, the folding stability and resulting fitness of these descendant protein variants are calculated.

  • (7) If a death event is selected, the study node becomes inactive (extinct).

  • (8) Return to step 1 while the user-specified criterion for ending the birth-death process is not satisfied and at least one active node exists in the evolutionary history. Otherwise, the simulation ends.

This process simultaneously simulates, forward in time, the evolutionary history and protein evolution. Protein evolution influences the evolutionary history through selection on folding stability. Indeed, selection can vary among the protein variants at the corresponding nodes of the evolutionary history. The process produces a forward-in-time birth-death phylogenetic history, along with the protein variant associated with each node. This history encompasses internal nodes, nodes that reached the ending time, and nodes that went extinct at some earlier time.

  • The method includes several optional capabilities, listed in Supplementary file 1A and in the software documentation; some are summarized below.

  • The user can fix the birth and death rates along the evolutionary history or specify that they are based on the fitness of the corresponding protein variant as described before. For the latter, the fitness can be based on the folding stability of the analyzed protein variant or on the similarity in folding stability between the analyzed and real protein variants. In addition, the global birth-death rate can be constant or vary among lineages depending on the specified birth-death model.

  • The birth-death simulation can finish when any of the following criteria is met: (a) A specified sample size, including extinction nodes derived from death events. (b) A specified sample size, excluding extinction nodes. (c) A specified evolutionary time, measured from the root to a tip node.

  • Extinction nodes can either be preserved or removed from the evolutionary history.

  • Evaluation of the simulated birth-death evolutionary history in terms of tree balance using the Colless index (Colless and Wiley, 1982; Lemant et al., 2022).

  • Possible variation of the site-specific substitution rate according to user specifications.

  • Customizable substitution model featuring site-specific exchangeability matrices (relative rates of change among amino acids and their frequencies at the equilibrium).

  • Implementation of several SCS models and a variety of empirical substitution models of protein evolution. The implemented SCS models were evaluated in our previous work (Arenas et al., 2013), and the implemented empirical substitution models were properly identified using simulated data with ProtTest3 (Darriba et al., 2011).
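The Colless index listed above measures tree balance. It sums, over all internal nodes of a rooted binary tree, the absolute difference between the numbers of tips descending from the left and right child. A value of 0 indicates a perfectly balanced tree. The Python sketch below computes this unnormalized form on a tree written as nested 2-tuples; that representation is assumed only for illustration. The sketch also gives a common normalization by (n - 1)(n - 2)/2, the value for a fully pectinate (caterpillar) tree with n tips. Whether ProteinEvolver2 reports the raw or a normalized value is not stated here. Function names are hypothetical.

```python
from typing import Tuple, Union

# A tree is either a tip label (str) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def colless(tree: Tree) -> Tuple[int, int]:
    """Return (number of tips, Colless index) for a rooted binary tree."""
    if not isinstance(tree, tuple):
        return 1, 0
    n_left, c_left = colless(tree[0])
    n_right, c_right = colless(tree[1])
    return n_left + n_right, c_left + c_right + abs(n_left - n_right)


def normalized_colless(tree: Tree) -> float:
    """Colless index divided by its maximum, (n - 1)(n - 2) / 2, for n > 2 tips."""
    n, c = colless(tree)
    if n <= 2:
        return 0.0
    return c / ((n - 1) * (n - 2) / 2)


if __name__ == "__main__":
    balanced = (("A", "B"), ("C", "D"))
    caterpillar = ((("A", "B"), "C"), "D")
    print(colless(balanced), colless(caterpillar))        # (4, 0) (4, 3)
    print(normalized_colless(caterpillar))                # 1.0
```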

The implemented SCS models consider molecular energy functions based on amino acid contact matrices and configurational entropies per residue in unfolded and misfolded proteins (Arenas et al., 2013; Bastolla et al., 2007; Mendez et al., 2010). These models incorporate both positive and negative design strategies. In particular, they evaluate the target protein structure while taking into account a database of residue contacts from alternative protein structures in the PDB. This background genetic information helps reduce prediction biases (Minning et al., 2013). Technical details about these SCS models are presented in our previous study (Arenas et al., 2013). Moreover, SCS models have outperformed models that ignore structural evolutionary constraints in terms of phylogenetic likelihood, among other properties (Arenas et al., 2013; Arenas et al., 2016a; Bordner and Mittelmann, 2014). The method also implements common empirical substitution models of protein evolution (i.e. Blosum62, CpRev, Dayhoff, DayhoffDCMUT, FLU, HIVb, HIVw, JTT, JonesDCMUT, LG, Mtart, Mtmam, Mtrev24, RtRev, VT, and WAG; Supplementary file 1A). The user can also specify any particular exchangeability matrix for all sites or for each site, allowing the substitution process to vary among sites (details in Supplementary file 1A and in the software documentation). In addition, the framework implements heterogeneous substitution rates among sites through the traditional Gamma distribution (+G; Yang, 1996) and a proportion of invariable sites (+I; Fitch and Margoliash, 1967). The user can also directly alter the substitution rate at each site for any empirical or SCS model (Table S1). For the evolutionary history, the user can use the birth-death process presented above, specify a particular phylogenetic tree, or simulate a coalescent evolutionary history (Hudson, 1983; Kingman, 1982; Supplementary file 1A).
In this regard, we maintained the capabilities of the previous version. These include the following (details in Supplementary file 1A and in the software documentation):

  • The coalescent with recombination (Hudson, 1983), which can be homogeneous or heterogeneous along the sequence according to Wiuf and Posada, 2003.

  • Variable population size over time (growth rate or demographic periods).

  • Several migration models, namely island (Hudson, 1998), stepping-stone (Kimura and Weiss, 1964), and continent-island (Wright, 1931), with temporal variation of migration rates and convergence of demes or subpopulations.

  • Simulation of haploid or diploid data.

  • Longitudinal sampling (Navascués et al., 2010).

The framework outputs a simulated multiple sequence alignment with the protein sequences of the internal and tip nodes. It also outputs their folding stabilities and the evolutionary history, among other information (Supplementary file 1A and software documentation).
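Among-site rate heterogeneity of the +G+I kind can be sketched by drawing a relative rate for each site. With probability pinv the site is invariable (rate 0). Otherwise, its rate comes from a Gamma distribution with shape α and mean 1. This sketch uses the continuous Gamma and rescales variable sites by 1/(1 - pinv), so that the mean rate over all sites stays at 1. Many simulators instead use the discrete-Gamma approximation, and the exact scheme used in ProteinEvolver2 is not given here. The function name and arguments are hypothetical.

```python
import random
from typing import List, Optional


def site_rates(n_sites: int, alpha: float, p_inv: float = 0.0,
               seed: Optional[int] = None) -> List[float]:
    """Relative substitution rate per site under a +G+I scheme."""
    if alpha <= 0 or not 0.0 <= p_inv < 1.0:
        raise ValueError("need alpha > 0 and 0 <= p_inv < 1")
    rng = random.Random(seed)
    rates = []
    for _ in range(n_sites):
        if rng.random() < p_inv:
            rates.append(0.0)                     # +I: invariable site
        else:
            # +G: shape alpha and scale 1/alpha give mean 1; dividing by
            # (1 - p_inv) keeps the mean over all sites at 1
            rates.append(rng.gammavariate(alpha, 1.0 / alpha) / (1.0 - p_inv))
    return rates


if __name__ == "__main__":
    r = site_rates(10, alpha=0.5, p_inv=0.2, seed=42)
    print([round(x, 2) for x in r])
```

A smaller α concentrates change in fewer sites. As α grows, the rates at variable sites approach a common value.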

Study data for evaluating the method for forecasting protein evolution

We evaluated the accuracy of the method for forecasting protein evolution using viral proteins sampled over time (longitudinal sampling). Specifically, we used protein sequences from previous experiments of virus evolution monitored over time, which contain consensus molecular data (avoiding rare variants) that belong to different evolutionary time points.

  • The matrix (MA) protein of HIV-1, with data obtained from an in vitro cell culture experiment in which samples were collected at different times (Arenas et al., 2016b). These data included 21 MA protein consensus sequences collected at two population passages (T). At T1 (initial time) there was one sequence, and at T31 (after 31 passages) there were 20 sequences. These sequences accumulated 48 amino acid substitutions (sequence identity 0.973; Figure 2).

  • The main protease (Mpro) and papain-like protease (PLpro) proteins of SARS-CoV-2. For each protein, the data includes the first sequenced variant (Wuhan) and a sequence built with all the substitutions observed in a dataset of 384 genomes of the Omicron variant of concern collected from the GISAID database. The resulting Mpro and PLpro sequences presented 10 and 22 substitutions (sequence identity 0.967 and 0.930), respectively.

  • The non-structural protein 1 (NS1) of the influenza virus. We retrieved the NS1 sequences from the years 2005, 2015 and 2020 from the Influenza Virus Resource (Bao et al., 2008). Next, we obtained the consensus sequence of the 2005 dataset as initial time point (T1), and the consensus sequences from the 2015 (T2) and 2020 (T3) datasets as subsequent time points. The resulting consensus sequences for 2015 and 2020 showed 40 and 37 substitutions (sequence identity 0.802 and 0.817), respectively.

  • The protease (PR) of HIV-1, with data sampled from patients monitored over time (2008 to 2017), available from the Specialized Assistance Services in Sexually Transmissible Diseases and HIV/AIDS in Brazil (Ferreiro et al., 2022; Santos-Pereira et al., 2021; Souto et al., 2021). These evolutionary scenarios are complex because of the diverse antiretroviral therapies administered to the patients (Supplementary file 1B). These therapies could vary during the studied periods and could promote the fixation of specific mutations (e.g. those associated with resistance; Ferreiro et al., 2022; Santos-Pereira et al., 2021; Souto et al., 2021). For each viral population (patient), the data included a consensus sequence for each of the four or five samples collected at different time points. These consensus sequences exhibited between one and 22 amino acid substitutions with respect to the consensus sequence of the corresponding first sample (Supplementary file 1B).

Figure 2. Distribution of amino acid substitutions observed along the HIV-1 MA sequences at time T31.


Left: Distribution of the observed amino acid substitutions along the HIV-1 matrix (MA) protein sequences at time T31. Right: Distribution of the indicated amino acid substitutions (shown in blue) along the protein structure.

We identified a representative protein structure for each dataset, which was used to predict the folding stability and to inform the SCS model. In particular, we obtained the protein structures of the sequences at the initial time (T1) from the PDB (Supplementary file 1C). For the case of the HIV-1 protease, we obtained the protein structure through homology modeling. We used SWISS-MODEL (Arnold et al., 2006) to identify the best-fitting templates of PDB structures (Supplementary file 1C). Next, we predicted the protein structures by homology modeling with Modeller (Sali and Blundell, 1993) using the protein sequences and corresponding best-fitting structural templates.

Forward in time prediction of viral protein variants

The evaluation was performed with the previously presented real data, which includes a real protein variant present at the initial time (T1) and subsequent protein variants present at a later time (Tn). We applied the method for forecasting protein evolution to predict the most likely protein variants at Tn derived from the real protein variant observed at the initial time (T1). The prediction error was determined by measuring the distance between the real protein variants and the predicted variants corresponding to time Tn.

Specifically, we assigned the corresponding real protein variant (including its sequence and structure) to the initial node (T1). Its evolution was then simulated forward in time until it reached the number of substitutions observed in the real data at time Tn. We used the number of observed substitutions as a measure of evolutionary time, allowing proper comparisons between real and predicted variants. We simulated 100 alignments of protein sequences, each containing the same number of sequences as the real data at Tn. Each alignment therefore contained one of the following:

  • 20 HIV-1 MA protein sequences.

  • A consensus sequence for the SARS-CoV-2 Mpro.

  • A consensus sequence for the SARS-CoV-2 PLpro.

  • A consensus sequence of the influenza NS1 protein for each time point.

  • A consensus sequence of HIV-1 PR for each viral population.

To avoid rare variants, each sequence of the simulated multiple sequence alignments was obtained as the consensus of 100 linked simulated sequences.
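Two small operations in this protocol are easy to make concrete. The first counts the amino acid differences between an evolved sequence and the initial (T1) sequence, which serves as the evolutionary clock here. The second builds a per-site majority consensus from the 100 linked simulated sequences. The sketch below assumes aligned, equal-length, gap-free sequences. It breaks consensus ties in favor of the residue seen first, which is an assumption because the tie rule is not stated above. Function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def n_substitutions(seq_a: str, seq_b: str) -> int:
    """Number of sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned (equal length)")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def consensus(seqs: Sequence[str]) -> str:
    """Majority amino acid at each site; ties go to the residue seen first."""
    if not seqs:
        raise ValueError("need at least one sequence")
    if any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("sequences must be aligned (equal length)")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


if __name__ == "__main__":
    sims = ["ACDA", "ACEA", "GCDA"]
    cons = consensus(sims)
    print(cons, n_substitutions("ACDA", "GCEA"))   # ACDA 2
```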

To investigate the effect of selection on the predictions, we compared the accuracy of forecasting protein evolution with and without selection from the protein structure (the latter being neutral evolution). When selection is considered, as presented above, the probabilities of birth and death events were based on the fitness of the protein variants. Protein evolution was then modeled using an SCS model (Arenas et al., 2013). Under neutral evolution, all protein variants are equally fit and allowed. Since variants are observed, we allowed birth events. However, we assumed the absence of death events, because no fitness-independent information is available to support their inclusion. This avoids imposing arbitrary death events based on an arbitrary death rate. To model neutral evolution, we also used an exchangeability matrix with the same relative rate of change for all amino acid pairs.

Accuracy of the predicted protein variants

We assessed the accuracy of the method for forecasting protein evolution by comparing the predicted and real protein variants present at time Tn, both at the protein sequence and structure levels.

For data containing multiple sequences at time Tn (i.e. the HIV-1 MA dataset), we calculated the Kullback-Leibler (KL) divergence (Kullback and Leibler, 1951). The KL divergence provides a distance between two multiple sequence alignments (the real and predicted data) based on the distribution of amino acid frequencies along the sequences (Equation 3). In Equation 3, P and Q denote the distributions of amino acid frequencies in the real and predicted protein sequences at time Tn, respectively, and i refers to the protein site. This distance was calculated only for data with a set of sequences at time Tn (HIV-1 MA), because a single sequence does not provide site-specific variability.

$$ \mathrm{KL}(P \parallel Q) = \sum_i P_i \times \log\left(\frac{P_i}{Q_i}\right) \qquad (3) $$
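Equation 3 can be computed from two alignments by building per-site amino acid frequency profiles. The sketch below sums P log(P/Q) over the 20 amino acids at every site and then over sites, using the natural logarithm. It assumes gap-free alignments of the 20 standard amino acids. It also adds a small pseudocount to the predicted profile Q, so that residues absent from the prediction do not produce log(x/0). The log base, the pseudocount, and any per-site averaging behind the percentages reported in Results are assumptions here, not details taken from the method. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_profiles(alignment: Sequence[str],
                  pseudocount: float = 0.0) -> List[Dict[str, float]]:
    """Amino acid frequencies at each column of a gap-free alignment."""
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    profiles = []
    for column in zip(*alignment):
        counts = Counter(column)
        profiles.append({a: (counts[a] + pseudocount) / total
                         for a in AMINO_ACIDS})
    return profiles


def kl_divergence(real: Sequence[str], predicted: Sequence[str],
                  pseudocount: float = 0.01) -> float:
    """Sum over sites and amino acids of P * log(P / Q) (natural log)."""
    p_prof = site_profiles(real)
    q_prof = site_profiles(predicted, pseudocount)
    if len(p_prof) != len(q_prof):
        raise ValueError("alignments must have the same number of sites")
    kl = 0.0
    for p_site, q_site in zip(p_prof, q_prof):
        for a, p in p_site.items():
            if p > 0.0:                  # terms with P = 0 contribute nothing
                kl += p * math.log(p / q_site[a])
    return kl


if __name__ == "__main__":
    real = ["ACDE", "ACDE", "ACDK"]
    pred = ["ACDE", "ACEE", "ACDE"]
    print(round(kl_divergence(real, pred), 4))
    print(kl_divergence(real, real, pseudocount=0.0))   # identical: 0.0
```

Dividing the result by the number of sites gives a per-site average, if that normalization is preferred.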

We also compared the real and predicted evolutionary trajectories of protein variants using the Grantham distance, which measures the differences between amino acids based on their physicochemical properties (Grantham, 1974). For both real and predicted protein variants, we calculated the Grantham distance at each protein site that differs between the two datasets, considering its evolution from T1 to the subsequent multiple sequence alignment at Tn. Because we examined sites that varied over time, the site-specific Grantham distance Gi was calculated as a sum over amino acids (Equation 4). Each term is the frequency fm of amino acid m at site i at time Tn, multiplied by the Grantham distance between amino acid m and the amino acid n present at time T1. The sum is normalized by the largest Grantham distance Gmax to obtain values between 0 and 1. Next, to compare the real and predicted data, we calculated the site-specific difference in Grantham distance Gbi between the real (P) and predicted (Q) protein variants (Equation 5).

$$ G_i = \frac{\sum_{m=1}^{20} f_m \times G(m,n)}{G_{\max}} \qquad (4) $$

$$ Gb_i = \left| G_{P,i} - G_{Q,i} \right| \qquad (5) $$
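Equations 4 and 5 translate directly into code once a Grantham distance lookup is available. The sketch below takes the lookup as a function argument. The two-entry table in the demo is only a placeholder. It contains the well-known extremes of Grantham's (1974) matrix: Leu-Ile = 5 and Cys-Trp = 215, the latter serving as Gmax. A real analysis needs the full 20 × 20 table. Function and table names are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence

G_MAX = 215.0   # largest Grantham distance (Cys-Trp)


def site_grantham(column_tn: Sequence[str], aa_t1: str,
                  grantham: Callable[[str, str], float],
                  g_max: float = G_MAX) -> float:
    """Equation 4: sum over amino acids m of f_m * G(m, n) / Gmax, where
    f_m is the frequency of m in the Tn column and n is the T1 residue."""
    freqs = Counter(column_tn)
    n_seqs = len(column_tn)
    return sum((count / n_seqs) * grantham(m, aa_t1)
               for m, count in freqs.items()) / g_max


def grantham_difference(g_real: float, g_pred: float) -> float:
    """Equation 5: |G_P,i - G_Q,i| for one site."""
    return abs(g_real - g_pred)


# Placeholder lookup with only two known entries, for the demo below.
TOY_TABLE = {frozenset("LI"): 5.0, frozenset("CW"): 215.0}


def toy_grantham(a: str, b: str) -> float:
    return 0.0 if a == b else TOY_TABLE[frozenset((a, b))]


if __name__ == "__main__":
    g_real = site_grantham(["I", "I", "L", "L"], "L", toy_grantham)
    g_pred = site_grantham(["L", "L", "L", "L"], "L", toy_grantham)
    print(round(g_real, 4), round(grantham_difference(g_real, g_pred), 4))
```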

In addition, we obtained and compared the protein folding stability (ΔG) of the predicted and real protein variants observed at time Tn, using their corresponding protein structures, with DeltaGREM (Arenas et al., 2016a; Minning et al., 2013).

Results

Implementation of the forecasting protein evolution method

We extended our framework ProteinEvolver (Arenas et al., 2013) while maintaining its previous capabilities (e.g. simulation of protein evolution along user-specified phylogenetic trees or along phylogenetic trees simulated with the coalescent with or without recombination, migration, demographics, and longitudinal sampling, under empirical and SCS models, among others; Supplementary file 1A). Among other additions (Supplementary file 1A), the new version models protein evolution forward in time by combining SCS models of protein evolution with a birth-death process in which the birth and death rates at each node are determined by the fitness (folding stability) of the corresponding protein variant. The framework ProteinEvolver2 is written in C and is distributed with detailed documentation and a variety of illustrative practical examples. It is freely available from https://github.com/MiguelArenas/proteinevolver (Arenas, 2025).
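The exact mapping from folding stability to birth and death rates is not spelled out here. As a rough illustration only, the Python sketch below runs a Gillespie-style forward-in-time birth-death simulation in which each lineage's birth rate is scaled by a hypothetical fitness (the equilibrium fraction of folded protein computed from ΔG) while the death rate is constant. The names `fitness_from_stability` and `simulate_birth_death`, the `KT` constant, the rate values, and the `mutate` callback (a stand-in for the SCS substitution model) are all assumptions; this is not ProteinEvolver2's C implementation, and it does not represent the variable global birth-death rate (GlobalBDvar) setting.

```python
import math
import random

KT = 0.592  # kT in kcal/mol at about 298 K (assumed temperature)


def fitness_from_stability(delta_g):
    """Hypothetical fitness: equilibrium fraction of folded protein,
    1 / (1 + exp(dG / kT)); more negative dG means more stable and fitter."""
    return 1.0 / (1.0 + math.exp(delta_g / KT))


def simulate_birth_death(initial_dg, t_max, base_birth=1.0, death=0.5,
                         mutate=None, max_lineages=10_000, rng=None):
    """Gillespie-style forward-in-time birth-death process in which each
    lineage's birth rate is scaled by the fitness of its protein variant.

    `mutate(dg, rng)` returns the stability of a daughter variant; it is a
    placeholder for a substitution model. Returns the stabilities (dG) of
    the lineages alive at t_max.
    """
    rng = random.Random() if rng is None else rng
    mutate = (lambda dg, r: dg) if mutate is None else mutate
    lineages = [initial_dg]
    t = 0.0
    while lineages and len(lineages) < max_lineages:
        births = [base_birth * fitness_from_stability(dg) for dg in lineages]
        total_birth = sum(births)
        total_rate = total_birth + death * len(lineages)
        t += rng.expovariate(total_rate)  # waiting time to the next event
        if t > t_max:
            break
        if rng.random() < total_birth / total_rate:
            # Birth: the parent is chosen in proportion to its birth rate.
            parent = rng.choices(range(len(lineages)), weights=births)[0]
            lineages.append(mutate(lineages[parent], rng))
        else:
            # Death: with a constant death rate every lineage is equally likely.
            lineages.pop(rng.randrange(len(lineages)))
    return lineages


# Example: daughters inherit the parent's dG plus small Gaussian noise.
survivors = simulate_birth_death(
    -10.0, 5.0, mutate=lambda dg, r: dg + r.gauss(0.0, 0.5),
    rng=random.Random(42))
print(len(survivors), "lineages alive at t_max")
```

Coupling the birth rate to stability is what lets fitter variants leave more descendants, so the tree and the sequences are generated together rather than in two separate steps.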

Evaluation of predictions of HIV-1 MA evolution

For the HIV-1 MA protein, the Grantham distance and the KL divergence between the real variants at time Tn and the corresponding predicted variants were low (around 5% and 6%, respectively; Table 1), and they did not differ between predictions that consider selection on folding stability (including birth-death models with constant and variable global birth-death rates among lineages) and predictions that ignore it (Table 1). In contrast, the folding stability of the protein variants predicted under selection on folding stability (again, with either constant or variable global birth-death rates among lineages) was closer to that of the real protein variants than the folding stability of the variants predicted under neutral evolution (Table 1). In particular, the protein variants predicted without selection were less stable than both those predicted with selection and the real protein variants.

Table 1. Comparison of real and predicted sequences of the HIV-1 MA protein considering predictions based on the SCS and neutral models.

For the data simulated under the SCS models [including birth-death models with constant (SCS) and variable (GlobalBDvar) global birth-death rates among lineages] and the neutral model, the table shows the Grantham distance between the amino acids that changed during the real and predicted evolutionary trajectories and the Kullback-Leibler (KL) divergence between the real and predicted multiple sequence alignments. It also shows the folding stability (ΔG) of the real protein variants at times T1 and T31 and of the predicted protein variants at time T31. The error corresponds to the 95% confidence interval of the mean (100 samples) of the folding stability predictions.

| Model | Grantham distance | KL divergence | ΔG of the real variant at T1 (kcal/mol) | ΔG of the real variants at T31 (kcal/mol) | ΔG of the predicted variants at T31 (kcal/mol) | ΔΔG at T31, predicted – real variants (kcal/mol) |
|---|---|---|---|---|---|---|
| SCS model | 5% | 6% | –9.72 | –10.34 ± 0.14 | –9.96 ± 0.02 | 0.38 |
| SCS GlobalBDvar model | 5% | 6% | –9.72 | –10.34 ± 0.14 | –10.03 ± 0.03 | 0.31 |
| Neutral model | 5% | 6% | –9.72 | –10.34 ± 0.14 | –9.21 ± 0.07 | 1.14 |
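As an arithmetic check of Table 1, the Python sketch below recomputes ΔΔG as the mean predicted ΔG minus the mean real ΔG at T31, and shows one common way to obtain a 95% confidence interval of a mean from replicate predictions. The normal approximation (1.96 × standard error) and the function names are assumptions; the interval formula actually used is not stated here.

```python
import math
import statistics


def mean_with_ci95(values):
    """Mean and half-width of a normal-approximation 95% CI of the mean."""
    mean = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return mean, half_width


def delta_delta_g(predicted_mean, real_mean):
    """ddG = dG(predicted) - dG(real); a positive value means the real
    variants are more stable (more negative dG) than the predicted ones."""
    return predicted_mean - real_mean


# Mean dG values from Table 1 (kcal/mol, time T31).
REAL_T31 = -10.34
for model, predicted in [("SCS", -9.96), ("SCS GlobalBDvar", -10.03),
                         ("Neutral", -9.21)]:
    print(f"{model}: {delta_delta_g(predicted, REAL_T31):.2f} kcal/mol")
```

From the rounded means this prints 0.38, 0.31, and 1.13 kcal/mol; the neutral-model entry of 1.14 in the table presumably reflects rounding of the underlying means.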

Evaluation of predictions of SARS-CoV-2 Mpro and PLpro evolution

For the SARS-CoV-2 Mpro and PLpro data, the Grantham distances between the real and predicted sequences were around 25% and 36%, respectively (Figure 3A). Again, this distance was similar for predictions based on models that consider selection on protein folding stability (including birth-death models with constant or variable global birth-death rates among lineages) and for predictions based on a model of neutral evolution. Regarding folding stability, we again found that the models that consider selection on folding stability produced variants whose stability was closer to that of the real protein variants than the model that ignores selection (Figure 3B). Indeed, the protein variants derived from the models that consider selection were more stable than those derived from the model of neutral evolution. Finally, we did not find statistically significant differences in sequence similarity or folding stability between variants predicted under birth-death models with constant or variable global birth-death rates among lineages (Figure 3).

Figure 3. Prediction error of SARS-CoV-2 Mpro and PLpro evolution under SCS and neutral models regarding physicochemical properties of the amino acid changes accumulated during the evolutionary trajectories and protein folding stability.


Predictions based on data simulated under the SCS models [including birth-death models with constant (SCS) and variable (GlobalBDvar) global birth-death rates among lineages] and the neutral model. (A) Grantham distance calculated from the amino acid changes that occurred during the real and predicted evolutionary trajectories based on SCS and neutral models of protein evolution. (B) Variation of protein folding stability (ΔΔG) between real and predicted protein variants based on SCS and neutral models of protein evolution. Note that a positive ΔΔG indicates that the real protein variants are more stable than the predicted protein variants, and vice versa. Error bars correspond to the 95% confidence interval of the mean prediction error from 100 multiple sequence alignments simulated for the corresponding population and time.

Evaluation of predictions of influenza NS1 protein evolution

The evolutionary predictions for the influenza NS1 protein varied depending on the model used. Specifically, at the sequence level and for both prediction time points, the Grantham distances between the real and predicted protein sequences were around 23.5% for the models that incorporate structural evolutionary constraints, compared with about 25.5% for the neutral model (Figure 4A). These differences became more pronounced for predictions of protein folding stability. At both time points, the models that included selection consistently generated protein variants whose stability was more similar to that of the observed variants than those predicted by the neutral model (Figure 4B). Indeed, the sequences predicted by the models that account for selection were generally more stable than those predicted under neutral evolution. Again, we found no statistically significant differences in sequence similarity or folding stability between variants predicted under birth-death models with constant or variable global birth-death rates among lineages (Figure 4).

Figure 4. Prediction error of influenza NS1 protein evolution under SCS and neutral models regarding physicochemical properties of the amino acid changes accumulated during the evolutionary trajectories and protein folding stability.


Predictions were performed for two time points (longitudinal samples T2 and T3), based on data simulated under the SCS models [including birth-death models with constant (SCS) and variable (GlobalBDvar) global birth-death rates among lineages] and the neutral model. (A) Grantham distance calculated from the amino acid changes that occurred during the real and predicted evolutionary trajectories based on SCS and neutral models of protein evolution. (B) Variation of protein folding stability (ΔΔG) between real and predicted protein variants based on SCS and neutral models of protein evolution. Note that a positive ΔΔG indicates that the real protein variants are more stable than the predicted protein variants, and vice versa. Error bars correspond to the 95% confidence interval of the mean prediction error from 100 multiple sequence alignments simulated for the corresponding population and time.

Evaluation of predictions of HIV-1 PR evolution

In general, the Grantham distance between the real and predicted evolutionary trajectories from time T1 to later times varied among viral populations (patients; Figure 5A). However, for most populations, the distance remained below 30% and fluctuated little over time. One population showed a notable trend, with the distance increasing from 10% to nearly 60% over time. Because the length of the evolutionary trajectories of the protein can differ among the studied populations, we explored whether the accumulated number of amino acid substitutions could affect the accuracy of the predictions. The number of substitutions varied among populations, but this variability did not correlate with the Grantham distance between the real and predicted data (R² = 0.0001; Figure 5B). In general, the predicted protein variants were similar in folding stability to, or slightly less stable than, the real protein variants [with differences ranging from 0 to 9 kcal/mol and a mean of 3.1 ± 0.9 (95% CI) kcal/mol; Figure 5C]. The folding stability also showed small fluctuations, both increases and decreases, over time.
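A minimal Python sketch of the R² reported for Figure 5B, assuming it is the coefficient of determination of an ordinary least-squares line (equivalently, the squared Pearson correlation) between the accumulated number of substitutions and the Grantham distance for each population and time point. The function name and the example inputs are illustrative, not data from the study.

```python
def r_squared(x, y):
    """Coefficient of determination of an ordinary least-squares line of y on x
    (equal to the squared Pearson correlation). Requires non-constant x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical example: substitutions accumulated vs. Grantham distance (%).
substitutions = [1, 2, 3, 4]
grantham_pct = [1, 3, 2, 4]
print(r_squared(substitutions, grantham_pct))
```

A value near 0, such as the 0.0001 reported, means that the number of accumulated substitutions explains essentially none of the variation in prediction error.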

Figure 5. Prediction error of HIV-1 PR evolution at diverse populations regarding physicochemical properties of the amino acid changes accumulated during the evolutionary trajectories and protein folding stability.


(A) For each viral population (patient, represented with a particular color) and time, Grantham distance calculated from the amino acid changes that occurred during the real and predicted evolutionary trajectories. For each population, the mean distance over time is shown on the right. (B) Relationship between Grantham distances and the accumulated number of substitution events (R² = 0.0001, indicating a lack of correlation between these parameters). (C) Variation of protein folding stability (ΔΔG) between real and predicted protein variants for each viral population and time. For each population, the mean ΔΔG over time is shown on the right. Note that a positive ΔΔG indicates that the real protein variants are more stable than the predicted protein variants, and vice versa. Error bars correspond to the 95% confidence interval of the mean prediction error from 100 multiple sequence alignments simulated for the corresponding viral population and time.

Discussion

While reconstructing evolutionary histories and ancestral sequences, among other inferences about the past, has been popular in the field, predictions of evolution toward the future were traditionally avoided because of potentially high prediction errors. However, in the last decade, forecasting evolution has gained attention because of its variety of applications and advances in models of evolution (see the reviews Lässig et al., 2017; Wortel et al., 2023). Several studies showed that forecasting evolution can be feasible in some systems, including virus evolution (Luksza and Lässig, 2014; Thadani et al., 2023). Here, we also investigated forecasting evolution in viruses, but at the molecular level, considering the relevance of the substitution process in producing molecular variants that affect viral fitness (Arenas, 2015a; Arenas et al., 2016b; Bloom and Neher, 2023; Poon et al., 2007; Watabe and Kishino, 2010). We believe that forecasting genome evolution remains challenging due to the complexity of its evolutionary processes [e.g. epistatic interactions, chromosomal rearrangements, and heterogeneous substitution patterns among genomic regions (Arbiza et al., 2011), among others], which complicates the parameterization and prediction of accurate fitness landscapes. However, we believe that it could be more feasible for structural proteins because of their relatively simpler evolutionary processes, which usually include selection on folding stability (Bloom et al., 2006; Goldstein, 2011).

To make evolutionary predictions over time, either toward the past or toward the future, using a substitution model of molecular evolution can be convenient. Indeed, a variety of classic studies showed that the substitution model influences the phylogenetic likelihood obtained with probabilistic approaches for evolutionary inference (Lemmon and Moriarty, 2004; Yang et al., 1994; Zhang and Nei, 1997). To study protein evolution, empirical substitution models are routinely used because they allow rapid calculations; however, they assume site-independent evolution and typically apply the same exchangeability matrix to all protein sites, which is highly unrealistic (Echave et al., 2016). Moreover, only a small set of empirical substitution models has been developed for the diverse range of viral proteins (Dang et al., 2010; Del Amparo and Arenas, 2022b; Dimmic et al., 2002; Le and Vinh, 2020; Nickle et al., 2007). As an alternative, some substitution models that consider evolutionary constraints from the protein structure incorporate site-dependent evolution, which allows more accurate modeling of protein evolution than empirical substitution models (Arenas et al., 2013; Arenas et al., 2016a; Bordner and Mittelmann, 2014; Ferreiro et al., 2024a; Parisi and Echave, 2005).

Following previous methods for forecasting evolution based on computer simulations (Lässig and Łuksza, 2014; Neher et al., 2014), we adopted a birth-death population genetics process to model forward-in-time evolution in which the birth and death rates can differ among nodes (Stadler, 2010; Stadler, 2011). Traditional population genetics methods for simulating the evolution of molecular data use a two-step process: first, the evolutionary history is simulated [e.g. with the coalescent theory or a forward-in-time simulation approach (Arenas, 2012; Hoban et al., 2012), often assuming neutral evolution] and, second, molecular evolution is simulated along the previously obtained evolutionary history (Yang, 2006). However, this approach is unrealistic because the fitness of the data can affect its evolutionary history (e.g. a variant with high fitness is likely to have more descendants than a variant with low fitness). Thus, we designed and implemented a method for forecasting protein evolution that integrates a birth-death population genetics process (including constant and variable global birth-death rates among lineages) with an SCS model of protein evolution, where the folding stability of each protein variant is evaluated to predict its future trajectory in terms of both evolutionary history and molecular evolution. We implemented this forward-in-time simulation of protein evolution in a new version of our framework ProteinEvolver (Arenas et al., 2013), maintaining its previous capabilities and extending some of them (e.g. incorporation of site-specific exchangeability matrices and additional substitution models of protein evolution, among others; see Supplementary file 1A and the software documentation).
Like any other method for forecasting evolution, the present method ignores possible environmental shifts, which are inherently unpredictable and could affect the accuracy of the predictions. We evaluated the forward-in-time evolutionary predictions with real data from HIV-1, SARS-CoV-2, and influenza virus proteins. We determined the prediction errors between the real and predicted protein variants by examining dissimilarity in evolutionary trajectory (Grantham distance, based on the physicochemical properties of the amino acids that differ along the evolutionary trajectories), sequence divergence (KL divergence of the distributions of amino acid frequencies among sites), and protein structure (protein folding stability). Additionally, we analyzed how accounting for selection on protein folding stability influences the predictions. We also evaluated birth-death models incorporating either constant or variable global birth-death rates among lineages. Notably, in the variable-rate model, fitness can influence both reproductive success and the rate of molecular evolution.

In general, the dissimilarities in sequence and evolutionary trajectory between the real and predicted protein variants were relatively small, with some variation among the studied proteins. For the HIV-1 MA protein, the SARS-CoV-2 Mpro and PLpro, and the influenza NS1 protein, the sequence dissimilarity was below 10%, 26%, 36%, and 26%, respectively. The low prediction errors for the HIV-1 MA protein were expected because this dataset was derived from an in vitro cell culture experiment that is not influenced by the variety of external factors that could affect predictions in vivo. As a rough reference, traditional ancestral sequence reconstruction (ASR) of protein data with high sequence identity showed an error of approximately 2% (Arenas and Posada, 2010). Note that ASR methods can achieve low prediction errors because their statistical evaluation is based on a set of original sequences, rather than on the single original sequence used when forecasting evolution. The opposite scenario in terms of evolutionary complexity was the HIV-1 PR data, which were sampled from populations (patients) undergoing various antiretroviral treatments (Ferreiro et al., 2022). There, the sequence dissimilarity varied among viral populations, although most values were below 30%. One viral population showed higher dissimilarities (near 60%), which we believe could be caused by molecular adaptations promoted by the therapies, although this requires formal investigation.

An unexpected result was that the model of neutral evolution produced sequence dissimilarities between the real and predicted protein variants that were quite similar to those obtained with the SCS model (Table 1 and Figures 3 and 4). A few studies indicated that the substitution model has negligible effects on reconstructed phylogenetic trees (Abadi et al., 2019; Spielman, 2020). Subsequent studies found that the influence of the substitution model on phylogenetic reconstructions depends on the diversity of the data, with highly diverse data being more sensitive to the applied substitution model because they contain more evolutionary information to be modeled (Del Amparo and Arenas, 2023). The studied data present overall low diversity (sequence identities of 0.973, 0.967, 0.930, 0.802, and 0.817 for the HIV-1 MA, SARS-CoV-2 Mpro, SARS-CoV-2 PLpro, and influenza NS1 protein data at prediction time points T2 and T3, respectively); in general, longitudinal data derived from monitored evolutionary processes show low diversity because they involve relatively short evolutionary histories, among other factors. We therefore believe that the applied substitution models had little influence on the predicted sequences because few substitution events were modeled, as found in phylogenetic reconstructions (Del Amparo and Arenas, 2023). In fact, for the influenza NS1 protein dataset, which had the highest sequence diversity among the studied datasets, the sequences predicted under the SCS models were more similar to the real sequences than those predicted under neutral evolution. Overall, prediction accuracy varied among the studied evolutionary scenarios and, as expected, was lower in the more complex ones. Indeed, datasets with higher sequence diversity contain more evolutionary signal, which can improve prediction quality.
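How the sequence identities above were computed is not described here. One common definition, assumed in this Python sketch, is the mean pairwise identity over all pairs of aligned sequences, counting only columns where neither sequence has a gap. The function names and the example alignment are hypothetical.

```python
from itertools import combinations


def pairwise_identity(seq_a, seq_b, gap="-"):
    """Fraction of identical residues over columns where neither has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("sequences share no ungapped columns")
    return sum(a == b for a, b in pairs) / len(pairs)


def mean_pairwise_identity(alignment):
    """Average identity over all pairs of sequences (needs at least two)."""
    values = [pairwise_identity(a, b) for a, b in combinations(alignment, 2)]
    return sum(values) / len(values)


print(mean_pairwise_identity(["ACDE", "ACDE", "ACDF"]))
```

Under this definition, values close to 1 (such as 0.973 for the HIV-1 MA data) indicate that few columns carry substitutions, leaving little signal to discriminate between substitution models.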

We also evaluated the prediction error between the real and predicted protein variants in terms of folding stability, again comparing predictions made under a model that considers structural constraints and a model of neutral evolution. In general, the protein variants predicted under the SCS model had a folding stability close to that of the respective real protein variants, with differences below 1, 2, 7, and 1 kcal/mol for the HIV-1 MA, SARS-CoV-2 Mpro, SARS-CoV-2 PLpro, and influenza NS1 protein data, respectively. The largest differences (around 9 kcal/mol) were again observed for the HIV-1 PR data. In contrast to the prediction error based on sequence dissimilarity, the prediction error based on folding stability differed between predictions obtained under the SCS model and those obtained under the neutral model. In the studied evolutionary scenarios, the protein variants predicted under the neutral model were less stable, and farther from the stability of the real protein variants, than those predicted under the SCS model. These results were expected because, under SCS models, protein stability can be modeled more accurately than sequence similarity, owing to selection for maintaining the stability of the protein structure despite amino acid changes (Arenas and Bastolla, 2020; Illergård et al., 2009; Pascual-García et al., 2010). Indeed, previous studies showed that models that ignore structural constraints often produce proteins with unrealistic folding instability (Arenas and Bastolla, 2020; Del Amparo et al., 2023), which suggests that accounting for protein folding stability when modeling protein evolution is recommended for predicting protein variants with appropriate structural properties.

Therefore, we found good accuracy in predicting the real folding stability of the forecasted protein variants, whereas predicting the exact sequences was more challenging, which is not surprising given previous studies. In particular, inferring specific sequences is considerably challenging even for ancestral sequence reconstruction (Arenas and Bastolla, 2020; Arenas et al., 2017). Indeed, observed sequence diversity is much greater than observed structural diversity (Illergård et al., 2009; Pascual-García et al., 2010), and substitutions between amino acids with similar physicochemical properties can yield modeled protein variants with accurate folding stability even when the exact amino acid sequences differ. Further work is needed on substitution models of molecular evolution. We also found that datasets with relatively high sequence diversity can improve prediction accuracy because they contain more evolutionary information. Beyond that, forecasting the folding stability of future real proteins is an important advance in forecasting protein evolution, given the essential role of folding stability in protein function (Bloom et al., 2006; Scheiblhofer et al., 2017) and its variety of applications.

Variation in the global birth-death rate among lineages showed minor effects on prediction accuracy, suggesting a limited role in protein evolution modeling. Molecular evolution parameters, particularly the substitution model, appear to be more critical in this regard.

In the context of protein evolution, substitution models are a critical factor (Arenas et al., 2017; Bordner and Mittelmann, 2014; Echave et al., 2016; Echave and Wilke, 2017; Liberles et al., 2012; Wilke, 2012), and the presented combination with a birth-death model constitutes a first approximation upon which future studies can build to better understand this evolutionary system. In addition, the present method assumes that the protein structure is maintained over the studied evolutionary time, which is generally reasonable for short timescales over which the structure is conserved (Illergård et al., 2009; Pascual-García et al., 2010). Over longer evolutionary timescales, structural changes may occur, and modeling the evolution of the protein structure would then be necessary. To our knowledge, modeling the evolution of protein structure remains a challenging task that requires substantial methodological developments. Recent advances in artificial intelligence, particularly in protein structure prediction from sequence (Abramson et al., 2024; Jumper et al., 2021), may offer promising tools for addressing this challenge. However, we believe that evaluating such approaches in the context of structural evolution would be difficult, especially given the limited availability of real data with known evolutionary trajectories involving structural change. In any case, this is probably an important direction for future research.

We present a method to simulate forward-in-time protein evolution accounting for evolutionary constraints from the protein structure and a birth-death population process, and where the evolutionary history is influenced by the protein evolution and vice versa. The method is implemented in the computer framework ProteinEvolver2, which is freely distributed with several practical examples and detailed documentation. We believe that implementing methods into freely available phylogenetic frameworks is important to facilitate practical applications, as well as future improvements and evaluations. We applied the method to forecast protein evolution in some viral proteins. We found that the method provides acceptable approximations to the real evolution, especially in terms of protein folding stability, suggesting that combining structural constraints with birth-death population processes in the modeling of protein evolution is convenient. Still, to advance in methods for forecasting protein evolution, we believe that further efforts should be made in the field to improve the modeling of protein evolution, such as the incorporation of site-dependent evolutionary constraints from the protein activity.

Acknowledgements

This work was supported by the Project PID2023-151032NB-C22 funded by MCIU/AEI/10.13039/501100011033 and by FEDER, UE. DF was funded by a fellowship from Xunta de Galicia [ED481A-2020/192]. We thank the “Centro de Supercomputación de Galicia (CESGA)” for the computer resources. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Miguel Arenas, Email: marenas@uvigo.es.

Anne-Florence Bitbol, Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland.

Aleksandra M Walczak, CNRS, France.

Funding Information

This paper was supported by the following grants:

  • Ministerio de Ciencia, Innovación y Universidades PID2023-151032NB-C22 to Luis Daniel González-Vázquez, Miguel Arenas.

  • Agencia Estatal de Investigación PID2023-151032NB-C22 to Luis Daniel González-Vázquez, Miguel Arenas.

  • Federación Española de Enfermedades Raras to Luis Daniel González-Vázquez, Miguel Arenas.

  • Xunta de Galicia ED481A-2020/192 to David Ferreiro.

Additional information

Competing interests

No competing interests declared.

Author contributions

Resources, Data curation, Software, Formal analysis, Supervision, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing.

Resources, Data curation, Formal analysis, Investigation, Visualization, Methodology, Writing – review and editing.

Resources, Data curation, Formal analysis, Investigation, Visualization, Writing – review and editing.

Conceptualization, Resources, Software, Funding acquisition, Investigation, Visualization, Methodology, Writing – original draft, Project administration, Writing – review and editing.

Additional files

Supplementary file 1. Supplementary tables A–C and references cited in the supplementary file.
elife-106365-supp1.pdf (244.9KB, pdf)
MDAR checklist

Data availability

The computer framework ProteinEvolver2 is freely available from https://github.com/MiguelArenas/proteinevolver (Arenas, 2025). The SARS-CoV-2 data is available from GISAID database with https://doi.org/10.55876/gis8.250206gt. The real and predicted protein variants are available from Zenodo repository at the URL https://doi.org/10.5281/zenodo.15548146.

The following dataset was generated:

Ferreiro D, González-Vázquez LD, Prado-Comesaña A, Arenas M. 2025. Forecasting protein evolution by combining birth-death population models with structurally constrained substitution models. Zenodo.

The following previously published dataset was used:

GISAID 2025. SARS-CoV-2 protein sequences. EpiCoV.

References

  1. Abadi S, Azouri D, Pupko T, Mayrose I. Model selection may not be a mandatory step for phylogeny reconstruction. Nature Communications. 2019;10:934. doi: 10.1038/s41467-019-08822-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, Ronneberger O, Willmore L, Ballard AJ, Bambrick J, Bodenstein SW, Evans DA, Hung C-C, O’Neill M, Reiman D, Tunyasuvunakool K, Wu Z, Žemgulytė A, Arvaniti E, Beattie C, Bertolli O, Bridgland A, Cherepanov A, Congreve M, Cowen-Rivers AI, Cowie A, Figurnov M, Fuchs FB, Gladman H, Jain R, Khan YA, Low CMR, Perlin K, Potapenko A, Savy P, Singh S, Stecula A, Thillaisundaram A, Tong C, Yakneen S, Zhong ED, Zielinski M, Žídek A, Bapst V, Kohli P, Jaderberg M, Hassabis D, Jumper JM. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630:493–500. doi: 10.1038/s41586-024-07487-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Arbiza L, Patricio M, Dopazo H, Posada D. Genome-wide heterogeneity of nucleotide substitution model fit. Genome Biology and Evolution. 2011;3:896–908. doi: 10.1093/gbe/evr080. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Arenas M, Posada D. The effect of recombination on the reconstruction of ancestral sequences. Genetics. 2010;184:1133–1139. doi: 10.1534/genetics.109.113423. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Arenas M. Simulation of molecular data under diverse evolutionary scenarios. PLOS Computational Biology. 2012;8:e1002495. doi: 10.1371/journal.pcbi.1002495. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Arenas M, Dos Santos HG, Posada D, Bastolla U. Protein evolution along phylogenetic histories under structurally constrained substitution models. Bioinformatics. 2013;29:3020–3028. doi: 10.1093/bioinformatics/btt530. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Arenas M. Genetic consequences of antiviral therapy on HIV-1. Computational and Mathematical Methods in Medicine. 2015a;2015:395826. doi: 10.1155/2015/395826. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Arenas M. Trends in substitution models of molecular evolution. Frontiers in Genetics. 2015b;6:319. doi: 10.3389/fgene.2015.00319. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Arenas M, Sánchez-Cobos A, Bastolla U. Maximum-likelihood phylogenetic inference with selection on protein folding stability. Molecular Biology and Evolution. 2016a;32:2195–2207. doi: 10.1093/molbev/msv085. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Arenas M, Lorenzo-Redondo R, Lopez-Galindez C. Influence of mutation and recombination on HIV-1 in vitro fitness recovery. Molecular Phylogenetics and Evolution. 2016b;94:264–270. doi: 10.1016/j.ympev.2015.09.001. [DOI] [PubMed] [Google Scholar]
  11. Arenas M, Weber CC, Liberles DA, Bastolla U. ProtASR: an evolutionary framework for ancestral protein reconstruction with selection on folding stability. Systematic Biology. 2017;66:1054–1064. doi: 10.1093/sysbio/syw121. [DOI] [PubMed] [Google Scholar]
  12. Arenas M, Bastolla U. ProtASR2: Ancestral reconstruction of protein sequences accounting for folding stability. Methods in Ecology and Evolution. 2020;11:248–257. doi: 10.1111/2041-210X.13341. [DOI] [Google Scholar]
  13. Arenas M. Proteinevolver. db71444GitHub. 2025 https://github.com/MiguelArenas/proteinevolver
  14. Arnold K, Bordoli L, Kopp J, Schwede T. The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics. 2006;22:195–201. doi: 10.1093/bioinformatics/bti770.
  15. Bao Y, Bolotov P, Dernovoy D, Kiryutin B, Zaslavsky L, Tatusova T, Ostell J, Lipman D. The influenza virus resource at the National Center for Biotechnology Information. Journal of Virology. 2008;82:596–601. doi: 10.1128/JVI.02005-07.
  16. Barton JP, Goonetilleke N, Butler TC, Walker BD, McMichael AJ, Chakraborty AK. Relative rate and location of intra-host HIV evolution to evade cellular immunity are predictable. Nature Communications. 2016;7:11660. doi: 10.1038/ncomms11660.
  17. Bastolla U, Demetrius L. Stability constraints and protein evolution: the role of chain length, composition and disulfide bonds. Protein Engineering, Design & Selection. 2005;18:405–415. doi: 10.1093/protein/gzi045.
  18. Bastolla U, Porto M, Roman HE, Vendruscolo M. A protein evolution model with independent sites that reproduces site-specific amino acid distributions from the Protein Data Bank. BMC Evolutionary Biology. 2006;6:43. doi: 10.1186/1471-2148-6-43.
  19. Bastolla U, Porto M, Roman HE, Vendruscolo M. Structural Approaches to Sequence Evolution. Springer; 2007.
  20. Bloom JD, Labthavikul ST, Otey CR, Arnold FH. Protein stability promotes evolvability. PNAS. 2006;103:5869–5874. doi: 10.1073/pnas.0510098103.
  21. Bloom JD, Neher RA. Fitness effects of mutations to SARS-CoV-2 proteins. Virus Evolution. 2023;9:vead055. doi: 10.1093/ve/vead055.
  22. Bordner AJ, Mittelmann HD. A new formulation of protein evolutionary models that account for structural constraints. Molecular Biology and Evolution. 2014;31:736–749. doi: 10.1093/molbev/mst240.
  23. Bull JJ, Molineux IJ. Predicting evolution from genomics: experimental evolution of bacteriophage T7. Heredity. 2008;100:453–463. doi: 10.1038/sj.hdy.6801087.
  24. Carneiro M, Hartl DL. Colloquium papers: Adaptive landscapes and protein evolution. PNAS. 2010;107 Suppl 1:1747–1751. doi: 10.1073/pnas.0906192106.
  25. Carvajal-Rodríguez A. Simulation of genes and genomes forward in time. Current Genomics. 2010;11:58–61. doi: 10.2174/138920210790218007.
  26. Colless DH, Wiley EO. Phylogenetics: the theory and practice of phylogenetic systematics. Systematic Zoology. 1982;31:13420. doi: 10.2307/2413420.
  27. Dang CC, Le QS, Gascuel O, Le VS. FLU, an amino acid substitution model for influenza proteins. BMC Evolutionary Biology. 2010;10:99. doi: 10.1186/1471-2148-10-99.
  28. Darriba D, Taboada GL, Doallo R, Posada D. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics. 2011;27:1164–1165. doi: 10.1093/bioinformatics/btr088.
  29. Del Amparo R, Arenas M. Consequences of substitution model selection on protein ancestral sequence reconstruction. Molecular Biology and Evolution. 2022a;39:msac144. doi: 10.1093/molbev/msac144.
  30. Del Amparo R, Arenas M. HIV protease and integrase empirical substitution models of evolution: protein-specific models outperform generalist models. Genes. 2022b;13:61. doi: 10.3390/genes13010061.
  31. Del Amparo R, Arenas M. Influence of substitution model selection on protein phylogenetic tree reconstruction. Gene. 2023;865:147336. doi: 10.1016/j.gene.2023.147336.
  32. Del Amparo R, González-Vázquez LD, Rodríguez-Moure L, Bastolla U, Arenas M. Consequences of genetic recombination on protein folding stability. Journal of Molecular Evolution. 2023;91:33–45. doi: 10.1007/s00239-022-10080-2.
  33. Desai MM, Fisher DS. Beneficial mutation selection balance and the effect of linkage on positive selection. Genetics. 2007;176:1759–1798. doi: 10.1534/genetics.106.067678.
  34. de Visser J, Elena SF, Fragata I, Matuszewski S. The utility of fitness landscapes and big data for predicting evolution. Heredity. 2018;121:401–405. doi: 10.1038/s41437-018-0128-4.
  35. Diaz-Uriarte R, Vasallo C. Every which way? On predicting tumor evolution using cancer progression models. PLOS Computational Biology. 2019;15:e1007246. doi: 10.1371/journal.pcbi.1007246.
  36. Dimmic MW, Rest JS, Mindell DP, Goldstein RA. rtREV: an amino acid substitution matrix for inference of retrovirus and reverse transcriptase phylogeny. Journal of Molecular Evolution. 2002;55:65–73. doi: 10.1007/s00239-001-2304-y.
  37. Eccleston RC, Manko E, Campino S, Clark TG, Furnham N. A computational method for predicting the most likely evolutionary trajectories in the stepwise accumulation of resistance mutations. eLife. 2023;12:e84756. doi: 10.7554/eLife.84756.
  38. Echave J, Spielman SJ, Wilke CO. Causes of evolutionary rate variation among protein sites. Nature Reviews. Genetics. 2016;17:109–121. doi: 10.1038/nrg.2015.18.
  39. Echave J, Wilke CO. Biophysical models of protein evolution: understanding the patterns of evolutionary sequence divergence. Annual Review of Biophysics. 2017;46:85–103. doi: 10.1146/annurev-biophys-070816-033819.
  40. Ferreiro D, Khalil R, Gallego MJ, Osorio NS, Arenas M. The evolution of the HIV-1 protease folding stability. Virus Evolution. 2022;8:veac115. doi: 10.1093/ve/veac115.
  41. Ferreiro D, Branco C, Arenas M. Selection among site-dependent structurally constrained substitution models of protein evolution by approximate Bayesian computation. Bioinformatics. 2024a;40:btae096. doi: 10.1093/bioinformatics/btae096.
  42. Ferreiro D, Khalil R, Sousa SF, Arenas M. Substitution models of protein evolution with selection on enzymatic activity. Molecular Biology and Evolution. 2024b;41:msae026. doi: 10.1093/molbev/msae026.
  43. Fischer A, Vázquez-García I, Mustonen V. The value of monitoring to control evolving populations. PNAS. 2015;112:1007–1012. doi: 10.1073/pnas.1409403112.
  44. Fitch WM, Margoliash E. A method for estimating the number of invariant amino acid coding positions in a gene using cytochrome c as a model case. Biochemical Genetics. 1967;1:65–71. doi: 10.1007/BF00487738.
  45. Fornasari MS, Parisi G, Echave J. Site-specific amino acid replacement matrices from structurally constrained protein evolution simulations. Molecular Biology and Evolution. 2002;19:352–356. doi: 10.1093/oxfordjournals.molbev.a004089.
  46. Gernhard T. The conditioned reconstructed process. Journal of Theoretical Biology. 2008;253:769–778. doi: 10.1016/j.jtbi.2008.04.005.
  47. Gerrish PJ, Sniegowski PD. Real time forecasting of near-future evolution. Journal of the Royal Society, Interface. 2012;9:2268–2278. doi: 10.1098/rsif.2012.0119.
  48. Gilson AI, Marshall-Christensen A, Choi JM, Shakhnovich EI. The role of evolutionary selection in the dynamics of protein structure evolution. Biophysical Journal. 2017;112:1350–1365. doi: 10.1016/j.bpj.2017.02.029.
  49. Goldstein RA. The evolution and evolutionary consequences of marginal thermostability in proteins. Proteins. 2011;79:1396–1407. doi: 10.1002/prot.22964.
  50. Goldstein RA. Population size dependence of fitness effect distribution and substitution rate probed by biophysical model of protein thermostability. Genome Biology and Evolution. 2013;5:1584–1593. doi: 10.1093/gbe/evt110.
  51. Gong LI, Suchard MA, Bloom JD. Stability-mediated epistasis constrains the evolution of an influenza protein. eLife. 2013;2:e00631. doi: 10.7554/eLife.00631.
  52. Goyal S, Balick DJ, Jerison ER, Neher RA, Shraiman BI, Desai MM. Dynamic mutation-selection balance as an evolutionary attractor. Genetics. 2012;191:1309–1319. doi: 10.1534/genetics.112.141291.
  53. Grantham R. Amino acid difference formula to help explain protein evolution. Science. 1974;185:862–864. doi: 10.1126/science.185.4154.862.
  54. Harmon LJ. In: Phylogenetic Comparative Methods. Harmon LJ, editor. Princeton University Press; 2019. Introduction to birth-death models.
  55. Hoban S, Bertorelle G, Gaggiotti OE. Computer simulations: tools for population and evolutionary genetics. Nature Reviews. Genetics. 2012;13:110–122. doi: 10.1038/nrg3130.
  56. Hudson RR. Properties of a neutral allele model with intragenic recombination. Theoretical Population Biology. 1983;23:183–201. doi: 10.1016/0040-5809(83)90013-8.
  57. Hudson RR. Gene genealogies and the coalescent process. Oxford Surveys in Evolutionary Biology. 1990;7:1–44.
  58. Hudson RR. Island models and the coalescent process. Molecular Ecology. 1998;7:413–418. doi: 10.1046/j.1365-294x.1998.00344.x.
  59. Illergård K, Ardell DH, Elofsson A. Structure is three to ten times more conserved than sequence--a study of structural response in protein cores. Proteins. 2009;77:499–508. doi: 10.1002/prot.22458.
  60. Jacquier H, Birgy A, Le Nagard H, Mechulam Y, Schmitt E, Glodt J, Bercot B, Petit E, Poulain J, Barnaud G, Gros PA, Tenaillon O. Capturing the mutational landscape of the beta-lactamase TEM-1. PNAS. 2013;110:13067–13072. doi: 10.1073/pnas.1215206110.
  61. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, Bridgland A, Meyer C, Kohl SAA, Ballard AJ, Cowie A, Romera-Paredes B, Nikolov S, Jain R, Adler J, Back T, Petersen S, Reiman D, Clancy E, Zielinski M, Steinegger M, Pacholska M, Berghammer T, Bodenstein S, Silver D, Vinyals O, Senior AW, Kavukcuoglu K, Kohli P, Hassabis D. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–589. doi: 10.1038/s41586-021-03819-2.
  62. Kimura M, Weiss GH. The stepping stone model of population structure and the decrease of genetic correlation with distance. Genetics. 1964;49:561–576. doi: 10.1093/genetics/49.4.561.
  63. Kingman JFC. The coalescent. Stochastic Processes and Their Applications. 1982;13:235–248. doi: 10.1016/0304-4149(82)90011-4.
  64. Kullback S, Leibler RA. On information and sufficiency. The Annals of Mathematical Statistics. 1951;22:79–86. doi: 10.1214/aoms/1177729694.
  65. Lässig M, Łuksza M. Can we read the future from a tree? eLife. 2014;3:e05060. doi: 10.7554/eLife.05060.
  66. Lässig M, Mustonen V, Walczak AM. Predicting evolution. Nature Ecology & Evolution. 2017;1:0077. doi: 10.1038/s41559-017-0077.
  67. Le TK, Vinh LS. FLAVI: An amino acid substitution model for flaviviruses. Journal of Molecular Evolution. 2020;88:445–452. doi: 10.1007/s00239-020-09943-3.
  68. Lemant J, Le Sueur C, Manojlović V, Noble R. Robust, universal tree balance indices. Systematic Biology. 2022;71:1210–1224. doi: 10.1093/sysbio/syac027.
  69. Lemmon AR, Moriarty EC. The importance of proper model assumption in Bayesian phylogenetics. Systematic Biology. 2004;53:265–277. doi: 10.1080/10635150490423520.
  70. Liberles DA, Teichmann SA, Bahar I, Bastolla U, Bloom J, Bornberg-Bauer E, Colwell LJ, de Koning APJ, Dokholyan NV, Echave J, Elofsson A, Gerloff DL, Goldstein RA, Grahnen JA, Holder MT, Lakner C, Lartillot N, Lovell SC, Naylor G, Perica T, Pollock DD, Pupko T, Regan L, Roger A, Rubinstein N, Shakhnovich E, Sjölander K, Sunyaev S, Teufel AI, Thorne JL, Thornton JW, Weinreich DM, Whelan S. The interface of protein structure, protein biophysics, and molecular evolution. Protein Science. 2012;21:769–785. doi: 10.1002/pro.2071.
  71. Lind PA, Libby E, Herzog J, Rainey PB. Predicting mutational routes to new adaptive phenotypes. eLife. 2019;8:e38822. doi: 10.7554/eLife.38822.
  72. Lobkovsky AE, Wolf YI, Koonin EV. Universal distribution of protein evolution rates as a consequence of protein folding physics. PNAS. 2010;107:2983–2988. doi: 10.1073/pnas.0910445107.
  73. Luksza M, Lässig M. A predictive fitness model for influenza. Nature. 2014;507:57–61. doi: 10.1038/nature13087.
  74. Malcolm BA, Wilson KP, Matthews BW, Kirsch JF, Wilson AC. Ancestral lysozymes reconstructed, neutrality tested, and thermostability linked to hydrocarbon packing. Nature. 1990;345:86–89. doi: 10.1038/345086a0.
  75. Mendez R, Fritsche M, Porto M, Bastolla U. Mutation bias favors protein folding stability in the evolution of small populations. PLOS Computational Biology. 2010;6:e1000767. doi: 10.1371/journal.pcbi.1000767.
  76. Minning J, Porto M, Bastolla U. Detecting selection for negative design in proteins through an improved model of the misfolded state. Proteins. 2013;81:1102–1112. doi: 10.1002/prot.24244.
  77. Morcos F, Pagnani A, Lunt B, Bertolino A, Marks DS, Sander C, Zecchina R, Onuchic JN, Hwa T, Weigt M. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. PNAS. 2011;108:E1293–E1301. doi: 10.1073/pnas.1111471108.
  78. Moreira F, Arenas M, Videira A, Pereira F. Evolution of TOP1 and TOP1MT topoisomerases in chordata. Journal of Molecular Evolution. 2023;91:192–203. doi: 10.1007/s00239-022-10091-z.
  79. Morris DH, Gostic KM, Pompei S, Bedford T, Łuksza M, Neher RA, Grenfell BT, Lässig M, McCauley JW. Predictive modeling of influenza shows the promise of applied evolutionary biology. Trends in Microbiology. 2018;26:102–118. doi: 10.1016/j.tim.2017.09.004.
  80. Munck C, Gumpert HK, Wallin AIN, Wang HH, Sommer MOA. Prediction of resistance development against drug combinations by collateral responses to component drugs. Science Translational Medicine. 2014;6:262ra156. doi: 10.1126/scitranslmed.3009940.
  81. Navascués M, Depaulis F, Emerson BC. Combining contemporary and ancient DNA in population genetic and phylogeographical studies. Molecular Ecology Resources. 2010;10:760–772. doi: 10.1111/j.1755-0998.2010.02895.x.
  82. Neher RA, Hallatschek O. Genealogies of rapidly adapting populations. PNAS. 2013;110:437–442. doi: 10.1073/pnas.1213113110.
  83. Neher RA, Russell CA, Shraiman BI. Predicting evolution from the shape of genealogical trees. eLife. 2014;3:e03568. doi: 10.7554/eLife.03568.
  84. Nickle DC, Heath L, Jensen MA, Gilbert PB, Mullins JI, Kosakovsky Pond SL. HIV-specific probabilistic models of protein evolution. PLOS ONE. 2007;2:e503. doi: 10.1371/journal.pone.0000503.
  85. Papkou A, Garcia-Pastor L, Escudero JA, Wagner A. A rugged yet easily navigable fitness landscape. Science. 2023;382:eadh3860. doi: 10.1126/science.adh3860.
  86. Parisi G, Echave J. Structural constraints and emergence of sequence patterns in protein evolution. Molecular Biology and Evolution. 2001;18:750–756. doi: 10.1093/oxfordjournals.molbev.a003857.
  87. Parisi G, Echave J. Generality of the structurally constrained protein evolution model: assessment on representatives of the four main fold classes. Gene. 2005;345:45–53. doi: 10.1016/j.gene.2004.11.025.
  88. Pascual-García A, Abia D, Méndez R, Nido GS, Bastolla U. Quantifying the evolutionary divergence of protein structures: the role of function change and function conservation. Proteins. 2010;78:181–196. doi: 10.1002/prot.22616.
  89. Pascual-García A, Arenas M, Bastolla U. The molecular clock in the evolution of protein structures. Systematic Biology. 2019;68:987–1002. doi: 10.1093/sysbio/syz022.
  90. Poon AFY, Kosakovsky Pond SL, Richman DD, Frost SDW. Mapping protease inhibitor resistance to human immunodeficiency virus type 1 sequence polymorphisms within patients. Journal of Virology. 2007;81:13598–13607. doi: 10.1128/JVI.01570-07.
  91. Rodrigue N, Lartillot N, Bryant D, Philippe H. Site interdependence attributed to tertiary structure in amino acid sequence evolution. Gene. 2005;347:207–217. doi: 10.1016/j.gene.2004.12.011.
  92. Rodrigues JV, Bershtein S, Li A, Lozovsky ER, Hartl DL, Shakhnovich EI. Biophysical principles predict fitness landscapes of drug resistance. PNAS. 2016;113:E1470–E1478. doi: 10.1073/pnas.1601441113.
  93. Rubin IN, Ispolatov Y, Doebeli M. Adaptive diversification and niche packing on rugged fitness landscapes. Journal of Theoretical Biology. 2023;562:111421. doi: 10.1016/j.jtbi.2023.111421.
  94. Ruiz-González MX, Fares MA. Coevolution analyses illuminate the dependencies between amino acid sites in the chaperonin system GroES-L. BMC Evolutionary Biology. 2013;13:156. doi: 10.1186/1471-2148-13-156.
  95. Sali A, Blundell TL. Comparative protein modelling by satisfaction of spatial restraints. Journal of Molecular Biology. 1993;234:779–815. doi: 10.1006/jmbi.1993.1626.
  96. Santos-Pereira A, Triunfante V, Araújo PMM, Martins J, Soares H, Poveda E, Souto B, Osório NS. Nationwide study of drug resistance mutations in HIV-1 infected individuals under antiretroviral therapy in Brazil. International Journal of Molecular Sciences. 2021;22:5304. doi: 10.3390/ijms22105304.
  97. Scheiblhofer S, Laimer J, Machado Y, Weiss R, Thalhamer J. Influence of protein fold stability on immunogenicity and its implications for vaccine design. Expert Review of Vaccines. 2017;16:479–489. doi: 10.1080/14760584.2017.1306441.
  98. Sella G, Hirsh AE. The application of statistical physics to evolutionary biology. PNAS. 2005;102:9541–9546. doi: 10.1073/pnas.0501865102.
  99. Souto B, Triunfante V, Santos-Pereira A, Martins J, Araújo PMM, Osório NS. Evolutionary dynamics of HIV-1 subtype C in Brazil. Scientific Reports. 2021;11:23060. doi: 10.1038/s41598-021-02428-3.
  100. Spielman SJ. Relative model fit does not predict topological accuracy in single-gene protein phylogenetics. Molecular Biology and Evolution. 2020;37:2110–2123. doi: 10.1093/molbev/msaa075.
  101. Stackhouse J, Presnell SR, McGeehan GM, Nambiar KP, Benner SA. The ribonuclease from an extinct bovid ruminant. FEBS Letters. 1990;262:104–106. doi: 10.1016/0014-5793(90)80164-e.
  102. Stadler T. Sampling-through-time in birth-death trees. Journal of Theoretical Biology. 2010;267:396–404. doi: 10.1016/j.jtbi.2010.09.010.
  103. Stadler T. Simulating trees with a fixed number of extant species. Systematic Biology. 2011;60:676–684. doi: 10.1093/sysbio/syr029.
  104. Thadani NN, Gurev S, Notin P, Youssef N, Rollins NJ, Ritter D, Sander C, Gal Y, Marks DS. Learning from prepandemic data to forecast viral escape. Nature. 2023;622:818–825. doi: 10.1038/s41586-023-06617-0.
  105. Thornton JW, Need E, Crews D. Resurrecting the ancestral steroid receptor: ancient origin of estrogen signaling. Science. 2003;301:1714–1717. doi: 10.1126/science.1086185.
  106. Ugalde JA, Chang BSW, Matz MV. Evolution of coral pigments recreated. Science. 2004;305:1433. doi: 10.1126/science.1099597.
  107. Van Cleve J, Weissman DB. Measuring ruggedness in fitness landscapes. PNAS. 2015;112:7345–7346. doi: 10.1073/pnas.1507916112.
  108. Watabe T, Kishino H. Structural considerations in the fitness landscape of a virus. Molecular Biology and Evolution. 2010;27:1782–1791. doi: 10.1093/molbev/msq056.
  109. Wilke CO. Bringing molecules back into molecular evolution. PLOS Computational Biology. 2012;8:e1002572. doi: 10.1371/journal.pcbi.1002572.
  110. Wiuf C, Posada D. A coalescent model of recombination hotspots. Genetics. 2003;164:407–417. doi: 10.1093/genetics/164.1.407.
  111. Wortel MT, Agashe D, Bailey SF, Bank C, Bisschop K, Blankers T, Cairns J, Colizzi ES, Cusseddu D, Desai MM, van Dijk B, Egas M, Ellers J, Groot AT, Heckel DG, Johnson ML, Kraaijeveld K, Krug J, Laan L, Lässig M, Lind PA, Meijer J, Noble LM, Okasha S, Rainey PB, Rozen DE, Shitut S, Tans SJ, Tenaillon O, Teotónio H, de Visser JAGM, Visser ME, Vroomans RMA, Werner GDA, Wertheim B, Pennings PS. Towards evolutionary predictions: Current promises and challenges. Evolutionary Applications. 2023;16:3–21. doi: 10.1111/eva.13513.
  112. Wright S. Evolution in Mendelian populations. Genetics. 1931;16:97–159. doi: 10.1093/genetics/16.2.97.
  113. Wylie CS, Shakhnovich EI. A biophysical protein folding model accounts for most mutational fitness effects in viruses. PNAS. 2011;108:9916–9921. doi: 10.1073/pnas.1017572108.
  114. Yang Z, Goldman N, Friday A. Comparison of models for nucleotide substitution used in maximum-likelihood phylogenetic estimation. Molecular Biology and Evolution. 1994;11:316–324. doi: 10.1093/oxfordjournals.molbev.a040112.
  115. Yang Z. Among-site rate variation and its impact on phylogenetic analyses. Trends in Ecology & Evolution. 1996;11:367–372. doi: 10.1016/0169-5347(96)10041-0.
  116. Yang Z. Computational Molecular Evolution. Oxford, England: Oxford University Press; 2006.
  117. Yoshida K, Hata K, Kawakami K, Hiradate S, Osawa T, Kachi N. Predicting ecosystem changes by a new model of ecosystem evolution. Scientific Reports. 2023;13:15353. doi: 10.1038/s41598-023-42529-9.
  118. Zeldovich KB, Chen P, Shakhnovich EI. Protein stability imposes limits on organism complexity and speed of molecular evolution. PNAS. 2007;104:16152–16157. doi: 10.1073/pnas.0705366104.
  119. Zhang J, Nei M. Accuracies of ancestral amino acid sequences inferred by the parsimony, likelihood, and distance methods. Journal of Molecular Evolution. 1997;44 Suppl 1:S139–S146. doi: 10.1007/pl00000067.

eLife Assessment

Anne-Florence Bitbol

This manuscript introduces a useful protein-stability-based fitness model for simulating protein evolution and unifying non-neutral models of molecular evolution with phylogenetic models. The model is applied to five viral proteins of structural and functional importance. While the general modelling approach is solid and effectively preserves folding stability, the evidence for the model's predictive power remains limited, since it shows little improvement over neutral models in predicting protein evolution. The work should be of interest to researchers developing theoretical models of molecular evolution.

Reviewer #1 (Public review):

Anonymous

Summary:

Ferreiro et al. present a method to simulate protein sequence evolution under a birth-death model where sequence evolution is guided by structural constraints on protein stability. The authors then use this model to explore the predictability of sequence evolution in several viral proteins. In principle, this work is of great interest to molecular evolution and phylodynamics, fields that have struggled to couple non-neutral models of sequence evolution to phylodynamic models such as birth-death processes. Unfortunately, though, the model shows little improvement over neutral models in predicting protein sequence evolution, although it does predict protein stability better than models assuming neutral evolution. More work appears to be needed to determine exactly which aspects of protein sequence evolution are predictable under such non-neutral phylogenetic models.

Major concerns:

(1) The authors have clarified the mapping between birth-death model parameters and fitness, but the way fitness is modeled still appears somewhat problematic. The authors assume that the death rate equals 1 minus the birth rate (d = 1 - b). A variant with a birth rate b = 1 would therefore have a death rate d = 0 and would be immortal, which does not seem plausible. Also, I'm not sure that this would "allow a constant global (birth-death) rate" as stated in line 172, as selection would still act to increase the population mean growth rate r = b - d. It seems more reasonable to assume that protein stability affects only the birth rate or only the death rate, with the other rate held constant, as in the model of Neher et al. (2014).

(2) It is difficult to evaluate the predictive performance for protein sequence evolution, partly because performance is compared in terms of percent divergence, which is difficult to compare across viral proteins and datasets. Some protein sequences would be expected to diverge more because they evolve over longer timescales, under higher substitution rates, or under weaker purifying selection. It might therefore help to normalize the divergence between predicted and observed sequences by the expected or empirically observed divergence over the timescale of prediction.

(3) Predictability may also vary significantly across sites in a protein. For example, mutations at many sites may have little impact on structural stability (in which case we would expect poor predictive performance), while even conservative changes at other sites may disrupt folding. I therefore feel that much work remains to determine where and when sequence evolution might be predictable under these types of models, and when it might be fundamentally unpredictable due to the high entropy of sequence space.
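To make concern (1) concrete, the sketch below compares two hypothetical ways of turning a per-variant fitness score into birth and death rates. It assumes fitness is already a number in [0, 1]; `coupled_rates` encodes the d = 1 - b scheme described in concern (1), and `fixed_death_rates` the alternative in which only the birth rate depends on stability. Neither function is taken from the authors' software; both names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Per-lineage birth and death rates in a birth-death process."""
    birth: float
    death: float

    @property
    def total(self) -> float:
        # Rate at which any event (birth or death) happens to the lineage.
        return self.birth + self.death

    @property
    def growth(self) -> float:
        # Net growth rate r = b - d; lineages with larger r expand faster.
        return self.birth - self.death


def coupled_rates(fitness: float) -> Rates:
    """Hypothetical scheme from concern (1): b = fitness and d = 1 - b."""
    if not 0.0 <= fitness <= 1.0:
        raise ValueError("fitness must lie in [0, 1]")
    return Rates(birth=fitness, death=1.0 - fitness)


def fixed_death_rates(fitness: float, death: float = 0.5) -> Rates:
    """Alternative scheme: fitness sets the birth rate, the death rate is constant."""
    if not 0.0 <= fitness <= 1.0:
        raise ValueError("fitness must lie in [0, 1]")
    return Rates(birth=fitness, death=death)


for f in (0.5, 0.8, 1.0):
    c, k = coupled_rates(f), fixed_death_rates(f)
    print(f"fitness={f:.1f}  coupled: d={c.death:.2f} b+d={c.total:.2f} r={c.growth:+.2f}"
          f"  |  fixed death: d={k.death:.2f} r={k.growth:+.2f}")
```

Under the coupled scheme b + d stays at 1 for every variant, yet r = 2b - 1 still differs between variants, and fitness 1.0 yields d = 0, an immortal lineage. The fixed-death variant keeps every lineage mortal; its constant d = 0.5 is an arbitrary illustrative value.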
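Concern (2) proposes scaling the prediction error by how much the protein actually changed. A minimal sketch follows. It assumes gap-free, equal-length aligned sequences and uses the uncorrected p-distance (fraction of differing sites). The function names and toy sequences are hypothetical. A ratio below 1 means the forecast is closer to the observed future sequence than the starting sequence is. A forecast identical to the starting sequence always scores exactly 1.

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of aligned sites that differ (gap-free, equal-length sequences assumed)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def normalized_prediction_error(start: str, predicted: str, observed: str) -> float:
    """Prediction error divided by the observed divergence over the forecast interval."""
    observed_change = p_distance(start, observed)
    if observed_change == 0:
        raise ValueError("no observed change; normalized error is undefined")
    return p_distance(predicted, observed) / observed_change


start = "MKVLAAGIVG"
observed = "MKVLSAGIVA"   # 2 of 10 sites changed
predicted = "MKVLSAGIVG"  # recovers 1 of the 2 changes

print(p_distance(predicted, observed))                          # 0.1
print(normalized_prediction_error(start, predicted, observed))  # 0.5
```

A substitution-model-based distance could replace the p-distance without changing the normalization idea.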
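Concern (3) links predictability to the entropy of sequence space. One simple way to flag sites where prediction may or may not be feasible is to compute the Shannon entropy of each alignment column. This is a generic diagnostic, not part of the method under review, and the sketch assumes a gap-free alignment. Near-zero entropy marks conserved sites. High entropy marks sites where many amino acids are tolerated.

```python
import math
from collections import Counter


def column_entropies(alignment: list[str]) -> list[float]:
    """Shannon entropy (in bits) of each column of an equal-length, gap-free alignment."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")
    n = len(alignment)
    entropies = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        # H = sum p * log2(1/p); written this way so a conserved column gives +0.0.
        entropies.append(sum((c / n) * math.log2(n / c) for c in counts.values()))
    return entropies


aln = ["MKVL", "MKIL", "MRVA", "MRIG"]
print([round(h, 3) for h in column_entropies(aln)])  # [0.0, 1.0, 1.0, 1.5]
```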

Reviewer #2 (Public review):

Anonymous

In this study, the authors aim to forecast the evolution of viral proteins by simulating sequence changes under a constraint of folding stability. The central idea is that proteins must retain a certain level of structural stability (quantified by folding free energy, ΔG) to remain functional, and that this constraint can shape and restrict the space of viable evolutionary trajectories. The authors integrate a birth-death population model with a structurally constrained substitution (SCS) model and apply this simulation framework to several viral proteins from HIV-1, SARS-CoV-2, and influenza.

The motivation to incorporate biophysical constraints into evolutionary models is scientifically sound, and the general approach aligns with a growing interest in bridging molecular evolution and structural biology. The authors focus on proteins where immune pressure is limited and stability is likely to be a dominant constraint, which is conceptually appropriate. The method generates sequence variants that preserve folding stability, suggesting that stability-based filtering may capture certain evolutionary patterns.

However, the study does not substantiate its central claim of forecasting. The model does not predict future sequences with measurable accuracy, nor does it reproduce observed evolutionary paths. Validation is limited to endpoint comparisons in a few datasets. While KL divergence is used to compare amino acid distributions, this analysis is applied only to a single protein (HIV-1 MA), and there is no assessment of mutation-level predictive accuracy or quantification of how well simulated sequences recapitulate real evolutionary paths. No comparison is made to real intermediate variants, even though extensive viral sequencing datasets (SARS-CoV-2, influenza, RSV) contain thousands of sequences annotated with collection dates.
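The following is a minimal sketch of a Kullback-Leibler divergence between two amino acid frequency profiles, such as observed versus simulated sequences at one time point. The pseudocount is an assumption added here so that an amino acid absent from one sample does not make the divergence infinite. The smoothing used in the study, if any, is not stated in this section. The sequences are toy examples.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(sequences: list[str], pseudocount: float = 1.0) -> dict[str, float]:
    """Amino acid frequencies pooled over sequences, with an additive pseudocount."""
    counts = Counter(aa for seq in sequences for aa in seq if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def kl_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """D_KL(P || Q) in nats; asymmetric, and zero only when P equals Q."""
    return sum(p[aa] * math.log(p[aa] / q[aa]) for aa in p if p[aa] > 0)


observed = aa_frequencies(["MKVLAAGIVG", "MKVLSAGIVA"])
simulated = aa_frequencies(["MKVLAAGLVG", "MRVLSAGIVG"])
print(round(kl_divergence(observed, simulated), 4))
```

Because the divergence is asymmetric, the direction (observed relative to simulated, or the reverse) should be stated whenever it is reported.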

The selection of proteins is narrow, and the inclusion or exclusion of specific proteins is not clearly justified.

The analyzed datasets are also under-characterized: we are not given insight into how variable the sequences are or how surprising the simulated sequences might be relative to natural diversity. Furthermore, the use of consensus sequences to represent time points is problematic, particularly in the context of viral evolution, where divergent subclades often coexist; a consensus sequence may not accurately reflect the underlying population structure.
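A toy example illustrates the consensus problem. The sketch below builds a simple majority-rule consensus from two made-up, coexisting subclades. Ties go to the residue seen first, an arbitrary choice made here. The larger subclade is polymorphic at one site, so the smaller subclade wins that column. The resulting consensus matches no sequence in either subclade.

```python
from collections import Counter


def majority_consensus(sequences: list[str]) -> str:
    """Column-wise most frequent residue; ties go to the residue seen first."""
    if not sequences or len({len(s) for s in sequences}) != 1:
        raise ValueError("need a non-empty set of equal-length sequences")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*sequences))


clade_x = ["MKVA", "MKIA", "MKLA"]  # larger subclade, polymorphic at the third site
clade_y = ["MRTS", "MRTS"]          # smaller, internally uniform subclade
population = clade_x + clade_y

consensus = majority_consensus(population)
print(consensus)                # MKTA
print(consensus in population)  # False
```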

The fitness function used in the main simulations is based on absolute ΔG and rewards increased stability without testing whether real evolutionary trajectories tend to maintain, increase, or reduce folding stability over time in the particular proteins studied. While a variant of the model does attempt to center selection around empirical ΔG values, this more biologically plausible version is underutilized and not well validated.
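The two generic fitness functions below illustrate the difference between rewarding absolute stability and selecting around an empirical ΔG. Both are hypothetical sketches, not the model used in the study. The first is the two-state fraction of folded protein, 1 / (1 + exp(ΔG/kT)), which keeps increasing as ΔG becomes more negative. The second is a Gaussian centered on a target ΔG, which penalizes deviations in either direction. Negative ΔG denotes a stable fold. The values kT ≈ 0.593 kcal/mol (about 298 K), the width σ and the target are illustrative assumptions.

```python
import math

KT = 0.593  # kcal/mol at roughly 298 K (assumed value)


def fraction_folded(dg: float, kt: float = KT) -> float:
    """Two-state fraction of folded protein; increases monotonically as dG decreases."""
    return 1.0 / (1.0 + math.exp(dg / kt))


def target_stability_fitness(dg: float, target: float, sigma: float = 1.0) -> float:
    """Gaussian fitness peaked at an empirical target dG (hypothetical form)."""
    return math.exp(-((dg - target) ** 2) / (2.0 * sigma ** 2))


target = -8.0  # hypothetical empirical dG of the wild-type protein, kcal/mol
for dg in (-4.0, -8.0, -12.0):
    print(f"dG={dg:6.1f}  absolute={fraction_folded(dg):.6f}  "
          f"targeted={target_stability_fitness(dg, target):.6f}")
```

Under the first function, a more stable variant is always at least as fit. Under the second, destabilized and over-stabilized variants at equal distance from the target are penalized equally.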

Ultimately, the model constrains sequence evolution to stability-compatible trajectories but does not forecast which of these trajectories are likely to occur. It is better understood as a filter of biophysically plausible outcomes than as a predictive tool. The distinction between constraint-based plausibility and sequence-level forecasting should be made clearer. Despite these limitations, the work may be of interest to researchers developing simulation frameworks or exploring the role of protein stability in viral evolution, and it raises interesting questions about how biophysical constraints shape sequence space over time.

eLife. 2025 Sep 24;14:RP106365. doi: 10.7554/eLife.106365.3.sa3

Author response

David Ferreiro 1, Luis Daniel González-Vázquez 2, Ana Prado-Comesaña 3, Miguel Arenas 4

The following is the authors’ response to the current reviews.

Reviewer #1 (Public review):

Summary:

Ferreiro et al. present a method to simulate protein sequence evolution under a birth-death model where sequence evolution is guided by structural constraints on protein stability. The authors then use this model to explore the predictability of sequence evolution in several viral proteins. In principle, this work is of great interest to molecular evolution and phylodynamics, which have struggled to couple non-neutral models of sequence evolution to phylodynamic models like birth-death processes. Unfortunately, though, the model shows little improvement over neutral models in predicting protein sequence evolution, although it can predict protein stability better than models assuming neutral evolution. It appears that more work is needed to determine exactly what aspects of protein sequence evolution are predictable under such non-neutral phylogenetic models.

We thank the reviewer for the positive comments about our work. We agree that further work is needed in the field of substitution models of molecular evolution to enable more accurate predictions of specific amino acid sequences in evolutionary processes.

Major concerns:

(1) The authors have clarified the mapping between birth-death model parameters and fitness, but how fitness is modeled still appears somewhat problematic. The authors assume the death rate = 1 - birth rate. So a variant with a birth rate b = 1 would have a death rate d = 0 and so would be immortal and never die, which does not seem plausible. Also I'm not sure that this would "allow a constant global (birth-death) rate" as stated in line 172, as selection would still act to increase the population mean growth rate r = b - d. It seems more reasonable to assume that protein stability affects only either the birth or death rate and assume the other rate is constant, as in the Neher 2014 model.

We implemented the model proposed by Neher, et al. (2014), in which every variant has a death rate (d) greater than 0, and applied it in the present method. In general, this model yielded results similar to those obtained with the model that assumes d = 1 – b, suggesting that this aspect may not be crucial for the study system. In addition, imposing death events through an arbitrarily chosen death rate could itself be a point of concern. Regarding the original model, a variant with d = 0 can still lose fitness through the mutation process: in an evolutionary process each variant is subject to mutation, and Markov models allow for mutations that decrease fitness (with lower probability than beneficial ones, but they can still occur). All this information is included in the manuscript.
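To make the parameterization discussed in point (1) concrete, the following is a minimal Gillespie-style sketch of a forward-in-time birth-death process in which each lineage carries its own birth rate b and death rate d = 1 - b. It assumes that fitness has already been mapped to a birth rate in [0, 1] and that daughters inherit the parent's rate, and it omits the substitution process entirely, so it is not the authors' framework; the function name `simulate_birth_death` is hypothetical. Because b + d = 1 for every lineage, each lineage experiences events at the same total rate, which is why the next lineage to act can be drawn uniformly.

```python
import random


def simulate_birth_death(birth_rates, t_max, seed=0):
    """Hypothetical sketch: forward birth-death simulation with d = 1 - b per lineage.

    birth_rates: initial birth rate of each lineage (assumed to lie in [0, 1]).
    Daughters copy the parent's birth rate; mutation is deliberately omitted.
    Returns (birth rates of surviving lineages, number of births, number of deaths).
    """
    rng = random.Random(seed)
    lineages = list(birth_rates)
    t, n_births, n_deaths = 0.0, 0, 0
    while lineages:
        # Every lineage has total event rate b + d = 1, so the population rate is its size.
        t += rng.expovariate(float(len(lineages)))
        if t > t_max:
            break
        # Equal per-lineage rates mean the acting lineage is chosen uniformly.
        i = rng.randrange(len(lineages))
        b = lineages[i]
        if rng.random() < b:  # birth with probability b / (b + d) = b
            lineages.append(b)
            n_births += 1
        else:  # death with probability d / (b + d) = 1 - b
            lineages.pop(i)
            n_deaths += 1
    return lineages, n_births, n_deaths


if __name__ == "__main__":
    survivors, births, deaths = simulate_birth_death([0.9, 0.6, 0.5], t_max=3.0, seed=1)
    print(len(survivors), births, deaths)
```

With `birth_rates=[1.0]` no deaths occur, matching the reviewer's observation that such a variant is immortal in the absence of mutation; in a full simulation, substitutions along branches would update b, which is the route by which, as the authors note, such a variant can still lose fitness.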

(2) It is difficult to evaluate the predictive performance of protein sequence evolution. This is in part due to the fact that performance is compared in terms of percent divergence, which is difficult to compare across viral proteins and datasets. Some protein sequences would be expected to diverge more because they are evolving over longer time scales, under higher substitution rates or under weaker purifying selection. It might therefore help to normalize the divergence between predicted and observed sequences by the expected or empirically observed amount of divergence seen over the timescale of prediction.

The study protein datasets showed different levels of sequence divergence over their evolutionary times, as indicated for each dataset in the manuscript. For some metrics, we evaluated the accuracy (or error) of the predictions through direct comparisons between real and predicted protein variants, using percentages to facilitate interpretation: 0% indicates a perfect prediction (no error), while 100% indicates a completely incorrect prediction (total error). Regarding normalization of these evaluations, we respectfully disagree with the suggestion because diverse factors can affect divergence (not only the substitution rate, but also the sample size and structural features of the protein that may affect stability when accommodating different sequences, among others), which complicates defining a consistent and meaningful normalization criterion. Given that the manuscript provides detailed information for each dataset, we believe that presenting the prediction accuracy through direct comparisons between real and predicted protein variants, expressed as percentages, is the clearest approach.
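The percentage scale described above can be sketched as a per-site mismatch rate between aligned real and predicted sequences. This is a minimal illustration under stated assumptions: sequences are already aligned to equal length, gaps count as ordinary characters, and the all-against-all averaging in `mean_percent_error` is an illustrative choice. The function names are hypothetical and not taken from the study's software.

```python
def percent_error(real: str, predicted: str) -> float:
    """Percentage of aligned sites where the predicted residue differs from the real one.

    0 means a perfect prediction and 100 means every site is wrong.
    """
    if len(real) != len(predicted):
        raise ValueError("sequences must be aligned to the same length")
    if not real:
        raise ValueError("empty alignment")
    mismatches = sum(a != b for a, b in zip(real, predicted))
    return 100.0 * mismatches / len(real)


def mean_percent_error(real_seqs, predicted_seqs) -> float:
    """Average percent error over every (real, predicted) pair (assumed all-against-all)."""
    scores = [percent_error(r, p) for r in real_seqs for p in predicted_seqs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(percent_error("MKTAYIAK", "MKTAYLAK"))
```

Normalizing this score by the divergence observed over the prediction interval, as the reviewer suggests, would amount to dividing it by the same metric computed between the real start and end sequences; the authors explain above why they did not adopt this.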

(3) Predictability may also vary significantly across different sites in a protein. For example, mutations at many sites may have little impact on structural stability (in which case we would expect poor predictive performance) while even conservative changes at other sites may disrupt folding. I therefore feel that there remains much work to be done here in terms of figuring out where and when sequence evolution might be predictable under these types of models, and when sequence evolution might just be fundamentally unpredictable due to the high entropy of sequence space.

We agree with this reflection. Mutations can have different effects on folding stability, which are accounted for by the model presented in this study. However, accurately predicting the exact sequences of protein variants with similar stability remains difficult with current structurally constrained substitution models, and therefore, further work is needed in this regard. This aspect is indicated in the manuscript.
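The reviewer's point about site-to-site differences in predictability can be explored with a simple diagnostic such as the Shannon entropy of each alignment column: high-entropy columns are those where many different residues are observed, and the maximum for 20 amino acids is log2(20), about 4.32 bits. The sketch below is illustrative only; the function name is hypothetical, gaps are treated as an ordinary symbol, and nothing here implies that the manuscript computed this quantity.

```python
import math
from collections import Counter


def site_entropies(alignment):
    """Shannon entropy (in bits) of each column of an alignment of equal-length sequences."""
    if not alignment:
        raise ValueError("empty alignment")
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to the same length")
    n = len(alignment)
    entropies = []
    for j in range(length):
        counts = Counter(seq[j] for seq in alignment)
        # H = -sum p log2 p over the residues observed in column j.
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        entropies.append(h)
    return entropies


if __name__ == "__main__":
    print(site_entropies(["MKTA", "MRTA", "MKSA", "MRTG"]))
```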

We want to thank the reviewer again for taking the time to revise our work and for the insightful and helpful comments.

Reviewer #2 (Public review):

In this study, the authors aim to forecast the evolution of viral proteins by simulating sequence changes under a constraint of folding stability. The central idea is that proteins must retain a certain level of structural stability (quantified by folding free energy, ΔG) to remain functional, and that this constraint can shape and restrict the space of viable evolutionary trajectories. The authors integrate a birth-death population model with a structurally constrained substitution (SCS) model and apply this simulation framework to several viral proteins from HIV-1, SARS-CoV-2, and influenza.

The motivation to incorporate biophysical constraints into evolutionary models is scientifically sound, and the general approach aligns with a growing interest in bridging molecular evolution and structural biology. The authors focus on proteins where immune pressure is limited and stability is likely to be a dominant constraint, which is conceptually appropriate. The method generates sequence variants that preserve folding stability, suggesting that stability-based filtering may capture certain evolutionary patterns.

Correct. We thank the reviewer for the positive comments about our study.

However, the study does not substantiate its central claim of forecasting. The model does not predict future sequences with measurable accuracy, nor does it reproduce observed evolutionary paths. Validation is limited to endpoint comparisons in a few datasets. While KL divergence is used to compare amino acid distributions, this analysis is only applied to a single protein (HIV-1 MA), and there is no assessment of mutation-level predictive accuracy or quantification of how well simulated sequences recapitulate real evolutionary paths. No comparison is made to real intermediate variants available from extensive viral sequencing datasets which gather thousands of sequences with detailed collection date annotation (SARS-CoV-2, Influenza, RSV).

There are several points in this comment.

The presented method accurately predicts folding stability of forecasted variants, as shown through comparisons between real and predicted protein variants. However, as the reviewer correctly indicates, predicting the exact amino acid sequences remains challenging. This limitation is discussed in detail in the manuscript, where we also suggest that further improvements in substitution models of protein evolution are needed to better capture the evolutionary signatures of amino acid change at the sequence level, even between amino acids with similar physicochemical properties. Regarding the time points used for validation, the studied influenza NS1 dataset included two validation points. A key limitation in increasing the number of time points is the scarcity of datasets derived from monitoring protein evolution with sufficient molecular diversity between samples collected at consecutive time points (i.e., more than five polymorphic amino acid sites).

As described in the manuscript, calculating Kullback-Leibler (KL) divergence requires more than one sequence per studied time point. However, most datasets in the literature include only a single sequence per time point, typically a consensus sequence derived from bulk population sequencing. Generating multiple sequences per time point is experimentally more demanding, often requiring advanced methods such as single-virus sequencing or amplification of sublineages in viral subpopulations, as was done for the first dataset used in the study (Arenas, et al. 2016), which enabled the calculation of KL divergence. The extent to which the simulated sequences resemble real evolution is evaluated in the method validation. As noted, intermediate time point validation was performed using the influenza NS1 protein dataset. Although, as the reviewer indicates, thousands of viral sequences are available, these are usually consensus sequences from bulk sequencing. Indeed, many viral variants differ mainly through synonymous mutations, so the number of accumulated nonsynonymous mutations is small. For example, from the original Wuhan strain to the Omicron variant, the SARS-CoV-2 proteins Mpro and PLpro accumulated only 10 and 22 amino acid changes, respectively.
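Why the KL divergence needs several sequences per time point can be seen in a small sketch that compares per-site amino acid frequency distributions. It assumes aligned observed and simulated samples, ignores gaps, adds a small pseudocount so that every frequency is positive, and averages D_KL(observed || simulated) over sites; these choices and the function names are illustrative assumptions, not the study's exact procedure. With a single consensus sequence, each observed column collapses to one residue, so the observed distribution carries no information about within-population variation.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(column, pseudocount=1e-6):
    """Amino acid frequencies in one alignment column (gaps and unknown symbols ignored).

    The pseudocount keeps every frequency above zero so the log ratio is always defined.
    """
    residues = [c for c in column if c in AMINO_ACIDS]
    counts = Counter(residues)
    total = len(residues) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def kl_divergence(p, q):
    """D_KL(p || q) in nats between two distributions over AMINO_ACIDS."""
    return sum(p[aa] * math.log(p[aa] / q[aa]) for aa in AMINO_ACIDS)


def mean_site_kl(observed, simulated):
    """Average per-site KL divergence between observed and simulated alignments."""
    length = len(observed[0])
    if any(len(s) != length for s in list(observed) + list(simulated)):
        raise ValueError("all sequences must share one alignment length")
    total = 0.0
    for j in range(length):
        p = column_frequencies([s[j] for s in observed])
        q = column_frequencies([s[j] for s in simulated])
        total += kl_divergence(p, q)
    return total / length


if __name__ == "__main__":
    print(mean_site_kl(["MKTA", "MRTA", "MKTA"], ["MKTA", "MKSA", "MKTA"]))
```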

Analyzing intermediate variants of concern (e.g., Gamma or Delta) would reduce this number further, weakening the statistics. In addition, many available viral sequences are not consecutive in evolutionary terms (one dataset does not represent the direct origin of another dataset at a subsequent time point), which further limits their applicability in this study. There is little data from monitored protein evolution with consecutive samples. The most suitable studies usually involve in vitro virus evolution, but the data from these studies often show low genetic variability between samples collected at different time points. Finally, it is important to note that the presented method can only be applied to proteins with known 3D structures, as it relies on selection based on folding stability. Non-structural proteins cannot be analyzed using this approach. Future work could incorporate additional selection constraints, which may improve the accuracy of predictions. These considerations and limitations are indicated in the manuscript.

The selection of proteins is narrow and the rationale for including or excluding specific proteins is not clearly justified.

The viral proteins included in the study were selected based on two main criteria, general interest and data availability. In particular, we included proteins from viruses that affect humans and for which data from monitored protein evolution, with sufficient molecular diversity between consecutive time points, is available. These aspects are indicated in the manuscript.

The analyzed datasets are also under-characterized: we are not given insight into how variable the sequences are or how surprising the simulated sequences might be relative to natural diversity. Furthermore, the use of consensus sequences to represent timepoints is problematic, particularly in the context of viral evolution, where divergent subclades often coexist - a consensus sequence may not accurately reflect the underlying population structure.

The manuscript indicates the sequence identity among protein datasets of different time points, along with other technical details. Next, the evaluation based on comparisons between simulated and real sequences reflects how surprising the simulated sequences might be relative to natural diversity, considering that the real dataset is representative. We believe that the diverse real datasets studied are useful to evaluate the accuracy of the method in predicting different molecular patterns. Regarding the use of consensus sequences, we agree that they provide an approximation. However, as previously indicated, most of the available data from monitored protein evolution consist of consensus sequences obtained through bulk sequencing. Additionally, analyzing every individual viral sequence within a viral population, which is typically large, would be ideal but computationally intractable.

The fitness function used in the main simulations is based on absolute ΔG and rewards increased stability without testing whether real evolutionary trajectories tend to maintain, increase, or reduce folding stability over time for the particular systems (proteins) that are studied. While a variant of the model does attempt to center selection around empirical ΔG values, this more biologically plausible version is underutilized and not well validated.

The applied fitness function, based on absolute ΔG, is well established in the field (Sella and Hirsh 2005; Goldstein 2013). The present study independently predicts ΔG for the real and simulated protein variants at each sampling point. This ΔG prediction accounts not only for negative design, informed by empirical data, but also for positive design based on the study data (Arenas, et al. 2013; Minning, et al. 2013), thereby enabling the detection of variation in folding stability among protein variants. These aspects are indicated in the manuscript. Therefore, in our view, the study provides a proper comparison of real and predicted evolutionary trajectories in terms of folding stability.
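For readers unfamiliar with ΔG-based fitness, the following sketch shows one common way this line of work links folding stability to fitness and fixation: fitness as the two-state fraction of folded protein, and a Moran-process fixation probability of the kind used in Sella and Hirsh (2005)-style models. The value of kT, the sign convention (ΔG = G_folded - G_unfolded, so more negative means more stable) and the function names are assumptions for illustration; the SCS models used in the study estimate ΔG with positive- and negative-design terms that are not represented here.

```python
import math

K_B_T = 0.593  # assumed thermal energy in kcal/mol (about 298 K)


def fraction_folded(delta_g):
    """Two-state fraction of folded protein for a folding free energy delta_g (kcal/mol).

    More negative delta_g (more stable) gives a fraction closer to 1.
    """
    return 1.0 / (1.0 + math.exp(delta_g / K_B_T))


def fixation_probability(f_resident, f_mutant, n):
    """Moran-process fixation probability of one mutant in a haploid population of size n."""
    s = math.log(f_mutant / f_resident)  # selection coefficient on a log-fitness scale
    if abs(s) < 1e-12:
        return 1.0 / n  # neutral limit
    try:
        # Equivalent to (1 - f_res/f_mut) / (1 - (f_res/f_mut)**n), written with expm1.
        return math.expm1(-s) / math.expm1(-n * s)
    except OverflowError:
        return 0.0  # strongly deleterious mutant: probability is numerically zero


if __name__ == "__main__":
    f_old, f_new = fraction_folded(-2.0), fraction_folded(-3.0)
    print(f_old, f_new, fixation_probability(f_old, f_new, 1000))
```

Because fitness saturates as ΔG becomes more negative, this form rewards stability strongly only near the folding threshold, which is one reason the reviewer's question about whether real trajectories increase or merely maintain stability matters.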

Ultimately, the model constrains sequence evolution to stability-compatible trajectories but does not forecast which of these trajectories are likely to occur. It is better understood as a filter of biophysically plausible outcomes than as a predictive tool. The distinction between constraint-based plausibility and sequence-level forecasting should be made clearer. Despite these limitations, the work may be of interest to researchers developing simulation frameworks or exploring the role of protein stability in viral evolution, and it raises interesting questions about how biophysical constraints shape sequence space over time.

The presented method estimates the fitness of each protein variant, which can reflect the relative survival capacity of the variant. Therefore, despite the error due to evolutionary constraints not considered by the method, it indicates which variants are more likely to become fixed over time. In our view, the method does not merely filter plausible variants; rather, it generates predictions of variant survival through predicted fitness based on folding stability and simulations of protein evolution under structurally constrained substitution models integrated with birth-death population genetics approaches. The use of simulation-based approaches for prediction is well established in population genetics. For example, approaches such as approximate Bayesian computation (Beaumont, et al. 2002) rely on this strategy, and it has also been applied in other studies of forecasting evolution (e.g., Neher, et al. 2014). We believe that the distinction between forecasting folding stability and amino acid sequence is clearly shown in the manuscript, including the main text and the figures.

Reviewer #2 (Recommendations for the authors):

I thank the authors for addressing the question about template switching, their clarification was helpful. However, the core concerns I raised remain unresolved: the claim that the method is useful for forecasting is not substantiated. In order to support the paper's central claims or to prove its usefulness, several key improvements could be incorporated:

(1) Systematic analysis of more proteins:

The manuscript would be significantly strengthened by a systematic evaluation of model performance across a broader set of viral proteins, beyond the examples currently shown. Many human influenza and SARS-CoV-2 proteins have well-characterized structures or high-quality homology templates, making them suitable candidates. In the light of limited success of the method, presenting the model's behavior across a more comprehensive protein set, including those with varying structural constraints and immune pressures, would help assess generalizability and clarify the specific conditions under which the model is applicable.

Following a comment from the reviewer in a previous revision of the study, we included the analysis of an influenza NS1 protein dataset that contains two evaluation time points. To validate the prediction method, it is necessary to have monitored protein sequences collected at a minimum of two consecutive time points, with sufficient divergence between them to capture evolutionary signatures that allow for proper evaluation. Additionally, many datasets involve sequences that are not consecutive in evolutionary terms (one dataset is not a direct ancestor of another dataset sampled at a later time point), which precludes their use in this study. Few data from monitored protein evolution with reliable consecutive (ancestor-descendant) samples exist. The most suitable studies often involve in vitro virus evolution, but they usually show low genetic variability between samples collected at different time points. Although thousands of sequences are available for some viruses, they are usually consensus sequences from bulk sequencing and often show a low number of nonsynonymous mutations in the studied protein-coding gene between time points. For example, from the original Wuhan strain to the Omicron variant, the SARS-CoV-2 proteins Mpro and PLpro accumulated only 10 and 22 amino acid changes, respectively. Analyzing intermediate variants of concern (e.g., Gamma or Delta) would reduce this number further, weakening the statistics. Thus, in practice, we found a scarcity of data derived from monitoring protein evolution with reliable ancestor and corresponding descendant data at consecutive time points and with sufficient molecular diversity between them (i.e., more than five polymorphic amino acid sites).
Overall, we believe that the diverse viral protein datasets used in the present study, along with the multiple datasets analyzed from monitored HIV-1 populations in different patients, provide a representative application of the method, since similar patterns generally emerged from the analysis of the different datasets.
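The data-screening criterion mentioned above (more than five polymorphic amino acid sites between consecutive time points) can be checked with a few lines of code. This sketch assumes that the two samples are already aligned and defines a polymorphic site as any column showing more than one symbol when sequences from both time points are pooled; both the definition and the function names are illustrative assumptions rather than the study's exact procedure.

```python
def polymorphic_sites(sample_t0, sample_t1):
    """Indices of aligned columns with more than one symbol across two pooled samples."""
    pooled = list(sample_t0) + list(sample_t1)
    if not pooled:
        raise ValueError("no sequences supplied")
    length = len(pooled[0])
    if any(len(s) != length for s in pooled):
        raise ValueError("sequences must be aligned to the same length")
    return [j for j in range(length) if len({s[j] for s in pooled}) > 1]


def has_enough_diversity(sample_t0, sample_t1, min_sites=6):
    """True when there are more than five polymorphic sites (i.e. at least six by default)."""
    return len(polymorphic_sites(sample_t0, sample_t1)) >= min_sites


if __name__ == "__main__":
    print(polymorphic_sites(["MKTAYIAK"], ["MRTAYLAK"]))
```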

(2) Present clear data statistics: For each analyzed dataset, the authors should provide basic information about the number of unique sequences, levels of variability, and evolutionary divergence between start and end sequences. This would contextualize the forecasting task and clarify whether the simulations are non-trivial. In particular, it should be shown that the consensus sequence is indeed representative of the viral population at a given time point. In viral evolution we frequently observe co-circulation of subclades and the consensus sequence is then not representative.

For each dataset analyzed, the manuscript provides the sequence identity between samples at the study time points (which also informs about sequence variability), sample sizes, representative protein structure, and other technical details. The study assumes that consensus sequences, typically generated by bulk sequencing, are representative of the viral population. Next, samples at different time points should involve ancestor-descendant relationships, which is a requirement and one of the limitations to find appropriate data for this study, as noted in our previous response.

(3) Explore other metrics for population level sequence comparison:

In the light of possible existence of subclades, mentioned above, the currently used metrics for sequence comparison may underestimate performance of the simulations. It would be sufficient to see some overlap of simulated clades and the observed clades.

We found this to be a good idea. However, in practice, we believe that the criteria used to define subclades could introduce biases into the results. For some metrics, we evaluated the accuracy of the predictions through direct comparisons between all real and predicted protein variants, using percentages to facilitate interpretation. We believe that using subclades could potentially reduce the current prediction errors, but this would complicate the interpretation of the results, as they would be influenced by the subjective criteria used to define the subclades.

Currently, the manuscript presents a plausible filtering framework rather than a predictive model. Without these additional analyses, the main claims remain only partially supported.

Please see our reply to the comment of the reviewer just before the section titled “Recommendations for the authors”.

Response to some rebuttal statements:

(1) "Sequence comparisons based on the KL divergence require, at the studied time point, an observed distribution of amino acid frequencies among sites and an estimated distribution of amino acid frequencies among sites. In the study datasets, this is only the case for the HIV-1 MA dataset, which belongs to a previous study from one of us and collaborators where we obtained at least 20 independent sequences at each sampling point (Arenas, et al. 2016)"

The available Influenza and SARS-CoV-2 data gathers isolates annotated with exact collection dates, providing rich datasets for such analysis.

The available influenza and SARS-CoV-2 sequences are typically derived from bulk sequencing and, therefore, they are consensus sequences. As a result, they cannot be used to calculate KL divergence. Additionally, many of the indicated sequences from databases are not demonstrated to be consecutive in evolutionary terms (one dataset is not a direct ancestor of another dataset sampled at a later time point), which precludes their use in this study. The most suitable studies often involve in vitro virus evolution, but they usually show low genetic variability between samples collected at different time points.

(2) "Regarding extending the analysis to other time points (other variants of concern), we kindly disagree because Omicron is the variant of concern with the highest genetic distance to the Wuhan variant, and a high genetic distance is required to properly evaluate the prediction method."

There have been many more variants of concern subsequent to Omicron which circulated in 2021.

A key aspect is the accumulation of diversity in the study proteins across different time points. The SARS-CoV-2 proteins Mpro and PLpro accumulated only 10 and 22 amino acid changes from the original Wuhan variant to Omicron, respectively.

Analyzing intermediate variants of concern (e.g., Gamma or Delta) or those closely related to Omicron would reduce the number of accumulated mutations even further.

We want to thank the reviewer again for taking the time to revise our work and for the insightful and helpful comments.

The following is the authors’ response to the original reviews.

Reviewer #1 (Public review):

Summary:

Ferreiro et al. present a method to simulate protein sequence evolution under a birth-death model where sequence evolution is constrained by structural constraints on protein stability. The authors then use this model to explore the predictability of sequence evolution in several viral structural proteins. In principle, this work is of great interest to molecular evolution and phylodynamics, which have struggled to couple non-neutral models of sequence evolution to phylodynamic models like birth-death. Unfortunately, though, the model shows little improvement over neutral models in predicting protein evolution, and this ultimately appears to be due to fundamental conceptual problems with how fitness is modeled and linked to the phylodynamic birth-death model.

We thank the reviewer for the positive comments about our work.

Regarding predictive power, the study showed good accuracy in predicting the real folding stability of forecasted protein variants under a selection model, but not under a neutral model. Predicting the exact sequences was more challenging. In this revised version, where we added further real data, we found that the accuracy of this prediction can vary among proteins (e.g., the SCS model was more accurate than the neutral model in predicting sequences of the influenza NS1 protein at different time points). Still, we consider that efforts are required in the field of substitution models of molecular evolution. For example, substitutions between amino acids with similar physicochemical properties can result in predictions with appropriate folding stability but different specific sequences. The development of accurate substitution models of molecular evolution is an active area of research with ongoing progress, but further efforts are still needed. Moreover, forecasting the folding stability of future real proteins is fundamental to properly forecasting protein evolution, given the essential role of folding stability in protein function and its variety of applications. Regarding the conceptual concerns related to fitness modeling, we clarify them in detail in our responses to the specific comments below.

Major concerns:

(1) Fitness model: All lineages have the same growth rate r = b-d because the authors assume b+d=1. But under a birth-death model, the growth r is equivalent to fitness, so this is essentially assuming all lineages have the same absolute fitness since increases in reproductive fitness (b) will simply trade off with decreases in survival (d). Thus, even if the SCS model constrains sequence evolution, the birth-death model does not really allow for non-neutral evolution such that mutations can feed back and alter the structure of the phylogeny.

We thank the reviewer for this comment, which aims to improve the realism of our model. In the model presented (but see below another model, derived from the reviewer's proposal, that we have now implemented in the framework and applied to the study data), the fitness predicted for a protein variant is used to obtain the corresponding birth rate of that variant. In this way, protein variants with high fitness have high birth rates, leading overall to more birth events, while protein variants with low fitness have low birth rates, resulting overall in more extinction events, which has biological meaning for the study system. The statement “All lineages have the same growth rate r = b-d” is incorrect for our model because b and d can vary among lineages according to fitness. For example, one lineage might have b=0.9, d=0.1, r=0.8, while another could have b=0.6, d=0.4, r=0.2. Likewise, the statement “this is essentially assuming all lineages have the same absolute fitness” is incorrect. Assuming that all lineages have the same fitness would not make sense; in that situation the folding stability of the forecasted protein variants would be similar under any model, which is not the case, as shown in the results. In our model, fitness affects reproductive success: protein variants with high fitness have higher birth rates, leading to more birth events, while those with lower fitness have higher death rates, leading to more extinction events. This parameterization is meaningful for protein evolution because the fitness of a protein variant can affect its survival (birth or extinction) without necessarily affecting its rate of evolution. While a faster growth rate can sometimes be associated with higher fitness, a variant with high fitness does not necessarily accumulate substitutions at a faster rate.
Regarding the phylogenetic structure, the model presented considers variable birth and death events across lineages according to the fitness of the corresponding protein variants, and this affects the derived phylogeny (i.e., protein variants selected against can go extinct while others with high fitness can produce descendants). We are not sure about the meaning of the term “mutations can feed back” in the context of our system. Note that we use Markov models of evolution, which are well established in the field (despite their limitations), and substitutions are fixed mutations, which could still be reverted later if selected by the substitution model (Yang 2006). Altogether, we find that the presented birth-death model is technically correct and appropriate for modeling our biological system. Its integration with structurally constrained substitution (SCS) models of protein evolution as Markov models follows general approaches of molecular evolution in population genetics (Yang 2006; Carvajal-Rodriguez 2010; Arenas 2012; Hoban, et al. 2012). We have now provided a more detailed description of the models in the manuscript.
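The disagreement about growth rates in point (1) reduces to simple algebra. Under the constraint b + d = 1, the total event rate of a lineage i is fixed while its net growth rate still depends on its birth rate; the numerical examples are the ones given in the response above:

```latex
\begin{aligned}
b_i + d_i &= 1 \quad\Longrightarrow\quad d_i = 1 - b_i,\\
r_i &= b_i - d_i = 2b_i - 1,\\
b_i = 0.9 &:\quad r_i = 2(0.9) - 1 = 0.8,\\
b_i = 0.6 &:\quad r_i = 2(0.6) - 1 = 0.2.
\end{aligned}
```

Thus b_i + d_i is identical across lineages (a constant global birth-death rate), whereas r_i differs whenever birth rates, and hence fitness values, differ.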

Beyond these clarifications about the birth-death model used, we understand the reviewer's point and, following the suggestion, we have now incorporated an additional birth-death model that accounts for a variable global birth-death rate among lineages. Specifically, we followed the model proposed by Neher et al. (2014), where the death rate is set to 1 and the birth rate is modeled as 1 + fitness. In this model, the global birth-death rate can vary among lineages. We implemented this model in the computer framework and applied it to the data used to evaluate the models. The results indicated that, in general, this model yields predictive accuracy similar to that of the previous birth-death model. Thus, accounting for variability in the global birth-death rate does not appear to play a major role in the studied systems of protein evolution. We have now presented this additional birth-death model and its results in the manuscript.
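The two parameterizations can be compared side by side. The sketch assumes, for illustration, that in the original model fitness lies in [0, 1] and is used directly as the birth rate (with clipping), and that the Neher et al. (2014)-style variant uses d = 1 and b = 1 + fitness as stated above; the function names are hypothetical. Under the first, b + d is always 1 and r = b - d = 2b - 1; under the second, the total rate b + d = 2 + fitness varies among lineages and r equals the fitness.

```python
def rates_constant_turnover(fitness):
    """Assumed original model: birth rate b = fitness clipped to [0, 1], death rate d = 1 - b."""
    b = min(max(fitness, 0.0), 1.0)
    return b, 1.0 - b


def rates_neher(fitness):
    """Neher et al. (2014)-style model as described in the text: b = 1 + fitness, d = 1."""
    return 1.0 + fitness, 1.0


if __name__ == "__main__":
    for f in (0.1, 0.5, 0.9):
        for name, rates in (("b+d=1", rates_constant_turnover), ("Neher", rates_neher)):
            b, d = rates(f)
            # Net growth r = b - d and total event (turnover) rate b + d for each lineage.
            print(f"{name:6s} f={f:.1f}: b={b:.2f} d={d:.2f} r={b - d:+.2f} b+d={b + d:.2f}")
```

The printed table makes the reviewers' and authors' points compatible: both models let r vary with fitness, but only the second also lets the turnover rate b + d vary.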

(2) Predictive performance: Similar performance in predicting amino acid frequencies is observed under both the SCS model and the neutral model. I suspect that this rather disappointing result owes to the fact that the absolute fitness of different viral variants could not actually change during the simulations (see comment #1).

As indicated in our previous answer, our study shows a good accuracy in predicting the real folding stability of forecasted protein variants under a selection model, but not under a neutral model. Next, predicting the exact sequences was more challenging, which was not surprising considering previous studies. In particular, inferring specific sequences is considerably challenging even for ancestral molecular reconstruction (Arenas, et al. 2017; Arenas and Bastolla 2020). Indeed, observed sequence diversity is much greater than observed structural diversity (Illergard, et al. 2009; Pascual-Garcia, et al. 2010), and substitutions between amino acids with similar physicochemical properties can yield modeled protein variants with more accurate folding stability, even when the exact amino acid sequences differ. As indicated, further work is needed in the field of substitution models of molecular evolution. Next, in this revised version, where we included analyses of additional real datasets, we found that the accuracy of sequence prediction can vary among datasets. Notably, the analysis of an influenza NS1 protein dataset, with higher diversity than the other datasets studied, showed that the SCS model was more accurate than the neutral model in predicting sequences across different time points. Datasets with relatively high sequence diversity can contain more evolutionary information, which can improve prediction quality. In any case, as previously indicated, we believe that efforts are required in the field of substitution models of molecular evolution. Apart from that, forecasting the folding stability of future real proteins is an important advance in forecasting protein evolution, given the essential role of folding stability in protein function (Scheiblhofer, et al. 2017; Bloom and Neher 2023) and its variety of applications.

Next, also as indicated in our previous response, the birth-death model used in this study accounts for variation in fitness among lineages, producing variable reproductive success. The additional birth-death model that we have now incorporated, which allows the global birth-death rate to vary among lineages, produced similar prediction accuracy, suggesting that such variation plays a limited role in modeling protein evolution. Molecular evolution parameters, particularly the substitution model, appear to be more critical in this regard. We have now included these aspects in the manuscript.

(3) Model assessment: It would be interesting to know how much the predictions were informed by the structurally constrained sequence evolution model versus the birth-death model. To explore this, the authors could consider three different models: (1) neutral, (2) SCS, and (3) SCS + BD. Simulations under the SCS model could be performed by simulating molecular evolution along just one hypothetical lineage. Seeing if the SCS + BD model improves over the SCS model alone would be another way of testing whether mutations could actually impact the evolutionary dynamics of lineages in the phylogeny.

In the present study, we compared the neutral model + birth-death (BD) with the SCS model + BD. Markov substitution models, with rate matrix Q, are applied over an evolutionary time (i.e., the branch length, t), which determines the probability of substitution events during that period [P(t) = exp(Qt)]. This approach is traditionally used in phylogenetics to model the accumulation of substitution events over time. Therefore, comparing the neutral and SCS models in terms of evolutionary inference requires an evolutionary time, which here is provided by the birth-death process. Thus, cases (1) and (2) cannot be compared without an underlying evolutionary history. In addition, comparisons in terms of likelihood, and other aspects, between models that ignore the protein structure and the implemented SCS models are already available in previous studies based on coalescent simulations or given phylogenetic trees (Arenas, et al. 2013; Arenas, et al. 2015). There, SCS models outperformed models that ignore evolutionary constraints from the protein structure, and those findings are consistent with the results of the present study, where we explored the application of these models to forecasting protein evolution. We would like to emphasize that forecasting the folding stability of future real proteins is a significant finding, because folding stability is fundamental to protein function and has a variety of applications. We have now indicated these aspects in the manuscript.
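To illustrate why a branch length is needed before two substitution models can be compared, the following minimal Python sketch computes P(t) = exp(Qt) for a given rate matrix Q. As an assumption made only for illustration, Q is the simple equal-rates (Poisson) amino acid model normalized to one expected substitution per unit time; it is not one of the SCS models used in the study, whose rates depend on folding stability. All function names are hypothetical. At t = 0 no substitution is possible and P is the identity matrix, so the substitution probabilities only become informative once t is supplied, here by the branches of the birth-death tree.

```python
import math


def matmul(a, b):
    """Multiply two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(m, squarings=10, terms=20):
    """Approximate exp(m) by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    scale = 2.0 ** squarings
    a = [[x / scale for x in row] for row in m]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # After this update, term holds a^k / k!
        term = [[x / k for x in row] for row in matmul(term, a)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        # exp(a) raised to 2^squarings equals exp(m)
        result = matmul(result, result)
    return result


def poisson_rate_matrix(n_states=20):
    """Hypothetical equal-rates model with one expected substitution per unit time."""
    off = 1.0 / (n_states - 1)
    return [[-1.0 if i == j else off for j in range(n_states)]
            for i in range(n_states)]


def transition_probabilities(q, t):
    """P(t) = exp(Qt): probability of state j at the end of a branch of length t,
    given state i at its start."""
    return expm([[x * t for x in row] for row in q])


q = poisson_rate_matrix()
p = transition_probabilities(q, 0.5)
# For this toy model the exact value is P_ii(t) = 1/20 + (19/20) * exp(-20 t / 19)
print(p[0][0], 1 / 20 + (19 / 20) * math.exp(-20 * 0.5 / 19))
```

The equal-rates model was chosen because its closed-form solution makes the numerical result easy to check; an SCS-style matrix would be passed to the same function unchanged.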

(4) Background fitness effects: The model ignores background genetic variation in fitness. I think this is particularly important as the fitness effects of mutations in any one protein may be overshadowed by the fitness effects of mutations elsewhere in the genome. The model also ignores background changes in fitness due to the environment, but I acknowledge that might be beyond the scope of the current work.

This comment made us realize that more information about the features of the implemented SCS models should be included in the manuscript. In particular, the implemented SCS models consider a negative design based on the residue contacts observed in nearly all proteins available in the Protein Data Bank (Arenas, et al. 2013; Arenas, et al. 2015). These data are distributed with the framework and can be updated to incorporate new structures (further details are provided in the framework documentation and practical examples). Therefore, the prediction of folding stability combines positive design (direct analysis of the target protein) and negative design (consideration of background proteins from a database to improve the predictions), thus incorporating background molecular diversity. We have now indicated this important aspect in the manuscript. Regarding fitness effects caused by the environment, we agree with the reviewer. This is a challenge for any method aiming to forecast evolution, as future environmental shifts are inherently unpredictable and may affect the accuracy of the predictions. Although one might attempt to incorporate such effects into the model, doing so risks overparameterization, especially when the additional factors are uncertain or speculative. We have now mentioned this aspect in the manuscript.

(5) In contrast to the model explored here, recent work on multi-type birth-death processes has considered models where lineages have type-specific birth and/or death rates and therefore also type-specific growth rates and fitness (Stadler and Bonhoeffer, 2013; Kunhert et al., 2017; Barido-Sottani, 2023). Rasmussen & Stadler (eLife, 2019) even consider a multi-type birth-death model where the fitness effects of multiple mutations in a protein or viral genome collectively determine the overall fitness of a lineage. The key difference with this work presented here is that these models allow lineages to have different growth rates and fitness, so these models truly allow for non-neutral evolutionary dynamics. It would appear the authors might need to adopt a similar approach to successfully predict protein evolution.

We agree with the reviewer that robust birth-death models have been developed using statistical approaches and, in many cases, the primary aim of those studies is the development and refinement of the model itself. Regarding the study by Rasmussen and Stadler 2019, it incorporates an external evaluation of mutation events where the fitness used is specific to the proteins investigated in that study, which may pose challenges for users interested in analyzing other proteins. In contrast, our study takes a different approach. We implement a fitness function that can be predicted and evaluated for any protein with a known or modeled structure (Goldstein 2013), making it broadly applicable. Indeed, in this revised version we added the analysis of data from another protein (influenza NS1 protein), with predictions at different time points. In addition, we provide a freely available and well-documented computational framework to facilitate its use. The primary aim of our study is not the development of novel or complex birth-death models. Rather, we aim to explore the integration of a standard birth-death model with SCS models for the purpose of predicting protein evolution. In the context of protein evolution, substitution models are a critical factor (Liberles, et al. 2012; Wilke 2012; Bordner and Mittelmann 2013; Echave, et al. 2016; Arenas, et al. 2017; Echave and Wilke 2017), and the presented combination with a birth-death model constitutes a first approximation on which future studies can build to better understand this evolutionary system. We have now indicated these considerations in the manuscript.

Reviewer #2 (Public review):

Summary:

In this study, "Forecasting protein evolution by integrating birth-death population models with structurally constrained substitution models", David Ferreiro and coauthors present a forward-in-time evolutionary simulation framework that integrates a birth-death population model with a fitness function based on protein folding stability. By incorporating structurally constrained substitution models and estimating fitness from ΔG values using homology-modeled structures, the authors aim to capture biophysically realistic evolutionary dynamics. The approach is implemented in a new version of their open-source software, ProteinEvolver2, and is applied to four viral proteins from HIV-1 and SARS-CoV-2.

Overall, the study presents a compelling rationale for using folding stability as a constraint in evolutionary simulations and offers a novel framework and software to explore such dynamics. While the results are promising, particularly for predicting biophysical properties, the current analysis provides only partial evidence for true evolutionary forecasting, especially at the sequence level. The work offers a meaningful conceptual advance and a useful simulation tool, and sets the stage for more extensive validation in future studies.

We thank the reviewer for the positive comments on our study. Regarding predictive power, the results showed good accuracy in predicting the folding stability of the forecasted protein variants. In this revised version, where we included analyses of additional real datasets, we found that the accuracy of sequence prediction can vary among datasets. Notably, the analysis of an influenza NS1 protein dataset, which has higher diversity than the other datasets studied, showed that the SCS model was more accurate than the neutral model in predicting sequences across different time points. Datasets with relatively high sequence diversity can contain more evolutionary information, which can improve prediction quality. Still, we believe that further efforts are required to improve the accuracy of substitution models of molecular evolution. Altogether, accurately forecasting the folding stability of future real proteins is fundamental for predicting their function and enables a variety of applications. Also, we implemented the models in a freely available computer framework, with detailed documentation and a variety of practical examples.

Strengths:

The results demonstrate that fitness constraints based on protein stability can prevent the emergence of unrealistic, destabilized variants - a limitation of traditional, neutral substitution models. In particular, the predicted folding stabilities of simulated protein variants closely match those observed in real variants, suggesting that the model captures relevant biophysical constraints.

We agree with the reviewer and appreciate the consideration that forecasting the folding stability of future real proteins is a relevant finding. For instance, folding stability is fundamental for protein function and affects several other molecular properties.

Weaknesses:

The predictive scope of the method remains limited. While the model effectively preserves folding stability, its ability to forecast specific sequence content is not well supported.

Our study showed good accuracy in predicting the real folding stability of forecasted protein variants under a selection model, but not under a neutral model. Predicting the exact sequences was more challenging, which was not surprising given previous studies. In particular, inferring specific sequences is considerably challenging even for ancestral sequence reconstruction (Arenas, et al. 2017; Arenas and Bastolla 2020). Indeed, observed sequence diversity is much greater than observed structural diversity (Illergard, et al. 2009; Pascual-Garcia, et al. 2010), and substitutions between amino acids with similar physicochemical properties can yield modeled protein variants with accurate folding stability even when the exact amino acid sequences differ. Next, in this revised version, where we included analyses of additional real datasets, we found that the accuracy of sequence prediction can vary among datasets. Notably, the analysis of an influenza NS1 protein dataset, which has higher diversity than the other datasets studied, showed that the SCS model was more accurate than the neutral model in predicting sequences across different time points. Datasets with relatively high sequence diversity can contain more evolutionary information, which can improve prediction quality. In any case, we believe that further work is needed on substitution models of molecular evolution. Apart from that, forecasting the folding stability of future real proteins is an important advance in forecasting protein evolution, given the essential role of folding stability in protein function (Scheiblhofer, et al. 2017; Bloom and Neher 2023) and its variety of applications. We have now expanded these aspects in the manuscript.

Only one dataset (HIV-1 MA) is evaluated for sequence-level divergence using KL divergence; this analysis is absent for the other proteins. The authors use a consensus Omicron sequence as a representative endpoint for SARS-CoV-2, which overlooks the rich longitudinal sequence data available from GISAID. The use of just one consensus from a single time point is not fully justified, given the extensive temporal and geographical sampling available. Extending the analysis to include multiple timepoints, particularly for SARS-CoV-2, would strengthen the predictive claims. Similarly, applying the model to other well-sampled viral proteins, such as those from influenza or RSV, would broaden its relevance and test its generalizability.

The evaluation of forecasting evolution using real datasets is complex due to several conceptual and practical aspects. In contrast to traditional phylogenetic reconstruction of past evolutionary events and ancestral sequences, forecasting evolution often begins with a variant that is evolved forward in time and requires a rough fitness landscape to select among possible future variants (Lässig, et al. 2017). Another concern for validating the method is the need to know the initial variant that gives rise to the corresponding future (forecasted) variants, which is not always available. Thus, we investigated systems where the initial variant, or a close approximation, is known, such as scenarios of monitored in vitro evolution. In the case of SARS-CoV-2, the Wuhan variant is commonly used as the starting variant of the pandemic. Next, since forecasting evolution is highly dependent on the model of evolution used, unexpected external factors can severely affect the predictions. For this reason, systems with minimal external influences provide a more controlled context for evaluating forecasting evolution. For instance, scenarios of monitored in vitro virus evolution avoid some external factors such as host immune responses. Another important aspect is the availability of data at two (i.e., present and future) or more time points along the evolutionary trajectory, with sufficient genetic diversity between them to identify clear evolutionary signatures. Additionally, using consensus sequences can help mitigate effects from unfixed mutations, which should not be modeled by a substitution model of evolution. Altogether, not all datasets are appropriate for properly evaluating or applying forecasting evolution. These aspects are indicated in the manuscript. Sequence comparisons based on the KL divergence require, at the studied time point, both an observed and an estimated distribution of amino acid frequencies at each site.
In the study datasets, this is only the case for the HIV-1 MA dataset, which comes from a previous study by one of us and collaborators in which we obtained at least 20 independent sequences at each sampling point (Arenas, et al. 2016). This aspect is now more clearly indicated in the manuscript. Regarding the Omicron datasets, we used 384 curated sequences of the Omicron variant of concern to construct the study data, and we believe that this is a representative sample. The sequence used for the initial time point was the Wuhan variant (Wu, et al. 2020), which is commonly assumed to be the origin of the pandemic in SARS-CoV-2 studies. As previously indicated, the use of consensus sequences is convenient to avoid variants with unfixed mutations. Regarding extending the analysis to other time points (other variants of concern), we respectfully disagree, because Omicron is the variant of concern with the highest genetic distance from the Wuhan variant, and a high genetic distance is required to properly evaluate the prediction method. Indeed, we noted that earlier variants of concern show a small number of fixed mutations in the study proteins, despite the availability of large numbers of sequences in databases such as GISAID. Additionally, we investigated the evolutionary trajectories of HIV-1 protease (PR) in 12 intra-host viral populations, with predictions for up to four different time points. Apart from those aspects, following the proposal of the reviewer, we have now incorporated the analysis of an additional dataset of influenza NS1 protein (Bao, et al. 2008), with predictions for two different time points, to further assess the generalizability of the method. We have now included details of this influenza NS1 protein dataset and the predictions derived from it in the manuscript.
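As a rough illustration of the per-site comparison described above, the Python sketch below computes, for each alignment column, the KL divergence between the observed amino acid frequencies and those of the forecasted sequences, and then averages over sites. The direction KL(observed || predicted), the small pseudocount that prevents division by zero, the exclusion of gap symbols, and the averaging across sites are all assumptions of this sketch rather than a description of the exact procedure used in the study; the function names and toy sequences are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_frequencies(sequences, site, pseudocount=1e-3):
    """Amino acid frequencies at one alignment column; gaps and other symbols are ignored."""
    counts = Counter(seq[site] for seq in sequences)
    total = sum(counts[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return [(counts[aa] + pseudocount) / total for aa in AMINO_ACIDS]


def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions over the same states."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_site_kl(observed, predicted, pseudocount=1e-3):
    """Average per-site KL divergence between observed and forecasted alignments."""
    length = len(observed[0])
    if any(len(s) != length for s in observed + predicted):
        raise ValueError("all sequences must be aligned to the same length")
    values = [kl_divergence(site_frequencies(observed, s, pseudocount),
                            site_frequencies(predicted, s, pseudocount))
              for s in range(length)]
    return sum(values) / length


# Toy aligned sequences, for illustration only
observed = ["MGARASVLSG", "MGARASILSG", "MGARASVLSG"]
forecast = ["MGARASVLSG", "MGARASVLSA", "MGARGSVLSG"]
print(mean_site_kl(observed, forecast))
```

This kind of comparison needs several sequences per time point on both sides, which is why a single consensus sequence per time point cannot supply the observed distribution.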

It would also be informative to include a retrospective analysis of the evolution of protein stability along known historical trajectories. This would allow the authors to assess whether folding stability is indeed preserved in real-world evolution, as assumed in their model.

Our present study does not aim to investigate the evolution of folding stability over time, although it provides this information indirectly at the studied time points. Instead, the present study shows that the folding stability of the forecasted protein variants is similar to that of the corresponding real protein variants for diverse viral proteins, which provides an important evaluation of the prediction method. Folding stability can indeed vary over time in both real and modeled evolutionary scenarios, and our present study is not in conflict with this. Although this question lies outside the aim of the present study, some previous phylogenetic-based studies have reported temporal fluctuations in folding stability for diverse protein data (Arenas, et al. 2017; Olabode, et al. 2017; Arenas and Bastolla 2020; Ferreiro, et al. 2022).

Finally, a discussion on the impact of structural templates - and whether the fixed template remains valid across divergent sequences - would be valuable. Addressing the possibility of structural remodeling or template switching during evolution would improve confidence in the model's applicability to more divergent evolutionary scenarios.

This is an important point. For the datasets that required homology modeling (in several cases this was not necessary because the sequence was already present in a PDB structure), the structural templates were selected using SWISS-MODEL, and we applied the best-fitting template. We have now included details about the fit of the structural templates in a supplementary table. Indeed, our proposal assumes that the protein structure is maintained over the studied evolutionary time, which is generally reasonable for short timescales where the structure is conserved (Illergard, et al. 2009; Pascual-Garcia, et al. 2010). Over longer evolutionary timescales, structural changes may occur and, in such cases, modeling the evolution of the protein structure would be necessary. To our knowledge, modeling the evolution of the protein structure remains a challenging task that requires substantial methodological developments. Recent advances in artificial intelligence, particularly in protein structure prediction from sequence, may offer promising tools for addressing this challenge. However, we believe that evaluating such approaches in the context of structural evolution would be difficult, especially given the limited availability of real data with known evolutionary trajectories involving structural change. In any case, this is probably an important direction for future research. We have now included this discussion in the manuscript.

Reviewer #1 (Recommendations for the authors):

(1) Abstract: "expectedly, the errors grew up in the prediction of the corresponding sequences" <- Not entirely clear what is meant by "errors grew up" or what the errors grew with.

This sentence refers to the accuracy of sequence prediction in comparison to that of folding stability prediction. We have now clarified this aspect in the manuscript.

(2) Lines 162-165: "Alternatively, if the fitness is determined based on the similarity in folding stability between the modeled variant and a real variant, the birth rate is assumed to be 1 minus the root mean square deviation (RMSD) in folding stability." <- What is the biological motivation for using the RMSD? It seems like a more stable variant would always have higher fitness, at least according to Equation 1.

RMSD is commonly used in molecular biology to compare proteins in terms of structural distance, folding stability, kinetics, and other properties. Because deviations are squared before averaging, it reduces the influence of small deviations while amplifying larger differences, thereby enhancing the detection of substantial molecular changes. Additionally, RMSD would facilitate the incorporation of other biophysical parameters, such as structural divergence from a wild-type variant or entropy, which could be informative for fitness in future versions of the method. We have now included this consideration in the manuscript.
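The property that RMSD emphasizes large differences follows from squaring the deviations before averaging. The short Python sketch below, using made-up deviation values, compares RMSD with the mean absolute deviation (MAD): two sets of deviations with the same MAD receive different RMSDs when the error is concentrated in a single large deviation.

```python
import math


def rmsd(a, b):
    """Root mean square deviation between paired values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def mad(a, b):
    """Mean absolute deviation between paired values."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


reference = [0.0, 0.0, 0.0, 0.0]
spread = [0.1, 0.1, 0.1, 0.1]        # four small deviations
concentrated = [0.0, 0.0, 0.0, 0.4]  # one large deviation

print(mad(reference, spread), mad(reference, concentrated))    # both about 0.1
print(rmsd(reference, spread), rmsd(reference, concentrated))  # about 0.1 and 0.2
```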

(3) Lines 165-166: "In both cases, the death rate (d) is considered as 1-b to allow a constant global (birth-death) rate" <- This would give a constant R = b / (1-b) over the entire phylogenetic tree. For applications to pathogens like viruses with epidemic dynamics, this is extremely implausible. Is there any need to make such a restrictive assumption?

Regarding technical considerations of the model, we refer to our answer to the first public review comment. In addition, a constant global rate of evolution has been observed in numerous genes and proteins of diverse organisms, including viruses (Gojobori, et al. 1990; Leitner and Albert 1999; Shankarappa, et al. 1999; Liu, et al. 2004; Lu, et al. 2018; Zhou, et al. 2019). However, following the comment of the reviewer, and as indicated in our answer to the first public review comment, we have now implemented and evaluated an additional birth-death model that allows the global birth-death rate to vary among lineages. We have implemented this additional model in the framework and described it, along with its results, in the manuscript.

(4) Lines 187-188: "As a consequence, since b+d=1 at each node, tn is consistent across all nodes, according to (Harmon, 2019)." <- This would also imply that all lineages have a growth rate r = b - d, which under a birth-death model is equivalent to saying all lineages have the same fitness!

We clarified this aspect in our answer to the first public review comment. In particular, in the presented model, protein variants with higher fitness have higher birth rates, leading to more birth events, while protein variants with lower fitness have lower birth rates, leading to more extinction events, which is biologically meaningful for the study system. In our model, b and d can vary among lineages according to the corresponding fitness (e.g., one lineage may have b = 0.9, d = 0.1, r = 0.8, while another may have b = 0.6, d = 0.4, r = 0.2). Since reproductive success varies among lineages in our model, the statement “this is essentially assuming all lineages have the same absolute fitness” is incorrect, although it could apply to certain other models. Fitness affects reproductive success, but fitness and the evolutionary growth rate are different biological processes (although a faster growth rate can sometimes be associated with higher fitness, a variant with high fitness does not necessarily accumulate substitutions at a higher rate). An example from molecular adaptation studies is the traditional ratio of nonsynonymous to synonymous substitution rates (dN/dS), which informs about selection derived from fitness and can remain constant at different rates of evolution (dN and dS). In any case, we thank the reviewer for raising this point, which led us to incorporate an additional birth-death model and inspired some ideas. Thus, following the comment of the reviewer and as indicated in the answer to the first public review comment, we have now implemented and evaluated an additional birth-death model that allows for variation in the global birth-death rate among lineages. The results indicated that this model yields predictive accuracy similar to that of the previous birth-death model. We have now included these aspects, along with the results from the additional model, in the manuscript.
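The distinction drawn above, where b and d vary among lineages while b + d = 1 for each lineage, can be illustrated with a toy discrete-event simulation in Python. This sketch is not the ProteinEvolver2 implementation. It assumes that each founding lineage has a fixed birth probability b (and death probability d = 1 - b) inherited by its descendants, and that at each event one living lineage is chosen uniformly at random and either splits or goes extinct. Substitutions along branches and the resulting recalculation of fitness, which the study's model includes, are omitted here; all names are hypothetical.

```python
import random


def growth_rate(b):
    """Per-event net growth r = b - d, with d = 1 - b."""
    return b - (1.0 - b)


def simulate(birth_probs, n_events, seed=1):
    """Toy birth-death process in which each founder's clade inherits its b.

    Returns the number of living descendants of each founder."""
    rng = random.Random(seed)
    living = list(range(len(birth_probs)))  # founder index of each living lineage
    for _ in range(n_events):
        if not living:
            break  # all lineages are extinct
        i = rng.randrange(len(living))
        founder = living[i]
        if rng.random() < birth_probs[founder]:
            living.append(founder)  # birth: the lineage splits into two
        else:
            living[i] = living[-1]  # death: remove lineage i in constant time
            living.pop()
    counts = [0] * len(birth_probs)
    for founder in living:
        counts[founder] += 1
    return counts


print(growth_rate(0.9), growth_rate(0.6))  # about 0.8 and 0.2
print(simulate([0.9, 0.6], n_events=200))
```

Because every lineage satisfies b + d = 1, the total event rate is the same everywhere, yet the clade with b = 0.9 tends to leave more descendants than the one with b = 0.6. Setting b = 1 for every lineage gives a pure-birth process.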

(5) Line 321-322: "For the case of neutral evolution, all protein variants equally fit and are allowed, leading to only birth events," <- Why would there only be birth events? Lineages can die regardless of their fitness.

In the neutral evolution model, all protein variants have the same fitness, resulting in a flat fitness landscape. Since variants are observed, we allowed birth events. However, the model assumes the absence of death events because no information independent of fitness is available to support their inclusion and quantification; this avoids imposing death events based on an arbitrary death rate. We have now provided a justification of this assumption in the manuscript.

Reviewer #2 (Recommendations for the authors):

(1) Clarify the purpose of the alternative fitness mode ("ΔG similarity to a target variant"):

The manuscript briefly introduces an alternative fitness function based on the similarity of a simulated protein's folding stability to that of a real protein variant, but does not provide a clear motivation, usage scenario, or results derived from it.

The presented model provides two approaches for deriving fitness from predicted folding stability. The simpler approach assumes that a more stable protein variant has higher fitness than a less stable one. The alternative approach assigns high fitness to protein variants whose stability closely matches observed stability, acknowledging that the real observed stability is derived from the real selection process, and this approach considers negative design by contrasting the prediction with real information. For the analyses of real data in this study, we used the second approach, guided by these considerations. We have now clarified this aspect in the manuscript.
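The two fitness modes can be contrasted in a short Python sketch. For the direct mode, we assume purely for illustration the two-state fraction of folded proteins, 1 / (1 + exp(ΔG / kT)), a widely used stability-based fitness in the SCS literature; the study's Equation 1 may use a different form, and the value of kT (about 0.6 kcal/mol) is an assumption. For the similarity mode, following the manuscript's description, the birth rate is 1 minus the RMSD between modeled and observed folding stabilities; here the stabilities are assumed to be on a rescaled scale so that b stays within [0, 1], and values are clipped otherwise. In both modes d = 1 - b. Function names and all numbers are hypothetical.

```python
import math


def folded_fraction(delta_g, kt=0.6):
    """Two-state fraction folded; more negative delta_g (more stable) gives values closer to 1."""
    return 1.0 / (1.0 + math.exp(delta_g / kt))


def rmsd(model, observed):
    """RMSD between paired folding-stability values."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, observed)) / len(model))


def rates_direct(delta_g, kt=0.6):
    """Direct mode: higher stability gives a higher birth rate; d = 1 - b."""
    b = folded_fraction(delta_g, kt)
    return b, 1.0 - b


def rates_similarity(model, observed):
    """Similarity mode: b = 1 - RMSD, clipped to [0, 1]; d = 1 - b."""
    b = min(1.0, max(0.0, 1.0 - rmsd(model, observed)))
    return b, 1.0 - b


# Direct mode: the more stable variant (more negative delta G) always gets the higher b
print(rates_direct(-6.0), rates_direct(-2.0))
# Similarity mode: the variant closest to the observed (rescaled) stability gets the higher b
print(rates_similarity([0.42, 0.44], [0.40, 0.45]),
      rates_similarity([0.90, 0.95], [0.40, 0.45]))
```

Under the direct mode, fitness increases monotonically with stability. Under the similarity mode, a variant that is much more stable than the observed proteins is penalized, which reflects the idea that observed stability already results from real selection.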

(2) Report structural template quality and modeling confidence:

Since folding stability (ΔG) estimates rely on structural models derived from homology templates, the accuracy of these predictions will be sensitive to the choice and quality of the template structure. I recommend that the authors report, for each protein modeled, the template's sequence identity, coverage, and modeling quality scores. This will help readers assess the confidence in the ΔG estimates and interpret how template quality might impact simulation outcomes.

We agree with the reviewer and we have now included additional information in a supplementary table regarding sequence identity, modeling quality and coverage of the structural templates for the proteins that required homology modeling. The selection of templates was performed using the well-established framework SWISS-MODEL and the best-fitting template was chosen. Next, a large number of protein structures are available in the PDB for the study proteins, which favors the accuracy of the homology modeling. For some datasets, homology modeling was not required, as the modeled sequence was already present in an available protein structure. We have now included this information in the manuscript and in a supplementary table.

(3) Clarify whether structural remodeling occurs during simulation:

It appears that folding stability (ΔG) for all simulated protein variants is computed by mapping them onto a single initial homology model, without remodeling the structure as sequences evolve. If correct, this should be clearly stated, as it assumes that the structural fold remains valid across all simulated variants. A discussion on the potential impact of structural drift would be welcome.

We agree with the reviewer. As indicated in our answer to a previous comment, our method assumes that the protein structure is maintained over the studied evolutionary time, which is generally acceptable for short timescales where the structure is conserved (Illergard, et al. 2009; Pascual-Garcia, et al. 2010). At longer timescales the protein structure could change, requiring the modeling of structural evolution over the evolutionary time. To our knowledge, modeling the evolution of the protein structure remains a challenging task that requires substantial methodological developments. Recent advances in artificial intelligence, particularly in protein structure prediction from sequence, can be promising tools for addressing this challenge. However, we believe that evaluating such approaches in the context of structural evolution would be difficult, especially given the limited availability of real datasets with known evolutionary trajectories involving structural change. In any case, this is probably an important direction for future research. We have now included this discussion in the manuscript.

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Data Citations

    1. Ferreiro D, González-Vázquez LD, Prado-Comesaña A, Arenas M. 2025. Forecasting protein evolution by combining birth-death population models with structurally constrained substitution models. Zenodo.
    2. GISAID 2025. SARS-CoV-2 protein sequences. EpiCoV.

    Supplementary Materials

    Supplementary file 1. Supplementary tables A–C and references cited in the supplementary file.
    elife-106365-supp1.pdf (244.9KB, pdf)
    MDAR checklist

    Data Availability Statement

    The computer framework ProteinEvolver2 is freely available from https://github.com/MiguelArenas/proteinevolver (Arenas, 2025). The SARS-CoV-2 data are available from the GISAID database at https://doi.org/10.55876/gis8.250206gt. The real and predicted protein variants are available from the Zenodo repository at https://doi.org/10.5281/zenodo.15548146.

    The following dataset was generated:

    Ferreiro D, González-Vázquez LD, Prado-Comesaña A, Arenas M. 2025. Forecasting protein evolution by combining birth-death population models with structurally constrained substitution models. Zenodo.

    The following previously published dataset was used:

    GISAID 2025. SARS-CoV-2 protein sequences. EpiCoV.


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
