Author manuscript; available in PMC: 2025 Aug 8.
Published in final edited form as: Science. 2025 Jun 12;388(6752):eadu1058. doi: 10.1126/science.adu1058

Evolutionary-scale enzymology enables exploration of a rugged catalytic landscape

Duncan F Muir 1,2, Garrison P R Asper 1, Pascal Notin 3,4, Jacob A Posner 1,5, Debora S Marks 3,6, Michael J Keiser 7,8,9, Margaux M Pinney 1,10,*
PMCID: PMC12333904  NIHMSID: NIHMS2097277  PMID: 40504906

Abstract

Quantitatively mapping enzyme sequence-catalysis landscapes remains a critical challenge in understanding enzyme function, evolution, and design. Here, we leverage emerging microfluidic technology to measure catalytic constants—kcat and KM—for hundreds of diverse orthologs and mutants of Adenylate Kinase (ADK). We dissect this sequence-catalysis landscape’s topology, navigability, and mechanistic underpinnings, revealing catalytically heterogeneous neighborhoods organized by domain architecture. These results challenge long-standing hypotheses in enzyme adaptation, demonstrating that thermophilic enzymes are not universally slower than their mesophilic counterparts. Semi-supervised models that combine our data with the rich sequence representations from large protein language models predict orthologous ADK-sequence catalytic parameters better than existing approaches. Our work demonstrates a promising strategy for dissecting sequence-catalysis landscapes across enzymatic evolution, opening new avenues for enzyme engineering and functional prediction.


Natural selection has shaped the catalytic parameters of enzymatic reactions, determining the rates and specificities that govern nearly all biological processes. Enzyme sequence-function relationships are often conceptualized as a landscape traversed by mutational “walks” (Fig. 1A). Understanding the topologies and underlying mechanisms of these sequence-catalysis landscapes can shed light on how enzymes have evolved, and aid in developing predictive models to guide the optimization of enzyme function (1-5).

Figure 1. Mapping the sequence-catalysis of Adenylate Kinase across evolution.

(A) A schematic of a sequence-catalysis landscape is shown as a surface plot with local and global optima. The X and Y axes represent a continuous protein “sequence space” with the Z-axis representing a biochemical property, namely kcat. Green and blue lines show possible paths explorable by evolution that reach unique optima. (B) The ADK sequence library characterized herein (dots) spans the bacterial tree of life (see Methods, Generation of Taxonomic Trees)(90), with median optimal growth temperature encoded on a color scale. Growth temperatures range from the coldest to the hottest environments on Earth (45). (C) Structures of the primary ADK lid-types are displayed (from left to right): Geobacillus stearothermophilus (Zn2+ Lid, PDB: 1ZIP)(91), Escherichia coli (H-Bond Lid, PDB: 1AKE)(92), and Bartonella henselae (Lidless, AlphaFold3)(93). Sequence entropy from an MSA is encoded at each residue position by increasing color value.

The inherent complexity of protein sequence space has often restricted sequence-function landscapes to simplified illustrations that conceptualize, but do not quantify, evolutionary processes (e.g., Fig. 1A) (5). Current experimental explorations of enzyme sequence-catalysis landscapes are limited to narrow regions of the underlying protein sequence space, such as combinatorial mutations at a restricted number of sites (typically <5) or all possible single mutations in deep mutational scans (DMS), and are liable to become stuck in local optima. In contrast, genomic sequencing databases reveal extensive natural variation in enzymes, potentially yielding substantial differences in catalytic properties from adaptation to diverse environments (e.g., temperature). Nevertheless, catalytic activities are typically measured for only a few (<10) orthologs under consistent conditions. Thus, the topology of sequence-catalysis landscapes at the scale of naturally occurring sequences remains biochemically underexplored and underdetermined.

Existing approaches to model sequence-catalysis relationships are limited by a dearth of catalytic parameters for a given enzyme and narrow explorations of sequence space. Without large-scale catalytic datasets, unsupervised deep-learning methods can learn complex distributions that may approximate these high-dimensional landscapes (6). Protein Language Models (PLMs), trained on the millions of naturally occurring protein sequences, can predict three-dimensional structure (7, 8), reflect the biochemical properties of amino acids (9), and sometimes approximate the functional effects of single mutations (10-13). However, data limitations, especially the reliance on fitness-linked proxies to estimate enzyme function rather than direct kinetic parameters, can confound validation. There is a pressing need for quantitative datasets of gold-standard catalytic parameters collected under consistent conditions to test whether sequence-only models can accurately predict enzyme properties across diverse evolutionary and environmental contexts.

To bridge the “yawning chasm” between sequence data and direct enzyme kinetics (4), we set out to systematically measure and model an evolutionary scale sequence-catalysis landscape for the enzyme adenylate kinase (ADK). Leveraging an emerging microfluidics platform, High-Throughput Microfluidic Enzyme Kinetics (HT-MEK) (14), we measure the Michaelis-Menten parameters kcat (catalytic constant), KM (Michaelis-Menten constant) and kcat/KM (catalytic efficiency) for hundreds of diverse naturally occurring and mutant sequences under consistent conditions. This approach allows us to dissect the topology, navigability, and organization of a sequence-catalysis landscape at the evolutionary scale. We further assess the extent to which learned representations from unsupervised deep-learning models inherently reflect these sequence-catalysis relationships. Finally, we develop supervised and semi-supervised deep-learning models trained on our kinetic data to predict kcat values for these orthologous ADK sequences.

Results and Discussion

High-throughput measurement of ADK catalytic parameters across the bacterial and archaeal tree of life

The enzyme ADK reversibly catalyzes the phosphoryl transfer between two ADP molecules (or ATP and AMP) using a Mg2+ cofactor (Fig. S1) (15). ADK consists of three domains–CORE, AMP-binding (AMP), and LID (Fig. S1)–with an average length of 211 amino acids. The CORE domain primarily forms the nucleotide-binding site and contributes essential catalytic residues. The rate-limiting step of ADK catalysis is a large conformational change that opens two domains, LID and AMP, enabling product release (16-18).

To map the naturally occurring sequence-catalysis landscape of ADK, we gathered orthologous ADK sequences from bacteria and archaea from protein sequence databases (see Methods). We selected 193 orthologs from organisms adapted to divergent temperatures, as temperature adaptation has been suggested to drive differences in ADK catalytic parameters (16, 19-23) (see Methods, Table S1). The resulting ADK sequence library has an average pairwise sequence identity of 42% and median optimal growth temperatures (TGrowth) ranging from 9 to 96 °C (Fig. 1B, Fig. S2). Despite low average sequence identity, most substrate-contacting residues are >90% conserved, including the key catalytic arginine residues R36, R88, R123, R156, and R167 (E. coli ADK [ecADK] numbering used throughout) (24) (Fig. S3). Furthermore, the LID domain of ADK primarily has three different architectures—“Zn2+-binding”, “H-bond”, and “Lidless”—which share similar residue conservation patterns (Fig. 1C). As most of our orthologs lack experimentally derived structures, we used AlphaFold2 (25) to predict their structures and found them to share the same general fold (Fig. S4).

Measuring Michaelis-Menten kinetics for >10^2 enzyme sequences is intractable with traditional bench-top biochemistry methods. To overcome this challenge, we extended an emerging microfluidic technology, HT-MEK (14), to assay all orthologous ADK enzyme kinetics in parallel under identical conditions. We recombinantly expressed, purified, and assayed all 193 ADK orthologs on a single HT-MEK device (Fig. 2A). All ADK orthologs were tagged at the C-terminus with a flexible Ser-Gly linker and eGFP, which enabled the purification of each ADK ortholog and subsequent estimation of per-chamber ADK concentrations (Fig. 2B; see Methods). To monitor ADK activity “on-chip”, we coupled the formation of ATP to the production of NADPH (26), which was detected using time-lapse microscopy (Fig. 2B). NADPH measurement required several key modifications to the HT-MEK method and imaging setup to enhance detection of this near-UV fluorophore, which was poorly resolved with the original system (see Methods). Additionally, we optimized the surface coating of the slide that forms the base of the HT-MEK device to improve structural integrity and prevent device failure. Observed initial rates collected over a range of ADP concentrations were normalized by ADK concentration and fit to the Michaelis-Menten equation, obtaining kcat and KM values for each ADK ortholog (Fig. 2C-D, Fig. S5). Across the 1792 chambers in our device, we obtain an average of 6-7 biological replicates for each ADK ortholog per on-chip experiment. Out of the 193 ADK orthologs, 181 expressed and displayed catalytic activity above background, and of those, 175 had bounded KM values under our assay conditions (Table S2, Methods). For ADK orthologs that we expressed off-chip and characterized using more traditional plate reader assays, a strong correlation is observed between on-chip and off-chip kinetic values (Pearson’s r = 0.99 and 0.96 for kcat and KM respectively; Fig. 2E, Fig. S6A, Table S3).
The wild-type sequences (26 total) included as internal controls across independent plasmid preparations and different HT-MEK devices demonstrated high kcat and KM reproducibility (Pearson’s r = 0.96, 0.94, respectively, Fig. S7, Table S2, S4). Additionally, we find that the majority of kinetic measurements exhibit a low coefficient of variation (CV = standard deviation/mean), typically less than 0.2, indicating strong reproducibility across chambers (Fig. S8). However, we note that the largest CVs for kcat and KM occur predominantly among our slowest orthologs (Fig. S6B). Multiple controls provide confidence that these naturally occurring ADKs are natively folded on-chip (Supplementary Text, Fig. S9-12).
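The fitting step described above (initial rates normalized by enzyme concentration, then fit to the Michaelis-Menten equation) can be sketched as follows. This is not the authors' analysis pipeline: for simplicity it swaps the nonlinear fit for a Hanes-Woolf linearization, [S]/v = [S]/kcat + KM/kcat, solved by ordinary least squares. The function names, units, and synthetic data are assumptions made for illustration.

```python
def michaelis_menten(s, kcat, km):
    """Enzyme-normalized rate v/[E] (s^-1) at substrate concentration s."""
    return kcat * s / (km + s)


def fit_hanes_woolf(substrate, rates, enzyme_conc):
    """Estimate (kcat, KM) from initial rates via Hanes-Woolf linearization.

    substrate   : substrate concentrations (e.g., uM)
    rates       : observed initial rates (e.g., uM/s), all nonzero
    enzyme_conc : enzyme concentration in the chamber (e.g., uM)
    """
    # Normalize by enzyme concentration so the plateau equals kcat.
    v = [r / enzyme_conc for r in rates]
    x = list(substrate)
    y = [s / vi for s, vi in zip(substrate, v)]  # [S]/v

    # Ordinary least squares for y = slope * x + intercept.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx

    kcat = 1.0 / slope      # slope = 1/kcat
    km = intercept * kcat   # intercept = KM/kcat
    return kcat, km


# Synthetic, noise-free example (illustrative values, not measured data).
true_kcat, true_km, enzyme = 100.0, 50.0, 0.01
conc = [10.0, 25.0, 50.0, 100.0, 250.0, 500.0, 1000.0]
obs = [enzyme * michaelis_menten(s, true_kcat, true_km) for s in conc]
kcat_fit, km_fit = fit_hanes_woolf(conc, obs, enzyme)
```

With noisy data, a weighted nonlinear least-squares fit is generally preferable to any linearization, because the transform distorts the error structure at low substrate concentrations.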

Figure 2. Michaelis-Menten parameters for hundreds of naturally occurring ADK sequences can be measured in parallel via high-throughput microfluidic devices.

(A) A single High-Throughput Microfluidic Enzyme Kinetics (HT-MEK) device enabled the expression and purification of up to 1792 enzyme variants in a single experiment. All enzyme variants are tagged with a C-terminal eGFP construct to facilitate capture on a functionalized "pedestal." This pedestal consists of neutravidin proteins (NA) non-specifically bound to a PDMS-coated quartz slide, which in turn binds a biotinylated anti-eGFP VHH nanobody, pulling down the eGFP-tagged ADK in each chamber. (B) ADK activity in the direction of ATP formation is monitored on-chip through coupled production of NADPH, and product formation is measured over the course of the assay with time-lapse inverted fluorescence microscopy. Four chambers containing exemplary orthologs are highlighted, as well as a control chamber that did not contain an ADK-encoding plasmid and thus did not show any expressed enzyme or detectable catalysis. (C) Scatter plot of progress curves for bsADK across multiple substrate concentrations (encoded by color). (D) Mean fits of initial rates to the Michaelis-Menten equation for four ADK orthologs. Shaded regions represent the standard deviation of kcat across biological replicates for each ortholog. Shortened organism names are used for brevity; full names are provided in Table S1. (E) The correlation in kcat relative to bsADK between on-chip and “off-chip” (collected on plate reader) measurements is plotted (Pearson’s r = 0.99, p <0.001). Off-chip measurements are plotted as the average of two technical replicates. On-chip measurements are plotted as the average of biological replicates (N=6-7 on average), with error bars representing standard deviation. A grey dashed line represents the y=x line. The regression line is plotted as a solid blue line.

Although the kcat values of all natural enzymes vary by ~5 orders of magnitude (27), the kcat values within the ADK family alone span at least three orders of magnitude, ranging from 1–803 s−1 (Fig. 3A, B, Fig. S13). Thus, naturally occurring sequences performing analogous functions across different organisms can exhibit catalytic activities spanning orders of magnitude despite having superimposable structures, experimental and predicted, and conserved active sites (Fig. 1C, Fig. S3-4). This finding underscores the challenge of predicting catalytic function from sequence alone.

Figure 3. Adenylate kinase catalytic parameters span three orders of magnitude and correlate weakly with phylogeny and environmental conditions.

(A) Catalytic parameters and TGrowth values for 100/181 ADK orthologs (including those with KM outside the range of the assay) are displayed as a heatmap across a taxonomic tree constructed from the species studied in this manuscript (Methods). kcat and KM are colored on a log scale. Orthologs with KM outside the range of the assay are labeled with asterisks (*). See Fig. S7 for a heatmap of all 181 ADK orthologs. The consensus ADK sequence (61) is plotted at the bottom (kcat = 5.4 s−1, KM = 804 µM). Shortened organism names are used for brevity; full names are provided in Table S1. (B) Measured kcat values for 175 orthologs span three orders of magnitude. The vertical dashed line represents the kcat for the ADK consensus sequence (61). (C,D) Phylogenetic signal analysis for the 175 orthologs passing assay filters of (C) kcat and (D) TGrowth. Moran’s I index of autocorrelation is plotted as a solid black line, with the 95% confidence interval outlined by dashed black lines. The colored bar at the bottom encodes the significance of autocorrelation, with red and blue representing positive and negative significant autocorrelation, respectively, and black representing nonsignificant autocorrelation. (E) The correlation between optimal growth temperature and kcat for the 175 orthologs passing assay filters is plotted (Pearson’s r = 0.31, p<1.0e-4). The regression line is plotted as a solid blue line, with the 95% CI shaded in blue.

Mapping the catalytic parameters kcat, KM, and kcat/KM to the organismal taxonomic tree shows very little visual organization (Fig. 3A). We next computed the phylogenetic signal, which quantifies the autocorrelation between distance on a phylogenetic tree and continuous traits (Fig. 3C-D, Fig. S14; see Methods) (28). We quantified this relationship for kcat, KM, kcat/KM, and TGrowth and observed that kcat values show a positive correlation over short phylogenetic distance, albeit weaker than TGrowth phylogenetic signal (Fig. 3D). However, across medium to long phylogenetic distances, kcat is decorrelated with phylogeny (Fig. 3C), with high kcat values often interspersed among comparatively low ones. Similar behaviors are observed for KM and kcat/KM (Fig. S14). This result suggests that high catalytic activity has independently evolved multiple times during ADK evolution along distinct lineages. Thus, computational models for predicting catalytic activity must encompass multiple evolutionary—and potentially mechanistic—routes to high activity.
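To make the autocorrelation statistic concrete, the sketch below computes Moran's I for a trait over pairs of taxa that fall within a chosen band of phylogenetic distance. It is a minimal illustration under stated assumptions: the binary band weighting, the function names, and the toy distance matrix are hypothetical and do not reproduce the correlogram procedure used for Fig. 3C-D.

```python
def morans_i(values, weights):
    """Moran's I for trait `values` given a square weight matrix `weights`.

    weights[i][j] > 0 marks pair (i, j) as neighbors; the diagonal is ignored.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = 0.0
    w_total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            num += weights[i][j] * dev[i] * dev[j]
            w_total += weights[i][j]
    denom = sum(d * d for d in dev)
    return (n / w_total) * (num / denom)


def band_weights(dist, lo, hi):
    """Binary weights selecting pairs whose distance d satisfies lo <= d < hi."""
    n = len(dist)
    return [[1.0 if i != j and lo <= dist[i][j] < hi else 0.0
             for j in range(n)] for i in range(n)]


# Toy example: two pairs of close relatives with similar trait values.
dist = [[0, 1, 9, 9],
        [1, 0, 9, 9],
        [9, 9, 0, 1],
        [9, 9, 1, 0]]
trait = [1.0, 1.0, 5.0, 5.0]
i_short = morans_i(trait, band_weights(dist, 0, 2))   # close pairs only
i_long = morans_i(trait, band_weights(dist, 2, 10))   # distant pairs only
```

In this toy case the statistic is positive at short distances and negative at long distances; a value near zero across a distance band corresponds to the decorrelation described for kcat.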

Many thermophilic orthologs remain highly active at mesophilic temperatures, and psychrophilic ADKs are not catalytically superior

Temperature is a pervasive environmental factor reported to drive adaptive changes in enzyme catalytic rates (19, 26, 29-34). Discussions of activity-stability trade-offs propose that stabilizing mutations in thermophilic enzymes increase rigidity during natural evolution, suppressing activity-promoting dynamics (29-32, 35-38). Accordingly, psychrophilic enzymes are frequently reported to have higher kcat values relative to their mesophilic and thermophilic counterparts when measured at a common temperature (29-32, 37). However, these findings typically rely on comparisons of just a pair of sequences and recent analyses of enzyme kinetic data from the BRENDA database (39) question the generality of this model (40).

We systematically tested whether such a trade-off was apparent across a wide range of growth temperatures under consistent conditions. When kcat is plotted against the optimal growth temperature of the corresponding organisms, we do not observe the expected negative correlation but instead a weak positive correlation (Fig. 3E). Similarly weak trends are observed for KM versus growth temperature, providing evidence against the related hypothesis that enzymatic KM values are increased in cold-adapted enzymes (Fig. S15) (29, 32, 34, 41). Since growth temperature correlates with melting temperature, thermodynamic stability, and enzyme inactivation temperatures (42-46), our data show that ADK activity and stability do not universally trade off during natural evolution. While the strong representation of organisms with a TGrowth of 37 °C in the public databases may reflect a default culture temperature, most organisms in this group were identified from mammalian hosts and/or are mammalian pathogens (47). Furthermore, multiple ablations and simulations of these data fail to produce a scenario in which a strong negative correlation between kcat and TGrowth emerges (Fig. S16), increasing our confidence that this observation is not artifactual. Consequently, psychrophilic ADKs are not universally catalytically superior, and thermophilic ADKs are not catalytically limited. In fact, some of the fastest ADKs come from thermophilic organisms, potentially because high ADK activity is needed at increased temperatures to regenerate the ATP pool to combat thermal lability (48). These findings are particularly intriguing for ADK, which has been central to the long-standing debate on the trade-off between thermodynamic stability and catalytic activity, and show that natural evolution can jointly optimize stability and activity, as has been observed in directed evolution campaigns (16, 19, 26, 30, 49-52).
The joint optimization of both properties is perhaps facilitated by the independent folding of the CORE and LID domains, allowing the CORE to evolve high thermodynamic stability while the LID retains the necessary flexibility for domain opening along the reaction path (53). Our findings underscore the importance of examining sequence-catalysis landscapes on a broad scale. Although small, localized studies may suggest the existence of an activity-stability trade-off (29, 30, 32, 34-38, 54), the larger sequence space reveals that this trend does not hold universally for natural ADK sequences. Furthermore, our results indicate that ADK catalytic rates are not under strong selection during temperature adaptation. While further experiments will be needed to generalize these findings to other enzyme families with distinct mechanistic and structural features, these results suggest that activity-stability trade-offs are not a universal feature of natural evolution. Instead, bioinformatic trends observed in enzyme sequences across a wide range of TGrowth values are likely driven primarily by changes in stability rather than activity (43).
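The trade-off test above rests on the sign and size of a Pearson correlation between growth temperature and kcat. A minimal sketch of that statistic is given below; the helper name and the example values are assumptions made for illustration and are not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up growth temperatures (deg C) and kcat values (s^-1), illustration only.
t_growth = [10.0, 25.0, 37.0, 55.0, 80.0]
kcat = [120.0, 40.0, 300.0, 90.0, 410.0]
r = pearson_r(t_growth, kcat)
```

A strict activity-stability trade-off would predict a clearly negative r when all enzymes are assayed at one common temperature.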

The evolutionary-scale ADK sequence-catalysis landscape is rugged, with multiple structural neighborhoods achieving distinct activity levels

Although protein “sequence space” is often discussed in conceptual terms, its high dimensionality (20 x sequence length) makes its concrete visualization challenging. We thus visualized the sequence space as a graph: each node corresponds to an orthologous ADK sequence, edges connect these nodes, and the connections are weighted by the number of edits in a multiple sequence alignment (MSA). Using our 175 observed sequences, we traversed this subset of ADK sequence space by minimizing the total edit distance, forming a Minimum Spanning Tree (MST). This MST visually represents the rugged sequence-catalysis landscape of naturally occurring ADKs (Fig. 4A).
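The graph construction described above can be sketched in a few lines: edit distance is counted as the number of differing columns between aligned sequences, and Prim's algorithm extracts a minimum spanning tree from the complete graph. The toy sequences, function names, and the choice of Prim's algorithm are assumptions for illustration; the source does not state which MST algorithm was used.

```python
def edit_distance(a, b):
    """Number of differing columns between two aligned sequences (gaps count)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def minimum_spanning_tree(names, seqs):
    """Prim's algorithm on the complete graph; returns (u, v, distance) edges."""
    n = len(seqs)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = None
        # Find the cheapest edge leaving the current tree.
        for u in in_tree:
            for v in range(n):
                if v in in_tree:
                    continue
                d = edit_distance(seqs[u], seqs[v])
                if best is None or d < best[2]:
                    best = (u, v, d)
        in_tree.add(best[1])
        edges.append((names[best[0]], names[best[1]], best[2]))
    return edges


# Toy alignment (made-up sequences, for illustration only).
names = ["A", "B", "C", "D"]
seqs = ["MKIV-LG", "MKIV-LA", "MRLVQLA", "MRLVQLG"]
mst = minimum_spanning_tree(names, seqs)
total = sum(d for _, _, d in mst)
```

Coloring each node of such a tree by its measured kcat gives the kind of view in which neighboring sequences with very different activities become visible.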

Figure 4. Adenylate kinase sequence space features multiple neighborhoods linked to different lid types, each associated with distinct growth temperatures and reaching varying "heights" in activity.

(A) A Minimum-Spanning Tree (MST) from the all-by-all graph of ADK sequences with edges weighted by edit distance in an MSA. Edge thickness in the MST encodes increasing identity on a log scale. Node color encodes measured kcat value in log scale, and node shape corresponds to lid type. Rough partitions between lid-type neighborhoods are labeled and outlined with dashed lines. Selected orthologs discussed in depth in this study are labeled. Shortened organism names are used for brevity; full names are provided in Table S1. (B) The three major types of LID domains found in ADKs: “Lidless” containing a short loop that still provides catalytic Arg residues (AlphaFold3 prediction of Bartonella henselae)(93), and two lidded variations, one containing hydrogen-bonding network consisting of a conserved His-Ser-Asp-Thr tetrad (H-Bond Lid, PDB: 1AKE)(92) and the other containing a cysteine tetrad chelating a Zn2+ ion (Zn2+ Lid, PDB: 1ZIP)(91). (C) H-Bond and Zn2+ Binding Lid ADKs are significantly faster than lidless counterparts but have similar activity distributions to one another (ANOVA, F=6.92, p-val=0.002; Tukey-HSD, Zn2+ vs. Lidless p-adj=0.002, H-Bond vs. Lidless p-adj=0.006, Zn2+ vs. H-Bond p-adj=0.963). (D) Zn2+ Binding Lid ADKs have a significantly higher associated growth temperature than the other lid-types (ANOVA, F=16.39, p-val=3.11e-7; Tukey-HSD, Zn2+ vs. Lidless p-adj=0.0005, H-Bond vs. Lidless p-adj=0.370, Zn2+ vs. H-Bond p-adj<1.0e-16). Cutoffs for psychrophilicity and thermophilicity are shown as dashed lines (42).

Because the ADK MST shows little organization by catalytic activity, we next explored whether ADK structural differences might organize it. The LID of ADK exists in three general forms: (1) "Lidless" ADKs, characterized by unstructured loops of varying lengths, present across all domains of life (Fig. 4B, top); (2) "Zn2+-binding" lids, which feature a structural Zn2+-binding site typically formed by four Cys residues, found in bacteria and archaea (Fig. 4B, middle); and (3) "H-Bond" lids, which possess an extended hydrogen bond network and are found almost exclusively in bacteria (Fig. 4B, bottom) (55-57). Visualizing the ADK MST reveals that these three lid types—Zn2+-binding, H-Bond, and Lidless—each cluster into distinct neighborhoods (Fig. 4A). A closer examination of these structural neighborhoods reveals that while the two lidded neighborhoods span similar dynamic ranges of catalytic activity, encompassing both the fastest and slowest ADKs, the Lidless sequences are slower (Fig. 4C).

ADKs adapted to different environmental temperatures inhabit different structural neighborhoods within the landscape. ADKs from psychrophilic organisms (TGrowth < 25 °C) have mostly H-Bond lid types, whereas ADKs from thermophilic organisms (TGrowth > 50 °C) have nearly exclusively Zn2+-binding lids (Fig. 4D, Fig. S17). The Zn2+-binding lid may be favored in thermophilic ADKs because it provides stability to the domain at high temperatures, consistent with prior studies that show an increase in melting temperature (Tm) by 15 °C when installing a Zn2+ binding site in the H-Bond lid of E. coli ADK (58). In contrast, Zn2+ can be limiting in marine environments where many psychrophilic organisms live, potentially favoring the H-Bond over the Zn2+-binding lid (59). Thermophilic ADKs are predominantly located at internal nodes within the MST, indicating they share more sequence similarity with other ADKs compared to their mesophilic and psychrophilic counterparts (Fig. S17). This result aligns with previous observations of highly thermostable consensus sequences in enzyme families (60-64). Thus, while the temperature of an organism’s environment does not appear to directly drive changes in catalytic properties (Fig. 3E), it does influence the distribution of organisms across this evolutionary-scale sequence-catalysis landscape, confining psychrophilic and thermophilic ADKs to specific neighborhoods of distinct lid structures (Fig. 4D, Fig. S17).

Navigating between neighborhoods: mutational walks and extra-dimensional bypasses

Navigating across evolutionary landscapes, either naturally or through experimental approaches, requires accessible pathways between distinct neighborhoods without highly deleterious intermediates. To explore the viability of transitioning between these lid-type neighborhoods in our landscape, we first focused on the Zn2+-binding and H-Bond neighborhoods. We generated combinatorial mutations to swap the four key residues that constitute the Zn2+-binding motif–Cys130, Cys133, Cys150, and Cys153 (CCCC) in G. stearothermophilus (gsADK)–with the corresponding residues in the H-Bond motif–His126, Ser129, Asp146, and Thr149 (HSDT) in ecADK (Fig. 4B, Fig. S18). We measured kcat and KM for all possible mutations that interconvert the Zn2+-binding and H-Bond motifs in gsADK and ecADK backgrounds in a single high-throughput experiment. When installing a Zn2+-binding motif in the E. coli ADK LID, the final Zn2+-binding motif (CCCC) retains 74% of the activity of the wild-type H-Bond motif (HSDT), consistent with previous studies (58), and there are mutational pathways without highly deleterious intermediates (Fig. 5A). Given the feasibility of a mutational walk from the H-Bond to the Zn2+-binding lid, we anticipated that the reciprocal walk in gsADK, starting with the Zn2+-binding motif (CCCC) and ending with the H-Bond motif (HSDT), should also be possible. However, nearly all mutational trajectories in this direction encounter an unfavorable intermediate or "pit," leading to a large loss of activity or an inexpressible variant (Fig. 5B, Table S4). This non-reciprocal behavior aligns with differences in the three-dimensional context of the two domain architectures. Although the CCCC motif coordinates the Zn2+ ion independently of neighboring residues, the H-Bond motif is more complex, involving additional interactions with second-shell residues, presumably contributing to the observed epistasis of the HSDT motif (Fig. S19).
Supporting this model, the H-Bond lid occupies a narrower region of sequence space compared to the Zn2+-binding lid (79% vs. 65% average sequence identity, respectively, Fig. S20).
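The combinatorial design above implies a small, fully enumerable space: swapping four positions between two motifs gives 2^4 = 16 variants and 4! = 24 orderings of single substitutions. The sketch below enumerates both and checks whether a path avoids low-activity intermediates. The `accessible` helper, the threshold, and any activity values passed to it are hypothetical and stand in for measured kcat data.

```python
from itertools import permutations, product

START, END = "HSDT", "CCCC"  # H-Bond motif -> Zn2+-binding motif


def all_variants(start, end):
    """Every combination of start/end residues at the swapped positions."""
    return ["".join(c) for c in product(*zip(start, end))]


def all_paths(start, end):
    """Every ordering of single substitutions converting start into end."""
    paths = []
    for order in permutations(range(len(start))):
        current = list(start)
        path = [start]
        for pos in order:
            current[pos] = end[pos]
            path.append("".join(current))
        paths.append(path)
    return paths


def accessible(path, activity, threshold):
    """True if every variant on the path has activity at or above threshold.

    Variants missing from `activity` (e.g., not expressed) count as inaccessible.
    """
    return all(activity.get(v, 0.0) >= threshold for v in path)


variants = all_variants(START, END)
paths = all_paths(START, END)
```

A single inactive intermediate removes every path that passes through it. An inactive single mutant at the first position, for example, blocks 3! = 6 of the 24 paths.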

Figure 5. Traversing the ADK sequence-catalysis landscape through mutational walks and “extra-dimensional bypasses”.

(A) A graph showing mutational pathways of swapping the chelating cysteine tetrad into the H-Bond lid of ecADK. Nodes represent variants along the pathway, with their kcat value encoded by color. Dashed lines connect variants that are one mutation away from each other. Dashed circles represent variants with KM fit outside of the bounds of the assay. Fully swapping the ecADK H-Bond LID for gsADK’s Zn2+-binding LID is shown as an arrow. (B) A graph showing mutational pathways of swapping the hydrogen-bond motif into the Zn2+ lid of gsADK. Nodes represent variants along the pathway, with their kcat value encoded by color. Dashed lines connect variants that are one mutation away from each other. Dashed circles represent variants with KM fit outside of the bounds of the assay. Fully swapping the gsADK Zn2+-binding LID for ecADK’s H-Bond LID is shown as an arrow. Empty circles represent variants that expressed poorly or displayed activity below the limit of detection and are not connected by edges in the graph. (C) kcat and KM are plotted for LID chimeras of oaADK. Error bars represent standard deviation across biological replicates. The lower detection limit for kcat is plotted as a black dashed line. The bsADK LID chimera displayed activity below the lower limit of detection and is marked with an asterisk. Shortened organism names are used for brevity; full names are provided in Table S1. (D) kcat and KM are plotted for LID chimeras of ddADK. Error bars represent standard deviation across biological replicates. The lower detection limit for kcat is plotted as a black dashed line. The ecADK LID chimera exhibited a KM below the lower bound of the assay and is marked with an asterisk. Shortened organism names are used for brevity; full names are provided in Table S1. (E) Barplot of kcat for consensus sequences, with error bars representing standard deviation across biological replicates. The lidless consensus sequence did not exhibit activity above background.

We next explored whether navigation between these neighborhoods could be achieved through "extra-dimensional bypasses" (i.e., whole LID swaps), thereby avoiding unfavorable valleys (3). Substituting the ecADK H-Bond LID into gsADK results in a highly active ADK that is even faster than wild-type gsADK (Fig. 5B). Interestingly, the reverse experiment, where the entire LID of gsADK is swapped into ecADK, leads to a dramatic decrease in activity, dropping below the level of any of the incremental cysteine mutants. These results suggest that to avoid inactive intermediates, whole LID swaps are necessary to convert a Zn2+-binding lid into an H-Bond lid, whereas, in the reverse direction, it is more favorable to convert an H-Bond lid to a Zn2+-binding lid through single mutational steps.

Considering the existence of Lidless ADKs, which may represent the ancestral form of all ADKs given their presence across all domains of life (57), we explored navigating our landscapes through the insertion or deletion of the LID. Since we found Lidless ADKs to be slower than their lidded counterparts (Fig. 4C), we asked whether their activity could be “rescued” by inserting a LID. Given the variability in the degree of "lidlessness", we selected two Lidless ADKs for our study: O. acuminata ADK (oaADK), which has a seven amino-acid loop and expressed well but showed activity below our detection limit in the initial assay, and D. deserti ADK (ddADK), which has a longer 17 amino-acid loop and displayed above-average activity among Lidless ADKs. We generated chimeras incorporating the LIDs from gsADK, bsADK, ecADK, and vcADK into both oaADK and ddADK backgrounds. For oaADK, we found that three out of four LID chimeras successfully rescued catalytic activity above our detection limit, though measurements exhibited greater variance due to their proximity to the detection threshold (Fig. 5C). In contrast, for ddADK, which already had modest activity, we instead observed a decrease in kcat with a commensurate decrease in KM for measurable lid insertions (Fig. 5D). Therefore, LID insertion does not always increase kcat, complicating the prediction of ADK activity, which depends on LID dynamics but not solely on the LID sequence due to potential functional coupling between these domains (20).

Contributions from multiple neighborhoods catalytically impair the ADK consensus sequence

Consensus sequences have garnered interest for engineering proteins with enhanced stability while preserving catalytic activity, though the retention of activity depends on the specific natural enzyme used as a benchmark (61-63). For ADK, we anticipated that a consensus sequence incorporating elements from all three neighborhoods would produce suboptimal amino acid combinations. Indeed, a previously constructed ADK consensus sequence that spans the landscape has a kcat value of 5 s−1, making it slower than 91% of the naturally occurring sequences we measured (Fig. 3A, Table S2) (61). We independently constructed consensus sequences from each of the three neighborhoods and found that the internal consensus sequences for ADKs with H-Bond and Zn2+-binding LIDs were 57- and 2-fold faster in kcat compared to the consensus sequence for the entire family (Fig. 5E). The larger increase in kcat for the H-Bond lid consensus sequence aligns with the narrower sequence space for this lid type (62% vs. 45% internal pairwise sequence identity for H-Bond and Zn2+-binding LID orthologs, respectively, Fig. S21). The internal consensus sequence for the Lidless ADKs exhibited activity below our detection limit, perhaps due to the variability in loop length among Lidless ADKs (Fig. 5E). These results emphasize the importance of functional coupling between residue positions and the role of intramolecular epistasis in shaping sequence-catalysis landscapes (65-69).
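The sketch below illustrates why a family-wide consensus can combine residues that never co-occur in any natural sequence, using a simple majority-rule consensus over aligned columns. The toy sequences and group labels are invented for illustration, and majority rule with gap-dominated columns dropped is an assumption; the source does not specify how its consensus sequences were derived.

```python
from collections import Counter


def consensus(aligned, gap="-"):
    """Majority-rule consensus of equal-length aligned sequences.

    Columns whose most common symbol is a gap are dropped. Ties go to the
    symbol seen first in the column (Counter.most_common ordering).
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    out = []
    for column in zip(*aligned):
        residue, _ = Counter(column).most_common(1)[0]
        if residue != gap:
            out.append(residue)
    return "".join(out)


# Toy alignment with two "neighborhoods" (made-up sequences).
group_1 = ["MKHSL", "MKHSL", "MKHSL", "MRHSL"]
group_2 = ["MRCCV", "MRCCV", "MRCCV", "MKCAV", "MKCAV"]

family = consensus(group_1 + group_2)   # consensus across both groups
within_1 = consensus(group_1)           # consensus within group 1
within_2 = consensus(group_2)           # consensus within group 2
```

Here the family-wide consensus takes its third column from one group and its fourth from the other, producing a residue pair found in none of the inputs, whereas each within-group consensus matches an existing sequence.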

Changes in ADK dynamics tune activity across billions of years of evolution

Since high ADK activity has evolved multiple times in distinct structural contexts, we investigated the molecular mechanisms underlying high activity in each neighborhood to determine if they are general or unique. Prior work in ecADK has shown that interconversion between the closed and open states involves the local unfolding of the LID and that this conformational equilibrium can be tuned with osmolytes and mutations (17, 57, 70-72). In particular, urea, which can stabilize the unfolded state of proteins, favors the more expanded, open forms of ecADK, increasing kcat by 1.7-fold at 2 M urea (71). We hypothesized that if changes in conformational dynamics were driving the differences in ADK catalysis, we would observe this capacity for conformational tuning across most naturally occurring ADK sequences. A closer examination of ADK activity under gradually increasing concentrations of urea (0.0–2.0 M) supports this hypothesis: most naturally occurring ADKs, including ADKs of all three lid types, exhibited activation by urea, suggesting that the capacity for conformational tuning has been conserved in multiple structural contexts (Fig. 6A, 6B, Fig. S12). While mesophilic enzymes such as ecADK have been shown to be activated by low concentrations of denaturants (71, 73), previous reports of denaturant activation of thermophilic enzymes were interpreted as reflecting increased motions in these presumably overly rigid enzymes, an interpretation that predicts mesophiles to be less activated by urea than their thermophilic counterparts (42, 74, 75). Here, we systematically demonstrate that low concentrations of urea activate mesophilic ADKs to a similar extent as their thermophilic counterparts (Fig. S22), challenging the model of rigidity-activity trade-offs during temperature adaptation (29, 30, 32, 34-38, 54).

Figure 6. ADK conformational tuning with osmolytes and mutations across evolution.

(A) Distribution of log2 fold-change in initial reaction rate at saturating substrate concentration relative to 0 M urea for 0.5 M, 1 M, and 2 M urea. (B) Mean fold-change in initial rate over 0 M urea at 4 mM substrate. Error bars represent standard deviation across biological replicates. The black dashed line represents no change in the initial rate. (C) Superimposed AF2 models of ppADK (tan) and vcADK (green) with positions 22, 128, and 135 highlighted. (D) Barplot of mean relative catalytic effects (technical replicates, n=2) of mutations at key positions that differ between ppADK and vcADK. Error bars represent standard deviation across replicates. Variants in the ppADK background are plotted in tan and those in the vcADK background in green. Amino acid identities at positions 22, 128, and 135 are displayed below each bar. kcat values for ppADK and vcADK mutants were collected off-chip (see Methods). (E) Boxplot of kcat for H-Bond LID ADKs that have a proline (green) or a different amino acid (tan) at position 128. t(36)=−3.15, p=0.003. (F) Barplot of log2 fold-change in kcat for Xaa→Pro mutations in selected ADKs with either an H-Bond or Zn2+ LID, organized by a phylogenetic tree (plotted with arbitrary branch lengths). Shortened organism names are used for brevity; full names are provided in Table S1.

To identify mutations that influence ADK conformational dynamics throughout evolution, we turned to orthologs with high sequence similarity but large differences in catalytic rates. We focused on ADKs from V. cholerae (vcADK) and P. profundum (ppADK), as they exhibited the largest ratio of fold-change in kcat to edit distance (a >50-fold difference in activity over 23 sequence differences), indicating the steepest activity “cliff” in our landscape (Fig. 4A, Fig. 6C). We hypothesized that functionally relevant sequence positions would be in the LID. There are two key residue differences in the LID of ppADK and vcADK: Ala/Pro at position 128 and Asn/Val at position 135 (Fig. 6C). Indeed, the double mutation Ala128Pro/Asn135Val increases ppADK kcat by 21%, while the reverse mutation, Pro128Ala/Val135Asn, decreases vcADK kcat by 11% (Fig. 6D, Methods). An additional CORE mutation, Ala22Glu, was also selected because its proximity to the LID led us to hypothesize that the increased bulkiness and charge of the Glu side chain could destabilize the enzyme’s closed state by electrostatic repulsion (Fig. 6C). The Ala22Glu mutation increased ppADK kcat by 28%, while the reverse Glu22Ala mutation decreased activity in vcADK by a reciprocal 29% (Fig. 6D). Together, the Ala22Glu/Ala128Pro/Asn135Val mutations account for 49% of the activity difference between ppADK and vcADK (Fig. 6D).
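The “cliff” criterion described above (fold-change in kcat divided by the number of sequence differences) can be sketched as follows. This is an illustrative sketch only: it assumes unaligned Levenshtein edit distance and strictly positive kcat values, and the variant names, sequences, and rates are hypothetical rather than taken from the dataset.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def steepest_cliff(variants):
    """Return (score, name_a, name_b) for the pair with the largest ratio of
    kcat fold-change to edit distance.

    variants maps a name to (sequence, kcat); kcat values must be positive.
    """
    best = None
    for (na, (sa, ka)), (nb, (sb, kb)) in combinations(variants.items(), 2):
        d = edit_distance(sa, sb)
        if d == 0:
            continue  # identical sequences cannot form a cliff
        fold = max(ka, kb) / min(ka, kb)
        score = fold / d
        if best is None or score > best[0]:
            best = (score, na, nb)
    return best


# Hypothetical toy variants: name -> (sequence, kcat in s^-1).
toy_variants = {
    "A": ("MKLAG", 100.0),
    "B": ("MKLPG", 400.0),
    "C": ("MRCCG", 50.0),
}
print(steepest_cliff(toy_variants))
```

Dividing by edit distance rewards pairs that differ greatly in rate while differing little in sequence, so a large fold-change between distant sequences ranks below a smaller one between near neighbors.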

Next, we assessed the generality of these activation mechanisms across the ADK landscape. Pro128 is frequently found in ADKs and shows a statistically significant association with high kcat across diverse ADK sequences (Fig. 6E). We selected 20 H-Bond and Zn2+-binding ADKs and either mutated the residue at position 128 (Xaa) to proline or, where position 128 was already proline, mutated Pro128 to alanine. In the Xaa-to-Pro mutations, we primarily observed activating effects in ADKs with H-Bond lids (Fig. 6F, Fig. S23). In contrast, mutational effects at position 22 do not generalize to other orthologs (Fig. S24). Thus, conformational tuning can regulate ADKs separated by billions of years of evolution, whether through mutations or small molecule solutes.

PLMs organize ADK sequence space by structure but not by catalytic activity

We next explored the landscape of our naturally occurring ADK sequences as learned by the protein language model ESM-2 (7) to evaluate how well PLMs capture the sequence-catalysis relationship. We fed representative ADK sequences from sequence databases (~5,000 sequences, see Methods) into the pre-trained 650-million-parameter ESM-2 model to obtain fixed-length embeddings for each ADK ortholog after mean-pooling (Methods). The continuous nature of these embeddings allows us to visualize the traditional concept of a landscape using dimensionality reduction techniques like UMAP (76). We observed that the ADKs measured in our library broadly cover the landscape generated from ADK ESM-2 embeddings, with the various lid types forming distinct visual clusters (Fig. 7A). This structural organization aligns with our MST-derived sequence-catalysis landscape, with distinct lid-type neighborhoods (Fig. 4A). Consistent with our MSA, H-Bond ADKs had the highest internal sequence identity among the major lid types and occupy a narrower distribution in this dimensionality reduction (Fig. 7A, Fig. S21). Nevertheless, UMAP can distort distances between true clusters during dimensionality reduction (77), so we sought to further characterize the organization of ADK ESM-2 embeddings using quantitative metrics. To quantify the lid-type organization, we hierarchically clustered the embeddings based on Euclidean distance (Fig. S25) and computed the adjusted mutual information score (AMI) by lid-type label (Fig. 7B). We found that while ESM-2 embedding clusters exhibit higher lid-type AMI than random, they retain less information than a one-hot encoded MSA (One-Hot MSA) clustered by Hamming distance (Fig. 7C, Fig. S26).
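As a small illustration of the mean-pooling step mentioned above, the sketch below averages per-residue vectors into one fixed-length vector per sequence, so that sequences of different lengths can be compared by Euclidean distance. It makes no attempt to run ESM-2 itself: the two-dimensional toy vectors stand in for the much higher-dimensional per-residue output of a language model, and all names and values are hypothetical.

```python
import math


def mean_pool(token_embeddings):
    """Average an L x D list of per-residue vectors into one length-D vector."""
    if not token_embeddings:
        raise ValueError("need at least one residue embedding")
    dim = len(token_embeddings[0])
    if any(len(vec) != dim for vec in token_embeddings):
        raise ValueError("all residue embeddings must share one dimension")
    n = len(token_embeddings)
    return [sum(vec[d] for vec in token_embeddings) / n for d in range(dim)]


# Hypothetical per-residue embeddings for two sequences of different length.
short_protein = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
long_protein = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0], [6.0, 1.0], [8.0, 1.0]]

pooled_short = mean_pool(short_protein)  # length 2, regardless of sequence length
pooled_long = mean_pool(long_protein)

# Pooled vectors share one dimension, so a Euclidean distance is defined.
distance = math.dist(pooled_short, pooled_long)
print(pooled_short, pooled_long, distance)
```

Pooling discards positional information, which is one reason a fixed-length embedding can preserve coarse features such as lid type while blurring the residue-level differences that separate fast and slow orthologs.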

Figure 7. Evaluating the structural and catalytic organization of ADK sequence space learned by a protein language model.

(A) UMAP of ADK ESM-2 embeddings, with all representative ADK sequences (~5,000, Methods) plotted (gray). Sequences with measured catalytic parameters are encoded with color by lid type. (B) AMI reflects the agreement of a clustering method with respect to another label. In this case, low AMI would suggest poor clustering by lid type and vice versa. (C) Barplot comparing the AMI of hierarchical clustering on ESM-2 embeddings or a one-hot encoded MSA using Euclidean and Hamming distance, respectively, with respect to lid type. Perfect AMI (1.0) is plotted as a red dashed line, and random AMI (0.0) is plotted as a gray dashed line. (D) UMAP of ADK ESM-2 embeddings with kcat values encoded by color. Gray points represent all representative ADK sequences (~5,000, Methods). (E) Trustworthiness quantifies the level of organization retained between two embeddings: in this case, it quantifies how similar neighboring sequences in ESM-2 space are in “kcat space”. (F) Grouped barplot of trustworthiness of ESM-2 ADK embeddings (by Euclidean distance) and one-hot encoded MSA (by Hamming distance) with respect to kcat computed at k=5 neighbors. Trustworthiness is computed for all sequences, as well as by lid type. Perfect trustworthiness is plotted as a red dashed line (kcat vs. kcat) and random as a gray dashed line (kcat vs. an average of 30 shuffles of corresponding embeddings).

We next investigated whether ESM-2 had instead learned a representation that reflects similarity in enzymatic activity. The ESM-2 landscape appears visually rugged when colored by kcat (Fig. 7D), and quantifying kcat organization using the continuous manifold metric of trustworthiness (Methods) reveals that both ESM-2 and One-Hot MSA exhibit essentially random organization in their nearest five neighbors with respect to kcat (0.52 and 0.50, respectively) (Fig. 7F). Consequently, naturally occurring sequences close in the ESM-2 landscape often have vastly different kcat values. Similar relationships apply to KM and kcat/KM (Fig. S27). The trustworthiness of ESM-2 embeddings relative to One-Hot MSA remains largely unchanged when the number of neighbors increases (Fig. S28). Given that ESM-2 has retained some lid-type organization, we also computed trustworthiness within lid types and found that this improves trustworthiness slightly (0.59 for Lidless and H-Bond) when not considering the global landscape (Fig. 7F). In a principal component analysis (PCA) on the ESM-2 embeddings (Fig. S29), we observed that the first principal component explains little of the variance in kcat (r2 = 0.097) (Fig. S30). PC1 may reflect lid-type differences, as removing the slower Lidless ADKs (Fig. 4C) weakens the regression (Fig. S30). Thus, while ESM-2 captures some high-level structural organization of ADKs, it fails to meaningfully encode the complex relationship between sequence and catalytic activity, perhaps because the co-evolutionary relationships learned are stronger for structure than for catalysis. These results highlight the limitations of current PLMs in predicting enzyme function from naturally occurring sequences alone.
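For readers unfamiliar with trustworthiness, the following sketch implements the standard neighbor-rank definition (the same quantity that scikit-learn exposes as `sklearn.manifold.trustworthiness`) in plain Python. It is not the authors' code. Neighbors are taken in the `embedded` space and penalized by how far away they rank in the `reference` space; treating kcat as the reference space and sequence embeddings as the embedded space is our reading of the text, and the toy values are hypothetical.

```python
import math


def _neighbor_ranks(points, i):
    """Map each j != i to its 1-based rank by distance from point i.

    Ties are broken by index, an arbitrary choice for this sketch.
    """
    order = sorted((j for j in range(len(points)) if j != i),
                   key=lambda j: (math.dist(points[i], points[j]), j))
    return {j: rank for rank, j in enumerate(order, start=1)}


def trustworthiness(reference, embedded, k):
    """Fraction-like score in [0, 1]; 1.0 means every k-nearest neighbor in
    `embedded` is also among the k nearest neighbors in `reference`."""
    n = len(reference)
    if n != len(embedded):
        raise ValueError("reference and embedded must have the same length")
    if not 0 < k < n / 2:
        raise ValueError("k must satisfy 0 < k < n / 2")
    penalty = 0
    for i in range(n):
        ref_rank = _neighbor_ranks(reference, i)
        emb_rank = _neighbor_ranks(embedded, i)
        for j, rank in emb_rank.items():
            if rank <= k:  # j is a k-nearest neighbor of i in embedded space
                penalty += max(0, ref_rank[j] - k)
    return 1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1))


# Hypothetical example: six points on a line, compared against themselves
# and against a scrambled ordering.
reference = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
scrambled = [[0.0], [5.0], [1.0], [4.0], [2.0], [3.0]]

t_same = trustworthiness(reference, reference, k=1)
t_scrambled = trustworthiness(reference, scrambled, k=1)
print(t_same, t_scrambled)
```

A score near 0.5, as reported above for k=5, means that neighbors in sequence-embedding space are about as likely to be near as far in kcat, which is what a random arrangement would give.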

ADK kcat and KM prediction improve with increasing experimental data and supervision on top of PLMs

Supervised models built on top of pre-trained PLMs can enhance the prediction of mutational fitness effects compared to a zero-shot regime (78-80). However, fitness labels represent an aggregate of multiple underlying biochemical properties, such as catalytic activity and thermodynamic stability, and many models are specifically trained to predict the effect of a single mutation (10). Recently, there have been attempts to model sequence-catalysis relationships directly, with models like DLKcat (81) and TurNuP (82) trained on catalytic turnover rates collated from published literature in the BRENDA database (39). Our dataset of naturally occurring ADK sequences with measured kcat values provides a valuable test set to evaluate the extent to which models like DLKcat have learned the sequence-kcat relationship. When predicting kcat from our sequences and substrate information (Methods), DLKcat performs poorly (Spearman rho = −0.09, Fig. S31). We speculate that this poor performance may arise from challenges inherent to the BRENDA training data, which were collected under many experimental conditions and require scaling to a standard temperature and pH. Furthermore, BRENDA has suffered from data consistency issues caused by mis-annotations (83).

Apart from dataset inconsistencies, predicting kcat for any naturally occurring enzyme sequence is inherently challenging, given the vast range of enzymatic reactions (e.g., phosphorylation, hydrolysis, oxidation) and mechanisms. Instead, we considered whether we could use our newly collected experimental data to build an ADK-specific model that would be more predictive than a large model like DLKcat. We trained classic lightweight machine learning models, Random Forest (RF) and Support Vector Regressor (SVR), on increasing sub-samples of our dataset (20, 40, 60, …, 140 examples) and evaluated performance on a held-out 20% of our dataset for kcat and KM (Methods). For kcat prediction, both SVR and RF outperform DLKcat with as few as 20 data points (mean RF Spearman rho = 0.14, mean SVR Spearman rho = 0.23, across 30 bootstrapped training set samplings). Performance steadily improved with increasing training data (Fig. 8A).
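The evaluation metric used throughout this section, Spearman's rho, is the Pearson correlation of rank-transformed values. The sketch below is a plain-Python version that assigns tied values their average rank; it is offered for illustration under that assumption and is not the code used in this study. The measured and predicted values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation; returns NaN if either input is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


# Hypothetical held-out kcat values and two sets of model predictions.
measured = [12.0, 150.0, 47.0, 300.0]
same_order = [0.1, 0.8, 0.3, 0.9]      # same ordering as measured
reverse_order = [0.9, 0.2, 0.5, 0.1]   # exactly reversed ordering
print(spearman_rho(measured, same_order), spearman_rho(measured, reverse_order))
```

Because only the ordering matters, a model can score well on this metric while being poorly calibrated in absolute kcat, which is worth keeping in mind when comparing the rho values reported here.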

Figure 8. Improving kcat and KM prediction for ADK sequences with semi-supervised learning.

The mean test Spearman rho for different models across 30 samplings is plotted against training dataset size for (A) kcat and (B) KM. Models include Random Forest (RF), Support Vector Regression (SVR), ProteinNPT (PNPT) (79), and Convolutional Linear Regression (CLR) (79). For embeddings, SVR used a one-hot encoded MSA, RF used ESM-2 embeddings, and CLR and PNPT used Tranception (84) embeddings. Embeddings and other model hyperparameters were selected based on aggregate (mean) performance for both kcat and KM prediction. Shaded regions represent 95% confidence intervals across 30 training/test set samplings at each dataset size (Methods). A zero-shot evaluation of the Tranception PLM (84) is plotted as a dashed orange line. DLKcat (81) performance evaluated on all 175 sequences is plotted in (A) as a dashed black line.

Having demonstrated that our newly collected experimental data enables lightweight ML models to outperform an existing deep-learning model trained on large literature databases for predicting kcat, we next evaluated state-of-the-art semi-supervised approaches that combine PLMs with our experimental labels to further improve predictions. Specifically, we trained several variants of ProteinNPT (PNPT) (79), a pseudo-generative model that learns a joint representation of protein sequences and property annotations, as well as a convolutional linear regression (CLR) baseline (79). We experimented with different underlying PLM embeddings: ESM-2 (7), Tranception (84), and MSA Transformer (85). We also varied the prediction targets: kcat only, KM only, or both plus growth temperature and lid type. We also compared with models leveraging one-hot encoding of protein sequences augmented with zero-shot fitness predictions from a Potts model (Fig. S32-33) (78, 86, 87). ProteinNPT outperformed the lightweight regressors in all scenarios, particularly when leveraging Tranception embeddings, and its performance also improved when trained on a greater number of experimental labels (Fig. 8A-B). Additionally, the model variant predicting all targets simultaneously generally outperformed single-target models for kcat and KM (Fig. S34), demonstrating its ability to leverage relationships between different properties, as kcat and KM are correlated for naturally occurring ADKs (Fig. S35), a trend previously observed for mutants of ecADK (17). All models showed better performance in predicting kcat than KM, possibly due to the higher variance observed in our measured KM values compared to kcat (Fig. S8).

Conclusions

This broad exploration of naturally occurring ADK sequences reveals that high catalytic turnover has evolved multiple times across diverse structural and environmental contexts, challenging a universal activity–stability trade-off during temperature adaptation and emphasizing that structurally similar enzymes can exhibit vastly different catalytic rates. Extending this experimental approach to other enzyme families could likewise uncover new catalytic properties missed by narrow sampling, refine existing hypotheses of enzyme evolution, and provide expanded starting points for protein engineering. Combining the rich representations of protein sequences provided by deep-learning models with our custom high-throughput kinetic data yields semi-supervised models that significantly outperform existing models at predicting catalytic parameters of naturally occurring ADK sequences. Targeted sequence selection using techniques like Bayesian optimization or active learning can further enhance training efficiency, and we envision coupling many experimental assays within generative models to discover novel sequences whose activities surpass those reached by natural evolution. Overall, our findings refine evolutionary perspectives on enzyme diversification, extend models of structural and functional variability, and establish a broadly applicable framework for mapping sequence variation onto catalytic function.

Supplementary Material

Materials and Methods

Supplementary Text

Figs. S1 to S8

Tables S1 to S5

Acknowledgments:

We thank members of the Pinney and Keiser laboratories, Siyuan Du, Stephanie Crilly, and Tony Capra for their discussions and comments on the manuscript. We thank members of the Fordyce and Herschlag laboratories for helpful discussions and experimental advice. We acknowledge P. Suzuki for the design of the PS1.8K devices. We acknowledge Nicholas Young for the initial HT-MEK illustration modified for this manuscript. This work is associated with provisional patent application No. 63/686,435.

Funding:

This work was funded by a National Institutes of Health (NIH) grant (DP5OD033413), support from the Valhalla Foundation, and CZI grant DAF2018-191905 (DOI 10.37921/550142lkcjzw; M.J.K.) from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation (DOI 10.13039/100014989). D.F.M. was supported by a UCSF Discovery Fellowship. P.N. was supported by a Chan Zuckerberg Initiative Award (Neurodegeneration Challenge Network, CZI2018-191853). D.S.M. holds a Ben Barres Early Career Award from the Chan Zuckerberg Initiative as part of the Neurodegeneration Challenge Network (CZI2018-191853) and is supported by an NIH Transformational Research Award (TR01 1R01CA260415).

Footnotes

Competing interests: D.S.M. is an advisor for Dyno Therapeutics, Octant, Jura Bio, Tectonic Therapeutic, and Genentech and a cofounder of Seismic.

Data and materials availability: Summary tables of all measured kinetic parameters for each ADK variant are provided in the Supplementary Materials. All kinetic data generated in this study is available in a Zenodo repository (DOI:10.5281/zenodo.15022270) (88). Code used for computational analyses and training the machine-learning models described herein is available at https://github.com/pinneylab/adk_evolutionary_landscape (DOI:10.5281/zenodo.15043239) (89).

References and Notes

  • 1.Fragata I, Blanckaert A, Dias Louro MA, Liberles DA, Bank C, Evolution in the light of fitness landscape theory. Trends Ecol. Evol 34, 69–82 (2019). [DOI] [PubMed] [Google Scholar]
  • 2.de Visser JAGM, Krug J, Empirical fitness landscapes and the predictability of evolution. Nat. Rev. Genet 15, 480–490 (2014). [DOI] [PubMed] [Google Scholar]
  • 3.Conrad M, The geometry of evolution. Biosystems. 24, 61–81 (1990). [DOI] [PubMed] [Google Scholar]
  • 4.Yi X, Dean AM, Adaptive Landscapes in the Age of Synthetic Biology. Mol. Biol. Evol 36, 890–907 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.McCandlish DM, Visualizing fitness landscapes. Evolution 65, 1544–1558 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Notin P, Rollins N, Gal Y, Sander C, Marks D, Machine learning for functional protein design. Nat. Biotechnol 42, 216–228 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Lin Z, Akin H, Rao R, Hie B, Zhu Z, Lu W, Smetanin N, Verkuil R, Kabeli O, Shmueli Y, Dos Santos Costa A, Fazel-Zarandi M, Sercu T, Candido S, Rives A, Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023). [DOI] [PubMed] [Google Scholar]
  • 8.Wu R, Ding F, Wang R, Shen R, Zhang X, Luo S, Su C, Wu Z, Xie Q, Berger B, Ma J, Peng J, High-resolutionde novostructure prediction from primary sequence, bioRxiv (2022)p. 2022.07.21.500999. [Google Scholar]
  • 9.Rives A, Meier J, Sercu T, Goyal S, Lin Z, Liu J, Guo D, Ott M, Lawrence Zitnick C, Ma J, Fergus R, Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences, bioRxiv (2020)p. 622803. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Notin P, Kollasch AW, Ritter D, van Niekerk L, Paul S, Spinner H, Rollins N, Shaw A, Weitzman R, Frazer J, Dias M, Franceschi D, Orenbuch R, Gal Y, Marks DS, ProteinGym: Large-Scale Benchmarks for Protein Design and Fitness Prediction. bioRxiv, doi: 10.1101/2023.12.07.570727 (2023). [DOI] [Google Scholar]
  • 11.Hie B, Zhong ED, Berger B, Bryson B, Learning the language of viral evolution and escape. Science 371, 284–288 (2021). [DOI] [PubMed] [Google Scholar]
  • 12.Meier J, Rao R, Verkuil R, Liu J, Sercu T, Rives A, Language models enable zero-shot prediction of the effects of mutations on protein function, bioRxiv (2021)p. 2021.07.09.450648. [Google Scholar]
  • 13.Hie BL, Yang KK, Kim PS, Evolutionary velocity with protein language models predicts evolutionary dynamics of diverse proteins. Cell Syst 13, 274–285.e6 (2022). [DOI] [PubMed] [Google Scholar]
  • 14.Markin CJ, Mokhtari DA, Sunden F, Appel MJ, Akiva E, Longwell SA, Sabatti C, Herschlag D, Fordyce PM, Revealing enzyme functional architecture via high-throughput microfluidic enzyme kinetics. Science 373 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Kerns SJ, Agafonov RV, Cho Y-JJ, Pontiggia F, Otten R, Pachov DV, Kutter S, Phung LA, Murphy PN, Thai V, Alber T, Hagan MF, Kern D, The energy landscape of adenylate kinase during catalysis. Nat. Struct. Mol. Biol 22, 124–131 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Wolf-Watz M, Thai V, Henzler-Wildman K, Hadjipavlou G, Eisenmesser EZ, Kern D, Linkage between dynamics and catalysis in a thermophilic-mesophilic enzyme pair. Nat. Struct. Mol. Biol 11, 945–949 (2004). [DOI] [PubMed] [Google Scholar]
  • 17.Ådén J, Verma A, Schug A, Wolf-Watz M, Modulation of a pre-existing conformational equilibrium tunes adenylate kinase activity. J. Am. Chem. Soc 134, 16562–16570 (2012). [DOI] [PubMed] [Google Scholar]
  • 18.Hanson JA, Duderstadt K, Watkins LP, Bhattacharyya S, Brokaw J, Chu J-W, Yang H, Illuminating the mechanistic roles of enzyme conformational dynamics. Proc. Natl. Acad. Sci. U. S. A 104, 18055–18060 (2007). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Nguyen V, Wilson C, Hoemberger M, Stiller JB, Agafonov RV, Kutter S, English J, Theobald DL, Kern D, Evolutionary drivers of thermoadaptation in enzyme catalysis. Science 355, 289–294 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Bae E, Phillips GN Jr, Roles of static and dynamic domains in stability and catalysis of adenylate kinase. Proc. Natl. Acad. Sci. U. S. A 103, 2132–2137 (2006). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Moon S, Jung D, Phillips GN, Bae E, An integrated approach for thermal stabilization of a mesophilic adenylate kinase. Proteins: Struct. Funct. Bioinf 82, 1947–1959 (2014). [DOI] [PubMed] [Google Scholar]
  • 22.Couñago R, Wilson CJ, Peña MI, Wittung-Stafshede P, Shamoo Y, An adaptive mutation in adenylate kinase that increases organismal fitness is linked to stability–activity trade-offs. Protein Eng. Des. Sel 21, 19–27 (2008). [DOI] [PubMed] [Google Scholar]
  • 23.Moon S, Bannen RM, Rutkoski TJ, Phillips GN, Bae E, Effectiveness and limitations of local structural entropy optimization in the thermal stabilization of mesophilic and thermophilic adenylate kinases. Proteins: Struct. Funct. Bioinf 82, 2631–2642 (2014). [DOI] [PubMed] [Google Scholar]
  • 24.Ojeda-May P, Mushtaq AU, Rogne P, Verma A, Ovchinnikov V, Grundström C, Dulko-Smith B, Sauer UH, Wolf-Watz M, Nam K, Dynamic Connection between Enzymatic Catalysis and Collective Protein Motions. Biochemistry 60, 2246–2258 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, Bridgland A, Meyer C, Kohl SAA, Ballard AJ, Cowie A, Romera-Paredes B, Nikolov S, Jain R, Adler J, Back T, Petersen S, Reiman D, Clancy E, Zielinski M, Steinegger M, Pacholska M, Berghammer T, Bodenstein S, Silver D, Vinyals O, Senior AW, Kavukcuoglu K, Kohli P, Hassabis D, Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Saavedra HG, Wrabl JO, Anderson JA, Li J, Hilser VJ, Dynamic allostery can drive cold adaptation in enzymes. Nature 558, 324–328 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Bar-Even A, Noor E, Savir Y, Liebermeister W, Davidi D, Tawfik DS, Milo R, The Moderately Efficient Enzyme: Evolutionary and Physicochemical Trends Shaping Enzyme Parameters. Biochemistry 50, 4402–4410 (2011). [DOI] [PubMed] [Google Scholar]
  • 28.Keck F, Rimet F, Bouchez A, Franc A, phylosignal: an R package to measure, test, and explore the phylogenetic signal. Ecol. Evol 6, 2774–2780 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Siddiqui KS, Cavicchioli R, Cold-adapted enzymes. Annu. Rev. Biochem 75, 403–433 (2006). [DOI] [PubMed] [Google Scholar]
  • 30.Wang W, Swartz J, Rational engineering of enzymes for enhanced cold activity. ACS Catal. 14, 12518–12528 (2024). [Google Scholar]
  • 31.Georlette D, Blaise V, Collins T, D’Amico S, Gratia E, Hoyoux A, Marx J-C, Sonan G, Feller G, Gerday C, Some like it cold: biocatalysis at low temperatures. FEMS Microbiol. Rev 28, 25–42 (2004). [DOI] [PubMed] [Google Scholar]
  • 32.Feller G, Gerday C, Psychrophilic enzymes: hot topics in cold adaptation. Nat. Rev. Microbiol 1, 200–208 (2003). [DOI] [PubMed] [Google Scholar]
  • 33.Elias M, Wieczorek G, Rosenne S, Tawfik DS, The universality of enzymatic rate-temperature dependency. Trends Biochem. Sci 39, 1–7 (2014). [DOI] [PubMed] [Google Scholar]
  • 34.Fields PA, Dong Y, Meng X, Somero GN, Adaptations of protein structure and function to temperature: there is more than one way to “skin a cat”. J. Exp. Biol 218, 1801–1811 (2015). [DOI] [PubMed] [Google Scholar]
  • 35.Závodszky P, Kardos J, Svingor GA Petsko, Adjustment of conformational flexibility is a key event in the thermal adaptation of proteins. Proc. Natl. Acad. Sci. U. S. A 95, 7406–7411 (1998). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Jaenicke R, Protein stability and molecular adaptation to extreme conditions. Eur. J. Biochem 202, 715–728 (1991). [DOI] [PubMed] [Google Scholar]
  • 37.D’Amico S, Claverie P, Collins T, Georlette D, Gratia E, Hoyoux A, Meuwis M-A, Feller G, Gerday C, Molecular basis of cold adaptation. Philos. Trans. R. Soc. Lond. B Biol. Sci 357, 917–925 (2002). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Jaenicke R, Do ultrastable proteins from hyperthermophiles have high or low conformational rigidity? Proc. Natl. Acad. Sci. U. S. A 97, 2962–2964 (2000). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Chang A, Jeske L, Ulbrich S, Hofmann J, Koblitz J, Schomburg I, Neumann-Schaal M, Jahn D, Schomburg D, BRENDA, the ELIXIR core data resource in 2021: new developments and updates. Nucleic Acids Res. 49, D498–D508 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Stark C, Bautista-Leung T, Siegfried J, Herschlag D, Systematic investigation of the link between enzyme catalysis and cold adaptation. Elife 11 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Fields PA, Somero GN, Hot spots in cold adaptation: localized increases in conformational flexibility in lactate dehydrogenase A4 orthologs of Antarctic notothenioid fishes. Proc. Natl. Acad. Sci. U. S. A 95, 11476–11481 (1998). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Vieille C, Zeikus GJ, Hyperthermophilic enzymes: sources, uses, and molecular mechanisms for thermostability. Microbiol. Mol. Biol. Rev 65, 1–43 (2001). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Pinney MM, Mokhtari DA, Akiva E, Yabukarski F, Sanchez DM, Liang R, Doukov T, Martinez TJ, Babbitt PC, Herschlag D, Parallel molecular mechanisms for enzyme temperature adaptation. Science 371 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Jarzab A, Kurzawa N, Hopf T, Moerch M, Zecha J, Leijten N, Bian Y, Musiol E, Maschberger M, Stoehr G, Becher I, Daly C, Samaras P, Mergner J, Spanier B, Angelov A, Werner T, Bantscheff M, Wilhelm M, Klingenspor M, Lemeer S, Liebl W, Hahne H, Savitski MM, Kuster B, Meltome atlas-thermal proteome stability across the tree of life. Nat. Methods 17, 495–503 (2020). [DOI] [PubMed] [Google Scholar]
  • 45.Engqvist MKM, Correlating enzyme annotations with a large set of microbial growth temperatures reveals metabolic adaptations to growth at diverse temperatures. BMC Microbiol. 18, 177 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Bae E, Phillips GN Jr, Structures and analysis of highly homologous psychrophilic, mesophilic, and thermophilic adenylate kinases. J. Biol. Chem 279, 28202–28208 (2004). [DOI] [PubMed] [Google Scholar]
  • 47.Schober I, Koblitz J, Sardà Carbasse J, Ebeling C, Schmidt ML, Podstawka A, Gupta R, Ilangovan V, Chamanara J, Overmann J, Reimer LC, BacDive in 2025: the core database for prokaryotic strain data. Nucleic Acids Res. 53, D748–D756 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Bains W, Xiao Y, Yu C, Prediction of the maximum temperature for life based on the stability of metabolites to decomposition in water. Life (Basel) 5, 1054–1100 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Daily MD, Phillips GN Jr, Cui Q, Interconversion of functional motions between mesophilic and thermophilic adenylate kinases. PLoS Comput. Biol 7, e1002103 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Moon S, Kim J, Bae E, Structural analyses of adenylate kinases from Antarctic and tropical fishes for understanding cold adaptation of enzymes. Sci. Rep 7, 16027 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Romero PA, Arnold FH, Exploring protein fitness landscapes by directed evolution. Nat. Rev. Mol. Cell Biol 10, 866–876 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Arnold FH, Advances in protein chemistry. Introduction. Adv. Protein Chem 55, ix–xi (2000). [DOI] [PubMed] [Google Scholar]
  • 53.Rundqvist L, Adén J, Sparrman T, Wallgren M, Olsson U, Wolf-Watz M, Noncooperative folding of subdomains in adenylate kinase. Biochemistry 48, 1911–1927 (2009). [DOI] [PubMed] [Google Scholar]
  • 54.Fields PA, Review: Protein function at thermal extremes: balancing stability and flexibility. Comp. Biochem. Physiol. A Mol. Integr. Physiol 129, 417–431 (2001). [DOI] [PubMed] [Google Scholar]
  • 55.Schulz GE, Schiltz E, Tomasselli AG, Frank R, Brune M, Wittinghofer A, Schirmer RH, Structural relationships in the adenylate kinase family. Eur. J. Biochem 161, 127–132 (1986). [DOI] [PubMed] [Google Scholar]
  • 56.Munier-Lehmann H, Burlacu-Miron S, Craescu CT, Mantsch HH, Schultz CP, A new subfamily of short bacterial adenylate kinases with the Mycobacteriumtuberculosis enzyme as a model: A predictive and experimental study. Proteins: Struct. Funct. Bioinf 36, 238–248 (1999). [PubMed] [Google Scholar]
  • 57.Schrank TP, Wrabl JO, Hilser VJ, Conformational heterogeneity within the LID domain mediates substrate binding to Escherichia coli adenylate kinase: function follows fluctuations. Top. Curr. Chem 337, 95–121 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Perrier V, Burlacu-Miron S, Bourgeois S, Surewicz WK, Gilles A-M, Genetically Engineered Zinc-chelating Adenylate Kinase fromEscherichia coli with Enhanced Thermal Stability *. J. Biol. Chem 273, 19097–19101 (1998). [DOI] [PubMed] [Google Scholar]
  • 59.Bruland KW, Oceanographic distributions of cadmium, zinc, nickel, and copper in the North Pacific. Earth Planet. Sci. Lett 47, 176–198 (1980). [Google Scholar]
  • 60.Porebski BT, Buckle AM, Consensus protein design. Protein Eng. Des. Sel 29, 245–251 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Sternke M, Tripp KW, Barrick D, Consensus sequence design as a general strategy to create hyperstable, biologically active proteins. Proc. Natl. Acad. Sci. U. S. A 116, 11275–11284 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Aerts D, Verhaeghe T, Joosten H-J, Vriend G, Soetaert W, Desmet T, Consensus engineering of sucrose phosphorylase: the outcome reflects the sequence input. Biotechnol. Bioeng 110, 2563–2572 (2013). [DOI] [PubMed] [Google Scholar]
  • 63.Lehmann M, Loch C, Middendorf A, Studer D, Lassen SF, Pasamontes L, van Loon APGM, Wyss M, The consensus concept for thermostability engineering of proteins: further proof of concept. Protein Eng. 15, 403–411 (2002). [DOI] [PubMed] [Google Scholar]
  • 64.Polizzi KM, Chaparro-Riggers JF, Vazquez-Figueroa E, Bommarius AS, Structure-guided consensus approach to create a more thermostable penicillin G acylase. Biotechnol. J 1, 531–536 (2006). [DOI] [PubMed] [Google Scholar]
  • 65.Lunzer M, Golding GB, Dean AM, Pervasive cryptic epistasis in molecular evolution. PLoS Genet. 6, e1001162 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Yang G, Anderson DW, Baier F, Dohmen E, Hong N, Carr PD, Kamerlin SCL, Jackson CJ, Bornberg-Bauer E, Tokuriki N, Higher-order epistasis shapes the fitness landscape of a xenobiotic-degrading enzyme. Nat. Chem. Biol 15, 1120–1128 (2019). [DOI] [PubMed] [Google Scholar]
  • 67.Weinreich DM, Delaney NF, Depristo MA, Hartl DL, Darwinian evolution can follow only very few mutational paths to fitter proteins. Science 312, 111–114 (2006). [DOI] [PubMed] [Google Scholar]
  • 68.Acevedo-Rocha CG, Li A, D’Amore L, Hoebenreich S, Sanchis J, Lubrano P, Ferla MP, Garcia-Borràs M, Osuna S, Reetz MT, Pervasive cooperative mutational effects on multiple catalytic enzyme traits emerge via long-range conformational dynamics. Nat. Commun 12, 1621 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Breen MS, Kemena C, Vlasov PK, Notredame C, Kondrashov FA, Epistasis as the primary factor in molecular evolution. Nature 490, 535–538 (2012). [DOI] [PubMed] [Google Scholar]
  • 70.Olsson U, Wolf-Watz M, Overlap between folding and functional energy landscapes for adenylate kinase conformational change. Nat. Commun 1, 111 (2010). [DOI] [PubMed] [Google Scholar]
  • 71.Rogne P, Wolf-Watz M, Urea-dependent adenylate kinase activation following redistribution of structural states. Biophys. J 111, 1385–1395 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Schrank TP, Bolen DW, Hilser VJ, Rational modulation of conformational fluctuations in adenylate kinase reveals a local unfolding mechanism for allostery and functional adaptation in proteins. Proc. Natl. Acad. Sci. U. S. A 106, 16984–16989 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Zhang HJ, Sheng XR, Pan XM, Zhou JM, Activation of adenylate kinase by denaturants is due to the increasing conformational flexibility at its active sites. Biochem. Biophys. Res. Commun 238, 382–386 (1997). [DOI] [PubMed] [Google Scholar]
  • 74.Kujo C, Ohshima T, Enzymological characteristics of the hyperthermostable NAD-dependent glutamate dehydrogenase from the archaeon Pyrobaculum islandicum and effects of denaturants and organic solvents. Appl. Environ. Microbiol 64, 2152–2157 (1998). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Wilquet V, Gaspar JA, van de Lande M, Van de Casteele M, Legrain C, Meiering EM, Glansdorff N, Purification and characterization of recombinant Thermotoga maritima dihydrofolate reductase. Eur. J. Biochem 255, 628–637 (1998). [DOI] [PubMed] [Google Scholar]
  • 76.McInnes L, Healy J, Melville J, UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, arXiv [stat.ML] (2018). http://arxiv.org/abs/1802.03426. [Google Scholar]
  • 77.Chari T, Pachter L, The specious art of single-cell genomics. PLoS Comput. Biol 19, e1011288 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Hsu C, Nisonoff H, Fannjiang C, Listgarten J, Learning protein fitness models from evolutionary and assay-labeled data. Nat. Biotechnol, 1–9 (2022). [DOI] [PubMed] [Google Scholar]
  • 79.Notin P, Weitzman R, Marks DS, Gal Y, ProteinNPT: Improving protein property prediction and design with non-parametric transformers. bioRxiv, 2023.12.06.570473 (2023). [Google Scholar]
  • 80.Biswas S, Khimulya G, Alley EC, Esvelt KM, Church GM, Low-N protein engineering with data-efficient deep learning. Nat. Methods 18, 389–396 (2021). [DOI] [PubMed] [Google Scholar]
  • 81.Li F, Yuan L, Lu H, Li G, Chen Y, Engqvist MKM, Kerkhoven EJ, Nielsen J, Deep learning-based kcat prediction enables improved enzyme-constrained model reconstruction. Nature Catalysis, 1–11 (2022). [Google Scholar]
  • 82.Kroll A, Rousset Y, Hu X-P, Liebrand NA, Lercher MJ, Turnover number predictions for kinetically uncharacterized enzymes using machine and deep learning. Nat. Commun 14, 4139 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.Rembeza E, Engqvist MKM, Experimental and computational investigation of enzyme functional annotations uncovers misannotation in the EC 1.1.3.15 enzyme class. PLoS Comput. Biol 17, e1009446 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84.Notin P, Dias M, Frazer J, Marchena-Hurtado J, Gomez A, Marks DS, Gal Y, Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval, arXiv [cs.LG] (2022). http://arxiv.org/abs/2205.13760. [Google Scholar]
  • 85.Rao RM, Liu J, Verkuil R, Meier J, Canny J, Abbeel P, Sercu T, Rives A, “MSA Transformer” in Proceedings of the 38th International Conference on Machine Learning, Meila M, Zhang T, Eds. (PMLR, 18–24 Jul 2021; https://proceedings.mlr.press/v139/rao21a.html), vol. 139 of Proceedings of Machine Learning Research, pp. 8844–8856. [Google Scholar]
  • 86.Hopf TA, Green AG, Schubert B, Mersmann S, Schärfe CPI, Ingraham JB, Toth-Petroczy A, Brock K, Riesselman AJ, Palmedo P, Kang C, Sheridan R, Draizen EJ, Dallago C, Sander C, Marks DS, The EVcouplings Python framework for coevolutionary sequence analysis. Bioinformatics 35, 1582–1584 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.Hopf TA, Ingraham JB, Poelwijk FJ, Schärfe CPI, Springer M, Sander C, Marks DS, Mutation effects predicted from sequence co-variation. Nat. Biotechnol 35, 128–135 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Muir D, Pinney M, Evolutionary-scale enzymology enables exploration of a rugged catalytic landscape (Zenodo, 2025; 10.5281/ZENODO.15022270). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89.Muir D, Notin P, Pinneylab/adk_evolutionary_landscape: v.1.0.0 (Zenodo, 2025; 10.5281/ZENODO.15043239). [DOI] [Google Scholar]
  • 90.Parks DH, Chuvochina M, Chaumeil P-A, Rinke C, Mussig AJ, Hugenholtz P, A complete domain-to-species taxonomy for Bacteria and Archaea. Nat. Biotechnol 38, 1079–1086 (2020). [DOI] [PubMed] [Google Scholar]
  • 91.Berry MB, Phillips GN Jr, Crystal structures of Bacillus stearothermophilus adenylate kinase with bound Ap5A, Mg2+ Ap5A, and Mn2+ Ap5A reveal an intermediate lid position and six coordinate octahedral geometry for bound Mg2+ and Mn2+. Proteins 32, 276–288 (1998). [DOI] [PubMed] [Google Scholar]
  • 92.Müller CW, Schulz GE, Structure of the complex between adenylate kinase from Escherichia coli and the inhibitor Ap5A refined at 1.9 Å resolution: A model for a catalytic transition state. J. Mol. Biol 224, 159–177 (1992). [DOI] [PubMed] [Google Scholar]
  • 93.Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, Ronneberger O, Willmore L, Ballard AJ, Bambrick J, Bodenstein SW, Evans DA, Hung C-C, O’Neill M, Reiman D, Tunyasuvunakool K, Wu Z, Žemgulytė A, Arvaniti E, Beattie C, Bertolli O, Bridgland A, Cherepanov A, Congreve M, Cowen-Rivers AI, Cowie A, Figurnov M, Fuchs FB, Gladman H, Jain R, Khan YA, Low CMR, Perlin K, Potapenko A, Savy P, Singh S, Stecula A, Thillaisundaram A, Tong C, Yakneen S, Zhong ED, Zielinski M, Žídek A, Bapst V, Kohli P, Jaderberg M, Hassabis D, Jumper JM, Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94.Le DD, Shimko TC, Aditham AK, Keys AM, Longwell SA, Orenstein Y, Fordyce PM, Comprehensive, high-resolution binding energy landscapes reveal context dependencies of transcription factor binding. Proc. Natl. Acad. Sci. U. S. A 115, E3702–E3711 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95.Fordyce PM, Pincus D, Kimmig P, Nelson CS, El-Samad H, Walter P, DeRisi JL, Basic leucine zipper transcription factor Hac1 binds DNA in two distinct modes as revealed by microfluidic analyses. Proc. Natl. Acad. Sci. U. S. A 109, E3084–93 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96.Fordyce PM, Hastings R, Aditham A, DelRosso N, Suzuki P, High-throughput thermodynamic and kinetic measurements of transcription factor/DNA mutations reveal how conformational heterogeneity can shape motif selectivity, bioRxiv (2023), p. 2023.11.13.566946. [Google Scholar]
  • 97.Stiller JB, Kerns SJ, Hoemberger M, Cho Y-J, Otten R, Hagan MF, Kern D, Probing the transition state in enzyme catalysis by high-pressure NMR dynamics. Nature Catalysis, 1–9 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98.Letunic I, Bork P, Interactive Tree of Life (iTOL) v6: recent updates to the phylogenetic tree display and annotation tool. Nucleic Acids Res., doi: 10.1093/nar/gkae268 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99.Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M, ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 100.Tamura K, Stecher G, Kumar S, MEGA11: Molecular Evolutionary Genetics Analysis version 11. Mol. Biol. Evol 38, 3022–3027 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
