Angew. Chem. Int. Ed. 2022 Nov 16;61(50):e202202711. doi: 10.1002/anie.202202711

Design of a Protein with Improved Thermal Stability by an Evolution‐Based Generative Model

Pengfei Tian 1,6,*, Adrien Lemaire 2, Fabien Sénéchal 2, Olivier Habrylo 2, Viviane Antonietti 3, Pascal Sonnet 3, Valérie Lefebvre 2, Frederikke Isa Marin 4, Robert B Best 1, Jérôme Pelloux 2,*, Davide Mercadante 5,*
PMCID: PMC10098751  PMID: 36259321

Abstract

Efficient design of functional proteins with higher thermal stability remains challenging, especially for highly diverse sequence variants. Given the evolutionary pressure on protein folds, sequence design that optimizes evolutionary fitness could help design folds with higher stability. Using a generative evolutionary fitness model trained to capture variation patterns in natural sequences, we designed artificial sequences of a proteinaceous inhibitor of pectin methylesterase enzymes. These inhibitors are of considerable industrial interest to avoid phase separation in fruit juice manufacturing or to reduce methanol in distillates, averting chromatographic passages that trigger unwanted aroma loss. Six out of seven designs, with up to 30 % divergence from other inhibitor sequences, are functional, and two have improved thermal stability. This method can improve protein stability while expanding the functional protein sequence space, with traits valuable for industrial applications and scientific research.

Keywords: Coevolution, Molecular Dynamics Simulations, Monte Carlo Simulations, Potts Models, Protein Design


Protein design is of great interest in both research and industry. Evolution-based methods have shown promise for designing new sequences, based on the concept that the evolutionary fitness of a sequence is closely related to its folding stability. Here we present a computational pipeline that, by combining sequence- and structure-based methods, designs new sequences that translate into functional proteins with enhanced thermal stability.


Introduction

The development of efficient computational approaches for protein engineering carries enormous biotechnological and medical interest. [1] Even though more and more protein structures are being solved, especially since the recent breakthrough of AlphaFold, [2] systematically improving protein stability through protein engineering remains a major challenge. At the same time, the proteins that have been experimentally characterized as stable and functional are only a tiny subset of those theoretically predicted to be stably folded, suggesting rich potential for design. [3] The design and discovery of thermostable protein structures based purely on physical principles have made remarkable progress thanks to tools such as FoldX [4] and Rosetta. [5] However, predicting the stability and function of highly divergent sequences is still challenging, [6] and often requires several rounds of design iterations. [7]

Multi-residue mutational patterns within sets of homologous protein sequences provide important clues about the biophysical restraints that maintain the 3D structure and function of proteins. Previously, evolution-based generation of new sequences has most commonly been achieved either by randomly recombining fragments of proteins homologous to a target design [8] or by consensus sequence design, [9] in which sequences are generated from the probability of finding certain residues at each position of a multiple alignment of homologs. Both approaches ignore global statistical dependencies between residues, which are deemed crucial for protein fitness. [10] Recent studies showed that a statistical model can capture the complex pattern of variation between positions by accounting for both residue conservation and covariation. [11] Such models are very powerful for predicting residues that are close to each other in the 3D structure and for reconstructing epistatic effects between residues. A few deep learning methods have also been devised to model protein fitness landscapes, such as variational autoencoders (VAEs), [12] generative adversarial networks [13] and autoregressive models, [14] and they can be used to design proteins with nature-like functions. [14a, 15] However, the application of these generative models to guide the design of highly divergent sequences with enhanced stability that remain functional has not been demonstrated.
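Consensus design, as described above, can be sketched in a few lines: take the most frequent residue in each column of a multiple sequence alignment. The toy Python example below uses a hypothetical two-column alignment, not PMEI data. It also illustrates the limitation just mentioned: because each column is treated independently, the consensus can combine residues that never co-occur in any of the aligned sequences.

```python
from collections import Counter


def consensus_sequence(msa: list[str]) -> str:
    """Most frequent residue at each column of a gap-free alignment (ties: first seen)."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all sequences must be aligned to the same length")
    return "".join(
        Counter(seq[col] for seq in msa).most_common(1)[0][0] for col in range(length)
    )


# Hypothetical alignment of charge-complementary pairs (K-E, E-K, D-K, ...).
toy_msa = ["KE", "KE", "KD", "EK", "DK", "RK"]
consensus = consensus_sequence(toy_msa)
print(consensus)              # KK: K wins column 0 and K wins column 1
print(consensus in toy_msa)   # False: the pair KK never occurs in the alignment
```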

The rapidly expanding number of design algorithms has also prompted the development of statistical frameworks that meaningfully score how effectively algorithms increase protein fitness. [16] In this work, we build a statistical model of the protein fitness landscape using a Potts-type model, which has been shown to capture higher-order mutational statistics better than other methods such as VAEs. [17] Using a similar approach, we previously designed sequences for three different small protein folds (GA, GB and SH3), all 56 amino acids long. [15b] However, none of the designed variants had improved stability over the wild type. It has also remained unclear whether this method could be used to design larger functional proteins with improved stability, since foldable sequences make up a rapidly shrinking fraction of all possible sequences as protein length increases. [3]

Here, we address this question by exploring the high-fitness sequence space of a pectin methylesterase inhibitor (PMEI) from Actinidia chinensis (kiwifruit), here dubbed PMEI1xg2, which is 150 amino acids long. PMEI1xg2 has previously been co-crystallized bound to a pectin methylesterase enzyme (PME) from Solanum lycopersicum (tomato) in a 1 : 1 complex (PDB accession code: 1xg2; Figure 1a), in which the inhibitor engages three α-helices of its four-helix bundle to obstruct the substrate-binding groove of the target PME. [18]

Figure 1.


Complex between PME and PMEI and UMAP analysis. a) Structure of the pectin methylesterase inhibitor (PMEI) from Actinidia chinensis bound to a pectin methylesterase (PME) from Solanum lycopersicum (PDB accession code: 1xg2). PME and PMEI are colored in grey and green, respectively. PMEI residues within 0.6 nm of the PME binding interface are colored in pink. b) Structure of the PMEI with the predicted co-evolving residues linked by purple lines. c) Homologous sequences (blue) of PMEI1xg2 projected onto two dimensions by UMAP analysis. [28] Designed variants 1–4 and variants 5–7 are mapped in pink and yellow, respectively. Green and purple points are representative artificial sequences extracted from Monte Carlo sampling trajectories (Figure 2).

PMEIs have been identified in many plant species, where they finely tune the activity of PMEs by direct competition with the enzymes’ substrate. [19] PMEs hydrolyze methyl-esterified galacturonic acids in homogalacturonan (HG)-type pectins, the major polysaccharide component of the plant cell wall. [20] HGs are synthesized as highly methyl-esterified polymers (i.e. >80 %), and their subsequent de-methyl esterification can confer distinct physico-chemical properties on the plant cell wall. [21] Variations in the expression levels of PMEI genes are known to trigger dramatic changes in pectin chemistry, with important effects on plant development. [22] In plants, PMEIs are encoded by large multigenic families, which raises questions about the specificity of their interaction with different PME isoforms and about the existence of specific PME-PMEI pairs. [19, 23] A PMEI from one species can interact with PMEs from other species, as shown by the co-crystallized complex between a tomato PME and a kiwifruit PMEI. [18] This high degree of promiscuity provides an opportunity to specifically explore the effect of mutations at the binding interface versus the rest of a PMEI structure.

Hence, new PMEI designs can serve both as research tools to probe plant cell wall physiology and the plant life cycle, and as industrial tools in the production of pectin-rich fruit juices [24] or to control the amount of methanol produced by PMEs in processes like distillation. In the case of grape distillates, a second distillation featuring a de-methylating chromatographic passage is needed to bring methanol concentrations below the legal limit, but it also causes the loss of aroma compounds, ultimately downgrading the quality of the final product. [25] Although PMEIs hold considerable interest for both industrial applications and a deeper understanding of plant physiology, to the best of our knowledge there has been no previous attempt to engineer PMEIs; earlier work only optimized molecular biology protocols and used the naturally expressed PMEI from kiwifruit (PMEI1xg2) to target methanol reduction in distillates. [24, 25] Ultimately, designing PMEIs with higher structural stability or different specificity profiles can benefit both industry and research, as designed PMEIs could be used to probe the homeostasis of plant cell wall polysaccharides and their interplay with different PME isoforms. Our understanding of plant cell wall physiology is particularly hampered by the combinatorial explosion of PME:PMEI pairs arising from the tens of isoforms expressed in plants and their binding promiscuity.

Results and Discussion

Sequence Design of PMEI Variants

The statistical model that we use to design artificial PMEI sequences employs two sets of parameters, amino acid conservation (b) and higher-order inter-residue covariation (W), which describe the probability that a given protein sequence (X_1, X_2, …, X_L) belongs to the PMEI family (blue in Figure 1c), based on the following pattern of sequence variation:

$$P(X_1, X_2, \ldots, X_L) = \frac{1}{Z} \exp\left( \sum_{i<j} W_{ij}(X_i, X_j) + \sum_{i} b_i(X_i) \right) \qquad (1)$$

where i and j index residue positions, L is the total length of the protein and Z is the normalization constant (partition function). b and W can be obtained by pseudo-likelihood maximization (Figure 2a) [26] or by using Boltzmann machines. [27] As illustrated in Figure 2b, residues with the strongest covariance signal are proximal in the 3D structure, suggesting that the model effectively captures the variation pattern of the homologous sequences (see Supporting Information for details). A likelihood estimator is then used to calculate the likelihood, or probability (fitness score), P(x) of a sequence x, which is linked to an evolutionary Hamiltonian energy E_EH as:

$$E_{\mathrm{EH}}(x) = -\ln P(x) \qquad (2)$$
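Substituting Eq. (1) into Eq. (2) gives E_EH(x) = −Σ_{i<j} W_ij(x_i, x_j) − Σ_i b_i(x_i) + ln Z. Because ln Z is the same for every sequence, energy differences between sequences can be computed without it. The NumPy sketch below illustrates this under simple assumptions: residues are encoded as integers 0 to q−1, b has shape (L, q), W has shape (L, L, q, q), and random numbers stand in for fitted parameters. For a toy system small enough to enumerate, it checks that exp(−E_EH)/Z is a normalized distribution.

```python
import itertools

import numpy as np


def evolutionary_energy(seq: np.ndarray, b: np.ndarray, W: np.ndarray) -> float:
    """E_EH(x) without the constant ln Z: -(sum_{i<j} W_ij(x_i, x_j) + sum_i b_i(x_i)).

    seq: (L,) integer residue indices; b: (L, q) fields; W: (L, L, q, q) couplings.
    """
    L = len(seq)
    idx = np.arange(L)
    field_term = b[idx, seq].sum()
    # pair_terms[i, j] = W[i, j, seq[i], seq[j]]; only the upper triangle (i < j) is used
    pair_terms = W[idx[:, None], idx[None, :], seq[:, None], seq[None, :]]
    coupling_term = np.triu(pair_terms, k=1).sum()
    return float(-(field_term + coupling_term))


# Toy system with placeholder parameters, small enough to enumerate all q**L sequences.
rng = np.random.default_rng(0)
L, q = 3, 2
b = rng.normal(size=(L, q))
W = rng.normal(size=(L, L, q, q))

seqs = [np.array(s) for s in itertools.product(range(q), repeat=L)]
energies = np.array([evolutionary_energy(s, b, W) for s in seqs])
log_Z = np.log(np.exp(-energies).sum())
probs = np.exp(-energies - log_Z)     # P(x) = exp(-E_EH(x)) / Z
print(probs.sum())                    # 1.0 up to floating-point rounding
```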

Figure 2.


Workflow for the analysis and design of PMEI variants. a) The workflow adopted in this study for computational sequence design. b) Native contacts extracted from the PDB structure (gray) versus the 40 top-ranking contacts predicted from coevolutionary information (red; shown in Figure 1b as purple lines, see Supporting Information for details). The predicted coevolutionary contacts are shown in red in Figure 1a. c) Representative simulated annealing Monte Carlo trajectories sampling the sequence space for the two design cases, with and without binding interface constraints: the energy trajectories are colored in magenta and green, respectively, and the corresponding percentages of sequence identity to the wild type in yellow and blue, respectively. d) The sequence entropy, related to the number of sequences estimated to have a lower E_EH than the WT (see the paragraph on sequence entropy in the Supporting Information for details), plotted against the coevolutionary energy (solid black line). The energies of variants 1–4 (binding interface restricted from mutating) and variants 5–7 (all residues allowed to mutate) are colored in pink and yellow, respectively. The histogram shows the evolutionary energy of the natural sequences, with the green dotted line marking the evolutionary energy threshold at the value of the WT PMEI1xg2.
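Figure 2b compares native contacts with the 40 top-ranking coevolutionary pairs. The exact contact score is described in the Supporting Information. As an assumption, the sketch below uses a widely used choice for Potts models: the Frobenius norm of each gauge-shifted q × q coupling block W_ij, followed by the average product correction (APC). The minimum sequence separation of 4 and the planted toy coupling are hypothetical.

```python
import numpy as np


def contact_scores(W: np.ndarray) -> np.ndarray:
    """APC-corrected Frobenius-norm coupling scores from W of shape (L, L, q, q)."""
    # Zero-sum gauge: remove row, column and overall means of each q x q block
    Wc = (W - W.mean(axis=2, keepdims=True) - W.mean(axis=3, keepdims=True)
          + W.mean(axis=(2, 3), keepdims=True))
    F = np.sqrt((Wc ** 2).sum(axis=(2, 3)))
    F = 0.5 * (F + F.T)                      # W_ij and W_ji describe the same pair
    np.fill_diagonal(F, 0.0)
    L = F.shape[0]
    row_mean = F.sum(axis=1) / (L - 1)
    overall_mean = F.sum() / (L * (L - 1))
    S = F - np.outer(row_mean, row_mean) / overall_mean   # average product correction
    np.fill_diagonal(S, 0.0)
    return S


def top_contacts(S: np.ndarray, n: int = 40, min_sep: int = 4) -> list[tuple[int, int]]:
    """The n highest-scoring pairs (i, j), i < j, at least min_sep apart in sequence."""
    L = S.shape[0]
    pairs = [(i, j) for i in range(L) for j in range(i + min_sep, L)]
    return sorted(pairs, key=lambda p: S[p], reverse=True)[:n]


# Toy couplings: weak noise plus one strong, planted coupling between positions 2 and 10.
rng = np.random.default_rng(1)
L, q = 20, 4
W = rng.normal(scale=0.1, size=(L, L, q, q))
W[2, 10] += 3.0 * rng.normal(size=(q, q))
W[10, 2] = W[2, 10].T
print(top_contacts(contact_scores(W), n=3))
```

In a real analysis, the ranked pairs would be compared with contacts taken from the 1xg2 structure, as in Figure 2b.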

Simulated annealing Monte Carlo simulations (Figure 2c) are employed to generate sequences whose higher-order mutational statistics resemble those of the natural sequences of the PMEI protein family (Figure 2d). Sequences were generated by randomly mutating a single amino acid at a time, with each mutation accepted or rejected according to the Metropolis criterion (see Supporting Information for details). Sequences with an E_EH lower than that of the starting sequence may have higher fitness than the natural sequence, and the green region of Figure 2d shows that many sequences have a lower predicted energy than the WT. The mutational landscape is therefore explored by keeping the sampling close to the minimum of E_EH.
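The sampling step can be sketched as a generic simulated annealing loop with single-residue moves and Metropolis acceptance. This is a minimal illustration rather than the authors' implementation. The energy is any callable (for instance the Potts energy E_EH), and the linear cooling schedule, temperatures, step count and the `frozen` option for protected positions are hypothetical choices. Recomputing the full energy at every step keeps the code short; a practical implementation would evaluate only the energy change caused by the mutated position.

```python
import math
import random
from collections.abc import Callable, Sequence


def anneal(seq: Sequence[int], energy: Callable[[list[int]], float], q: int = 20,
           n_steps: int = 10_000, t_start: float = 2.0, t_end: float = 0.05,
           frozen: Sequence[int] = (), seed: int = 0) -> tuple[list[int], float]:
    """Simulated annealing Monte Carlo over sequences with single-residue moves.

    Temperatures are in the same units as `energy` (k_B T for E_EH). Positions in
    `frozen` are never mutated, e.g. a protected binding interface.
    """
    rng = random.Random(seed)
    current = list(seq)
    e = energy(current)
    frozen_set = set(frozen)
    mutable = [i for i in range(len(current)) if i not in frozen_set]
    best, best_e = list(current), e
    for step in range(n_steps):
        # Hypothetical linear cooling schedule from t_start to t_end
        t = t_start + (t_end - t_start) * step / max(n_steps - 1, 1)
        i = rng.choice(mutable)
        old = current[i]
        current[i] = rng.choice([a for a in range(q) if a != old])
        e_new = energy(current)
        delta = e_new - e
        # Metropolis criterion: always accept downhill, accept uphill with exp(-delta / t)
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            e = e_new
            if e < best_e:
                best, best_e = list(current), e
        else:
            current[i] = old                  # reject: restore the previous residue
    return best, best_e


# Toy energy (hypothetical): number of positions differing from a target sequence.
target = [3, 1, 0, 2, 2, 1, 0, 3]


def mismatches(s: list[int]) -> float:
    return float(sum(a != b for a, b in zip(s, target)))


best, best_e = anneal([0] * len(target), mismatches, q=4, n_steps=20_000, frozen=[2])
print(best, best_e)
```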

To challenge the robustness of the fitness model, we selected sequences with high divergence from the wild type (WT). The sequence identity between the designs and the WT PMEI1xg2 was kept below ≈70 % (Table 1) by adding a penalty term proportional to the similarity of each generated sequence to the WT (see Supporting Information for details).
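The identity restraint can be sketched as an extra term added to E_EH before the Metropolis test. The exact functional form is given in the Supporting Information. The sketch below assumes, hypothetically, a penalty that is linear in the number of positions identical to the WT, scaled by an arbitrary strength parameter. The wrapped energy can be handed to any sampler that accepts an energy function.

```python
from collections.abc import Callable, Sequence


def sequence_identity(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions with identical residues in two equal-length, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def with_identity_penalty(energy: Callable[[Sequence[int]], float], wt: Sequence[int],
                          strength: float) -> Callable[[Sequence[int]], float]:
    """Return E_total(x) = energy(x) + strength * L * identity(x, wt).

    `strength` is a hypothetical penalty per position identical to the WT (k_B T units);
    larger values push sampled sequences further from the WT.
    """
    L = len(wt)

    def total(seq: Sequence[int]) -> float:
        return energy(seq) + strength * L * sequence_identity(seq, wt)

    return total


wt = [0, 1, 2, 3]
e_total = with_identity_penalty(lambda s: -10.0, wt, strength=0.5)
print(e_total(wt), e_total([0, 1, 0, 0]), e_total([3, 2, 1, 0]))  # -8.0 -9.0 -10.0
```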

Table 1.

Sequence properties of the designed proteins and activity obtained from differential scanning fluorimetry experiments and biochemical characterization.

| Mutants/WT | ID_wt [%][a] | ID_ns [%][b] | E_EH [k_BT][c] | T_m [°C][d] | Activity |
|---|---|---|---|---|---|
| WT | – | – | −341 | 45.5 | Yes (IC50 ≈ 7.5 ng μL−1) |
| Variant 1 | 67 | 67 | −351 | – | Yes |
| Variant 2 | 69 | 69 | −367 | – | Yes |
| Variant 3 | 66 | 66 | −368 | 49.1 | Yes (IC50 ≈ 17.5 ng μL−1) |
| Variant 4 | 70 | 70 | −361 | 54.7 | Yes (IC50 ≈ 12.5 ng μL−1) |
| Variant 5 | 23 | 75 | −456 | – | Yes (IC50 > 3 μg μL−1) |
| Variant 6 | 24 | 79 | −454 | – | Yes |
| Variant 7 | 23 | 77 | −432 | – | No |

[a] Identity to the wild type. [b] Identity to the closest natural sequence in the training data set. [c] Evolutionary energy. [d] Melting temperature. –: not determined or not applicable.

To disentangle the interplay between PMEI fitness and function, we envisioned two design strategies: one in which residues within 0.6 nm of the bound PME (the binding interface, pink in Figure 1a) were not allowed to mutate, and one in which every residue was allowed to mutate.
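Selecting the protected positions amounts to a distance cut: a PMEI residue is kept fixed if any of its atoms lies within 0.6 nm of the PME. The minimal NumPy sketch below assumes that atom coordinates (in nm) and residue numbers have already been extracted from the 1xg2 complex with a structure parser. Whether the study used atom-atom distances or another distance definition is an assumption here, and the toy coordinates are made up.

```python
import numpy as np


def interface_residues(inhib_xyz: np.ndarray, inhib_resid: np.ndarray,
                       partner_xyz: np.ndarray, cutoff_nm: float = 0.6) -> set[int]:
    """Ids of inhibitor residues with at least one atom within cutoff_nm of any partner atom.

    inhib_xyz: (N, 3) inhibitor atom coordinates (nm); inhib_resid: (N,) residue id per atom;
    partner_xyz: (M, 3) coordinates of the partner protein (here the PME), in nm.
    """
    # All atom-atom distances, shape (N, M); affordable for a complex of a few thousand atoms
    d = np.linalg.norm(inhib_xyz[:, None, :] - partner_xyz[None, :, :], axis=-1)
    close_atoms = (d <= cutoff_nm).any(axis=1)
    return {int(r) for r in np.unique(inhib_resid[close_atoms])}


# Hypothetical coordinates (nm): residue 1 touches the partner, residue 2 does not.
inhib_xyz = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.3], [2.0, 0.0, 0.0], [2.0, 0.0, 0.3]])
inhib_resid = np.array([1, 1, 2, 2])
partner_xyz = np.array([[0.0, 0.5, 0.0]])
frozen = interface_residues(inhib_xyz, inhib_resid, partner_xyz)
print(frozen)  # {1}
```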

For the chosen PMEI (PMEI1xg2), we found that a large number of generated sequences have lower evolutionary energy than the wild type, suggesting that they may be more stable, in agreement with our earlier work on different protein folds. [3] Before full experimental characterization, however, we estimated the expected thermal stability of the designed sequences from their unfolding times in molecular dynamics (MD) simulations. [15b]

The role of MD simulations in the design pipeline is to filter the sequences identified in the first step, so that experimental characterization can focus on the most promising candidates, selected here for their ability to preserve structural stability in high-temperature MD simulations. The initial sequence-based design explores a sequence space that is far too large for experimental characterization. Within this space, we selected 99 sequences with the potential to be more stable. MD simulations suggested that 53.5 % of those sequences would be more stable than the WT, based on their estimated likelihood of unfolding (Figure S1), which served as a quantitative criterion of the designs’ structural robustness at high temperature (see Supporting Information for details). A relation between sequence coevolution and thermodynamic and kinetic folding stability has been reported for specific proteins [29] and for entire proteomes. [30]

Although MD simulations filtered out approximately half of the variants proposed by the sequence-based design, the number of sequences potentially more stable than the WT was still too large to explore experimentally. We therefore selected, among them, the sequences with the highest mean unfolding time.
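The filtering step can be sketched as follows. The unfolding criterion actually used is described in the Supporting Information. Purely as an assumption, this sketch defines a replica's unfolding time as the first frame at which the fraction of native contacts Q drops below 0.5. It averages that time over replicas, keeps variants whose mean exceeds that of the WT, and ranks them. The Q traces are hypothetical.

```python
import numpy as np


def unfolding_time(q_traj: np.ndarray, dt_ns: float, threshold: float = 0.5) -> float:
    """Time (ns) of the first frame where the native-contact fraction Q drops below threshold.

    A replica that never unfolds gets the full simulated time, a lower bound on its true value.
    """
    below = np.flatnonzero(q_traj < threshold)
    return float(below[0] * dt_ns) if below.size else float(len(q_traj) * dt_ns)


def rank_by_unfolding(replicas: dict[str, list[np.ndarray]], wt: str,
                      dt_ns: float) -> list[tuple[str, float]]:
    """Mean unfolding time per variant; keep variants slower to unfold than the WT, longest first."""
    means = {name: float(np.mean([unfolding_time(qt, dt_ns) for qt in reps]))
             for name, reps in replicas.items()}
    kept = [(name, m) for name, m in means.items() if name != wt and m > means[wt]]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def toy_trace(unfold_frame: int, n_frames: int = 8) -> np.ndarray:
    """Hypothetical Q(t): 0.9 (folded) before unfold_frame, 0.3 (unfolded) afterwards."""
    return np.where(np.arange(n_frames) < unfold_frame, 0.9, 0.3)


replicas = {
    "WT": [toy_trace(2), toy_trace(4)],
    "var_a": [toy_trace(6), toy_trace(8)],   # second replica never unfolds
    "var_b": [toy_trace(1), toy_trace(1)],
    "var_c": [toy_trace(5), toy_trace(3)],
}
print(rank_by_unfolding(replicas, wt="WT", dt_ns=1.0))  # [('var_a', 7.0), ('var_c', 4.0)]
```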

Overall, we chose to experimentally characterize 7 sequences (variants 1–7) from the generated set (sequences listed in Supporting Information Tables S1 and S2). Variants 1–4 (red in Figure 1c and Figure 2d) were designed without allowing mutations within 0.6 nm of the PME-PMEI binding interface, whereas variants 5–7 (yellow in Figure 1c and Figure 2d) were designed without any constraint. The sequence identity between any pair of variants 1–7 is below 87 %. A large sequence space was sampled during the design of these variants, as illustrated in Figure 1c, in which the artificial sequences sampled along one Monte Carlo trajectory (taken every 50 000 simulation steps) are colored in green (see Supporting Information for details).

The Experimental Assessment of the Designed PMEI Variants Shows Retained Functionality and Increased Thermal Stability

Six of the seven designs could be characterized experimentally; the remaining sequence (variant 7) could not be expressed. For variants 1–6 we assessed fitness in terms of function, that is, the ability to inhibit a target PME from Citrus sinensis (orange). Whenever a variant could be purified well enough to estimate its melting temperature (Tm) confidently by biophysical methods, we also assessed its thermal stability.

Biochemical assays were used to assess whether the designed variants can be considered bona fide PMEIs. For this purpose, the selected variants were expressed in the yeast Pichia pastoris using codon-optimized sequences (Table S1). Using Pichia pastoris to produce the designed amino acid sequences (Table S2) has the advantage of high yields of the expressed target, in line with the need for high yields if this method is to be used for larger-scale production of functional proteins. In this expression system, heterologous proteins can be secreted directly into the growth medium.

Importantly, the expression system used here does not natively encode PMEIs, so PME inhibition can be unambiguously attributed to the exogenously expressed PMEIs and assessed through a series of dilutions of the concentrated supernatants, as shown in Figure 3a. Concentrated supernatants (growth media of centrifuged cultures) of 6 of the 7 designed variants, together with the WT template PMEI (PMEI1xg2), efficiently inhibited the target PME, albeit with different potencies (Figure 3a).

Figure 3.


Experimental characterization of the activity and thermal stability of PMEI1xg2 and designed variants. a) Gel-diffusion assay quantifying the inhibitory capacity of the WT and designed PMEIs secreted in the growth media, using native supernatant (SN) or denatured supernatant (DSN) incubated with orange PME at increasing v/v ratios (1/50, 1/20, 1/5). The disappearance of the halo indicates inhibition of orange PME by the PMEIs. b) Inhibition assay of orange PME by His-tag purified PMEI1xg2 (blue), variant 3 (yellow) and variant 4 (orange). c) Bar plot showing the IC50 of PMEI1xg2, variant 3 and variant 4, obtained from the inhibition assays shown in (b). d) Differential scanning fluorimetry (DSF) curves for PMEI1xg2 (blue), variant 3 (yellow) and variant 4 (orange). e) CD spectra obtained for PMEI1xg2, var 3 and var 4 at 30 °C (solid line) or 90 °C (dashed line).

The halo on the gel, which reveals de-methyl-esterified pectin, is absent in every sample in which the supernatant of a PMEI variant was mixed with orange PME at a 1 : 5 v/v dilution (compare denatured supernatant, DSN, with native supernatant, SN). However, variants 1–4 showed a higher inhibitory capacity than variants 5 and 6, for which inhibition of orange PME was observed only at the 1 : 5 v/v PMEI dilution. In contrast, for PMEI1xg2 and variants 1–4, the halo of de-methyl-esterified pectin had already disappeared at a 1 : 50 v/v dilution, which was confirmed at the intermediate 1 : 20 dilution (Figure 3a).

Encouraged by these results, we further purified PMEI1xg2 and variants 3, 4 and 5 to quantitatively compare their inhibitory capacity, and to compare variants designed with restraints on interface residues (variants 3 and 4) with a variant designed with every residue allowed to mutate (variant 5). Notably, our attempts to purify variants 1, 2 and 6 did not give satisfactory yields, so their thermal stability could not be estimated. While PMEI1xg2 and variants 3 and 4 were purified to homogeneity by affinity chromatography, the purity of variant 5 was insufficient to assess its Tm confidently, although it was still suitable for profiling protein function in inhibition assays.

Variants 3 and 4 effectively inhibit orange PME activity with a potency comparable to that of PMEI1xg2 (Figure 3b). The IC50 values of PMEI1xg2, variant 3 and variant 4 all lie in the ng μL−1 range, with the WT being only slightly more potent (50 % inhibition of the target is achieved with ≈17.5 ng μL−1 of variant 3, ≈12.5 ng μL−1 of variant 4 and ≈7.5 ng μL−1 of PMEI1xg2) (Figure 3c). In contrast, within the same concentration range, semi-purified variant 5 was unable to inhibit the target, and its IC50 could not be determined within the μg μL−1 concentration range (Figure S2). Nevertheless, given that all characterized variants inhibited the target in the concentrated supernatant, variant 5 can inhibit PME activity when incubated at concentrations above the μg μL−1 range (Figure 3a).
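IC50 values such as those in Figure 3c correspond to the inhibitor concentration that halves PME activity. The fitting procedure is not specified here. As an assumption, the simplest estimate interpolates linearly in log-concentration between the two measurements that bracket 50 % residual activity; a four-parameter logistic fit is the more common and more robust alternative. The numbers in the example are illustrative only.

```python
import math


def ic50_log_interp(conc: list[float], activity_pct: list[float]) -> float:
    """Concentration at 50 % residual activity, interpolated linearly in log10(concentration).

    conc: increasing inhibitor concentrations; activity_pct: residual PME activity
    (% of the uninhibited control) measured at each concentration.
    """
    points = list(zip(conc, activity_pct))
    for (c1, a1), (c2, a2) in zip(points, points[1:]):
        if a1 >= 50.0 >= a2 and a1 != a2:
            frac = (a1 - 50.0) / (a1 - a2)
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10.0 ** log_c
    raise ValueError("the data do not bracket 50 % residual activity")


# Hypothetical dose-response points (ng/uL and % activity), not the measured data
print(ic50_log_interp([2.0, 8.0, 32.0], [80.0, 20.0, 5.0]))  # about 4.0
```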

The thermal stability of the expressed variants was assessed by both differential scanning fluorimetry (DSF) and circular dichroism (CD). DSF confirmed that variants 3 and 4 both have a higher Tm than PMEI1xg2, with estimated values of 49.1 °C and 54.7 °C, respectively, compared with 45.5 °C for the wild-type template (Figure 3d,e). The DSF profiles show a single thermal unfolding transition, suggesting that possible polydispersity of the sample due to glycosylated isoforms (Figure S3) does not contribute differentially to the estimated Tm. At 30 °C the CD signal of both variants 3 and 4 is typical of an α-helical protein, as expected from the resolved structure of the target PMEI, [18] showing that the designs retain a fold similar to that of PMEI1xg2, while CD experiments performed at 90 °C show considerable loss of signal for PMEI1xg2 and variants 3 and 4 (Figure 3e). Importantly, CD unfolding curves confirmed the trends uncovered by DSF, showing a higher Tm for variants 3 and 4 than for the WT (Figure S4).
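DSF melting temperatures are often read off as the temperature of the steepest fluorescence increase, i.e. the maximum of dF/dT. The sketch below applies that convention to a synthetic two-state melt curve. It is an assumption about the analysis, since instrument software may instead fit a Boltzmann sigmoid, and the curve parameters are hypothetical. A parabola through the derivative peak and its two neighbours refines the estimate below the 0.5 °C sampling step; uniform temperature spacing is assumed.

```python
import numpy as np


def tm_from_derivative(temp_c: np.ndarray, fluorescence: np.ndarray) -> float:
    """Temperature (°C) at the maximum of dF/dT of a DSF melt curve (uniform temperature steps)."""
    dfdt = np.gradient(fluorescence, temp_c)
    k = int(np.argmax(dfdt))
    if 0 < k < len(temp_c) - 1:
        # Vertex of the parabola through the peak and its two neighbours
        y0, y1, y2 = dfdt[k - 1], dfdt[k], dfdt[k + 1]
        denom = y0 - 2.0 * y1 + y2
        if denom != 0:
            step = temp_c[k + 1] - temp_c[k]
            return float(temp_c[k] + 0.5 * (y0 - y2) / denom * step)
    return float(temp_c[k])


# Synthetic two-state melt curve with a known midpoint (hypothetical parameters)
temps = np.arange(25.0, 90.0, 0.5)
true_tm, width = 54.7, 2.0
signal = 1.0 / (1.0 + np.exp((true_tm - temps) / width))
print(tm_from_derivative(temps, signal))
```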

Molecular Dynamics Simulations Pinpoint the Determinants of Increased Thermal Stability

An analysis of the contacts in the MD trajectories of the designed and experimentally characterized sequences showed that the design introduced non-conservative mutations of charged as well as hydrophobic and polar residues (Figure 4). The co-crystallized PME:PMEI complex is predominantly stabilized by intermolecular hydrogen bonds and four salt bridges. [18] To gain insight into how stabilization of the PMEI monomer is achieved, we analyzed the interactions sampled in the MD simulations. The intramolecular salt bridges sampled in the simulations revealed that variants 1–6 all had more intramolecular salt bridges than PMEI1xg2, with variants 2 and 4 showing the highest number, in line with previous observations on the importance of salt bridges for protein stability. [31] No clear trend, on the other hand, is observed for hydrogen bonds and hydrophobic interactions (Figure 4a). Interestingly, mutations introducing polar amino acids mostly concentrate at the N-terminal end of the four-helix bundle, the region furthest from the binding interface.
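Counting salt bridges in a frame reduces to a distance test between charged groups. The sketch below assumes a common definition (the study's exact criterion is in the Supporting Information): a salt bridge is any Asp/Glu carboxylate oxygen within 0.4 nm of a Lys/Arg side-chain nitrogen, counted once per residue pair. Atom names follow PDB conventions and the toy frame is hypothetical. Applying the function to every frame and histogramming the counts would give a distribution like those in Figure 4a.

```python
import numpy as np

# Assumed salt-bridge partners (PDB atom names): carboxylate O of Asp/Glu, cationic N of Lys/Arg
ACIDIC = {("ASP", "OD1"), ("ASP", "OD2"), ("GLU", "OE1"), ("GLU", "OE2")}
BASIC = {("LYS", "NZ"), ("ARG", "NE"), ("ARG", "NH1"), ("ARG", "NH2")}


def count_salt_bridges(atoms: list[tuple[int, str, str]], xyz: np.ndarray,
                       cutoff_nm: float = 0.4) -> int:
    """Distinct (acidic residue, basic residue) pairs with any O-N distance <= cutoff_nm.

    atoms: (residue id, residue name, atom name) for each row of xyz; xyz in nm.
    """
    acid = [k for k, (_, res, name) in enumerate(atoms) if (res, name) in ACIDIC]
    base = [k for k, (_, res, name) in enumerate(atoms) if (res, name) in BASIC]
    if not acid or not base:
        return 0
    d = np.linalg.norm(xyz[acid][:, None, :] - xyz[base][None, :, :], axis=-1)
    pairs = {(atoms[acid[i]][0], atoms[base[j]][0]) for i, j in zip(*np.nonzero(d <= cutoff_nm))}
    return len(pairs)


# Hypothetical single frame: Asp1 and Lys5 form a salt bridge, Arg9 is too far away.
atoms = [(1, "ASP", "OD1"), (1, "ASP", "OD2"), (5, "LYS", "NZ"), (9, "ARG", "NH1")]
xyz = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.3, 0.0, 0.0], [2.0, 0.0, 0.0]])
print(count_salt_bridges(atoms, xyz))  # 1
```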

Figure 4.


Contacts sampled in the molecular dynamics simulations of PMEI1xg2 and the designed variants. a) Probability density function of the number of salt bridges (left panel), hydrophobic contacts (central panel) and hydrogen bonds (right panel). b) The Cα atoms marking the positions of the substitutions in the modelled structures are shown as red and blue spheres for negatively and positively charged residues, respectively. Hydrophobic and polar residue substitutions are colored black and green, respectively. PMEI structures are shown as ribbons. c) Number of mutations in all the designed variants for positively (blue) and negatively (red) charged, hydrophobic (black) and polar (green) amino acids. The total number of mutations in each design is shown by the magenta bar.

Non-conservative mutations introducing hydrophobic or charged amino acids mostly concentrate near the core of the four-helix bundle or in helices α3 and α4, which lie close to the binding interface (Figure 4b). Overall, while the total number of mutations is similar for all variants with an unmutated binding interface, the relative changes in salt-bridging, polar and hydrophobic residues differ considerably (Figure 4c).
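A per-class count like that in Figure 4c can be obtained by comparing each design with the WT position by position and classifying the incoming residue. The grouping below is an assumption; the text does not list the classes used, and the placement of His, Gly, Pro, Cys and Tyr in particular is arbitrary here. The short sequences are hypothetical fragments, not PMEI sequences.

```python
from collections import Counter

# Hypothetical residue classes; the grouping used for Figure 4c is not given in the text.
POSITIVE = set("KR")
NEGATIVE = set("DE")
HYDROPHOBIC = set("AVLIMFWPC")


def residue_class(aa: str) -> str:
    """Coarse class of a one-letter amino acid code."""
    if aa in POSITIVE:
        return "positive"
    if aa in NEGATIVE:
        return "negative"
    if aa in HYDROPHOBIC:
        return "hydrophobic"
    return "polar"  # S, T, N, Q, Y, H, G under this assumed grouping


def mutation_profile(wt: str, design: str) -> Counter:
    """Count substitutions by the class of the incoming residue, plus the total."""
    if len(wt) != len(design):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter(residue_class(new) for old, new in zip(wt, design) if old != new)
    counts["total"] = sum(counts.values())
    return counts


profile = mutation_profile("MKLDE", "MRLNV")  # hypothetical 5-residue fragments
print(dict(profile))  # {'positive': 1, 'polar': 1, 'hydrophobic': 1, 'total': 3}
```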

Conclusion

In this work we designed functional PMEIs with increased fitness, as reflected by a higher melting temperature. The increase of up to ≈9 °C in melting temperature (54.7 °C for variant 4 versus 45.5 °C for the WT) would be expected to significantly improve the applicability of the designs in industry. Sequence-space exploration has recently been incorporated into historically structure-based design protocols, [32] with the development of tools extending sequence search towards the understanding of protein folding. [33] Our study indicates that a fitness landscape built only on sequence information allows the design of functional and more thermostable sequences that readily diverge from a template, a result that is much harder to achieve with purely structure-based protocols. Since the sequence space is enormous, MD simulations can be applied as a further step to narrow down the number of sequences to be tested experimentally, enhancing the success rate of the design pipeline and reducing the cost of experimentally characterizing new variants.

MD simulations are known to be computationally expensive, as they are commonly used to investigate molecules at a high level of detail, often with explicit solvent. Nevertheless, large advances in computing power and GPU acceleration make scanning the mean unfolding times of a large number of proposed protein variants a feasible task that, depending on the available computing resources, takes from a few days to a few weeks.

While sampling the stability of the designs via MD simulations depends on the available computational power, limitations arising more generally from the quality of the structural models of the designed variants are being overcome by robust, evolution-based structure prediction methods such as AlphaFold. [2]

Six of the seven designed variants were successfully expressed and found to be active, suggesting a high success rate of our evolution-based workflow in creating new functional sequences. Taking the evolutionary energy of the WT PMEI1xg2 as the threshold (vertical dotted line in Figure 2d), the number of PMEI-like sequences with higher fitness (green area of Figure 2d) is estimated to be 8673 (see the paragraph on estimating sequence entropy in the Supporting Information), suggesting that there are plenty of sequences to explore for improved function and stability. The high sequence similarity among the designs in this work results from simulated annealing driving the sequences towards the global energy minimum (fitness maximum). We are mainly interested in the sequences with the lowest energy, since they are more likely to be stable and to have high fitness. To avoid designs that are too similar, we imposed a maximum of 85 % sequence identity between designs. These designs occupy only a very narrow region compared with the vast space of all possible sequences. In future work, we will explore the robustness of the method by applying a lower sequence similarity limit.
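An identity cap between designs, such as the 85 % limit stated above, can be enforced with a simple greedy pass: sort the candidates by evolutionary energy (lowest first) and accept a candidate only if its identity to every already-accepted design is below the cap. This is a hypothetical sketch of one way to apply such a cap, not necessarily the selection procedure used in the study, and the candidate sequences and energies are made up.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def select_diverse(candidates: list[tuple[str, float]], max_identity: float = 0.85,
                   n_select: int = 7) -> list[str]:
    """Greedily keep the lowest-energy sequences whose identity to every kept one is below max_identity."""
    chosen: list[str] = []
    for seq, _energy in sorted(candidates, key=lambda c: c[1]):
        if all(identity(seq, other) < max_identity for other in chosen):
            chosen.append(seq)
            if len(chosen) == n_select:
                break
    return chosen


# Hypothetical candidates as (sequence, E_EH) pairs; lower energy is preferred.
candidates = [("AAAAAAAAAA", -10.0), ("AAAAAAAAAC", -9.0),
              ("CCCCCAAAAA", -8.0), ("DDDDDDDDDD", -7.0)]
print(select_diverse(candidates, n_select=3))
# ['AAAAAAAAAA', 'CCCCCAAAAA', 'DDDDDDDDDD']: the second candidate is 90 % identical to the first
```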

The application of designed sequences in industrial workflows is limited by yield; more generally, the discovery of new, active and more thermally stable sequences is ultimately limited by the ability to express and purify designs at an appreciable level of purity.

The expression system proposed here, together with the protocol used to secrete the expressed protein into the growth medium, aims at maximizing production yields to increase batch production. Further development of the design pipeline could then focus on optimizing the presence of specific sequence motifs that improve the success rate of heterologous expression in systems such as Pichia pastoris.

Given the particular ability of eukaryotic expression systems to modify recombinant constructs with complex post-translational modifications, the design pipeline could be extended with additional energy terms that favor or disfavor the introduction of specific motifs in the designed sequences.

Moreover, by explicitly considering the coevolutionary information between PME and PMEI, this method could be further optimized to design sequences with higher binding affinity, by sampling the PMEI sequence space conditioned on a given PME sequence. Our framework also allows the deliberate insertion or avoidance of glycosylation sites or other post-translational modifications that can increase the stability of a targeted fold, in order to achieve the desired stability of functional proteins and to generate sequences better suited to harsher micro-environments.

Conflict of interest

The authors declare no conflict of interest.


Supporting information

As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such materials are peer reviewed and may be re‐organized for online delivery, but are not copy‐edited or typeset. Technical support issues arising from supporting information (other than missing files) should be addressed to the authors.


Acknowledgements

DM thanks the School of Chemical Sciences of The University of Auckland, the Nectar Cloud computing infrastructure, and the New Zealand eScience high‐performance computing infrastructure (NeSI) for financial and logistic support. This study utilized the high‐performance computational capabilities of the Biowulf Linux cluster at the National Institutes of Health, Bethesda, MD (http://biowulf.nih.gov). JP gratefully acknowledges the funding from the Agence Nationale de la Recherche (ANR) and the “Région Hauts de France”. Open Access publishing facilitated by The University of Auckland, as part of the Wiley ‐ The University of Auckland agreement via the Council of Australian University Librarians.

P. Tian, A. Lemaire, F. Sénéchal, O. Habrylo, V. Antonietti, P. Sonnet, V. Lefebvre, F. Isa Marin, R. B. Best, J. Pelloux, D. Mercadante, Angew. Chem. Int. Ed. 2022, 61, e202202711; Angew. Chem. 2022, 134, e202202711.

Contributor Information

Dr. Pengfei Tian, Email: tianpengfei@gmail.com.

Prof. Jérôme Pelloux, Email: jerome.pelloux@u-picardie.fr.

Dr. Davide Mercadante, Email: davide.mercadante@auckland.ac.nz.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

  • 1. Arnold F. H., Angew. Chem. Int. Ed. 2018, 57, 4143–4148; Angew. Chem. 2018, 130, 4212–4218.
  • 2. Jumper J., Evans R., Pritzel A., Green T., Figurnov M., Ronneberger O., Tunyasuvunakool K., Bates R., Žídek A., Potapenko A., Bridgland A., Meyer C., Kohl S. A. A., Ballard A. J., Cowie A., Romera-Paredes B., Nikolov S., Jain R., Adler J., Back T., Petersen S., Reiman D., Clancy E., Zielinski M., Steinegger M., Pacholska M., Berghammer T., Bodenstein S., Silver D., Vinyals O., Senior A. W., Kavukcuoglu K., Kohli P., Hassabis D., Nature 2021, 596, 583–589.
  • 3. Tian P., Best R. B., Biophys. J. 2017, 113, 1719–1730.
  • 4. Schymkowitz J., Borg J., Stricher F., Nys R., Rousseau F., Serrano L., Nucleic Acids Res. 2005, 33, W382–W388.
  • 5. Huang P.-S., Boyken S. E., Baker D., Nature 2016, 537, 320–327.
  • 6. Tiwari M. K., Singh R., Singh R. K., Kim I.-W., Lee J.-K., Comput. Struct. Biotechnol. J. 2012, 2, e201209002.
  • 7. Sun J., Cui Y., Wu B. in Methods in Enzymology, Vol. 648 (Eds.: Weber G., Bornscheuer U. T., Wei R.), Academic Press, New York, 2021, pp. 207–230.
  • 8. 
  • 8a. Verschueren E., Vanhee P., van der Sloot A. M., Serrano L., Rousseau F., Schymkowitz J., Curr. Opin. Struct. Biol. 2011, 21, 452–459;
  • 8b. Zhou J., Panaitiu A. E., Grigoryan G., Proc. Natl. Acad. Sci. USA 2020, 117, 1059–1068.
  • 9. 
  • 9a. Porebski B. T., Buckle A. M., Protein Eng. Des. Sel. 2016, 29, 245–251;
  • 9b. Sternke M., Tripp K. W., Barrick D., Proc. Natl. Acad. Sci. USA 2019, 116, 11275–11284.
  • 10. 
  • 10a. Poelwijk F. J., Socolich M., Ranganathan R., Nat. Commun. 2019, 10, 4213;
  • 10b. Starr T. N., Thornton J. W., Protein Sci. 2016, 25, 1204–1218;
  • 10c. McCandlish D. M., Rajon E., Shah P., Ding Y., Plotkin J. B., Nature 2013, 497, E1–2; discussion E2–3;
  • 10d. Storz J. F., Curr. Opin. Struct. Biol. 2018, 50, 18–25;
  • 10e. Dasmeh P., Serohijos A. W. R., Proteins Struct. Funct. Bioinf. 2018, 86, 1242–1250;
  • 10f. Eccleston R. C., Pollock D. D., Goldstein R. A., Proc. Natl. Acad. Sci. USA 2021, 118, 1–10.
  • 11. 
  • 11a. Hopf T. A., Ingraham J. B., Poelwijk F. J., Schärfe C. P. I., Springer M., Sander C., Marks D. S., Nat. Biotechnol. 2017, 35, 128–135;
  • 11b. Jones D. T., Buchan D. W. A., Cozzetto D., Pontil M., Bioinformatics 2012, 28, 184–190;
  • 11c. Kamisetty H., Ovchinnikov S., Baker D., Proc. Natl. Acad. Sci. USA 2013, 110, 15674–15679;
  • 11d. Marks D. S., Colwell L. J., Sheridan R., Hopf T. A., Pagnani A., Zecchina R., Sander C., PLoS One 2011, 6, e28766–e28786;
  • 11e. Morcos F., Pagnani A., Lunt B., Bertolino A., Marks D. S., Sander C., Zecchina R., Onuchic J. N., Hwa T., Weigt M., Proc. Natl. Acad. Sci. USA 2011, 108, E1293–E1301;
  • 11f. Schug A., Weigt M., Onuchic J. N., Hwa T., Szurmant H., Proc. Natl. Acad. Sci. USA 2009, 106, 22124–22129.
  • 12. 
  • 12a. Riesselman A. J., Ingraham J. B., Marks D. S., Nat. Methods 2018, 15, 816–822;
  • 12b. Ding X., Zou Z., Brooks C. L. III, Nat. Commun. 2019, 10, 5644.
  • 13. Repecka D., Jauniskis V., Karpus L., Rembeza E., Rokaitis I., Zrimec J., Poviloniene S., Laurynenas A., Viknander S., Abuajwa W., Savolainen O., Meskys R., Engqvist M. K. M., Zelezniak A., Nat. Mach. Intell. 2021, 3, 324–333.
  • 14. 
  • 14a. Shin J. E., Riesselman A. J., Kollasch A. W., McMahon C., Simon E., Sander C., Manglik A., Kruse A. C., Marks D. S., Nat. Commun. 2021, 12, 2403;
  • 14b. Trinquier J., Uguzzoni G., Pagnani A., Zamponi F., Weigt M., Nat. Commun. 2021, 12, 5800.
  • 15. 
  • 15a. Russ W. P., Figliuzzi M., Stocker C., Barrat-Charlaix P., Socolich M., Kast P., Hilvert D., Monasson R., Cocco S., Weigt M., Ranganathan R., Science 2020, 369, 440–445;
  • 15b. Tian P., Louis J. M., Baber J. L., Aniana A., Best R. B., Angew. Chem. Int. Ed. 2018, 57, 5674–5678; Angew. Chem. 2018, 130, 5776–5780.
  • 16. Malhis N., Jones S. J. M., Gsponer J., Nat. Commun. 2019, 10, 1556.
  • 17. McGee F., Hauri S., Novinger Q., Vucetic S., Levy R. M., Carnevale V., Haldane A., Nat. Commun. 2021, 12, 6302.
  • 18. Di Matteo A., Giovane A., Raiola A., Camardella L., Bonivento D., De Lorenzo G., Cervone F., Bellincampi D., Tsernoglou D., Plant Cell 2005, 17, 849–858.
  • 19. Wormit A., Usadel B., Int. J. Mol. Sci. 2018, 19, 2878–2897.
  • 20. Raiola A., Camardella L., Giovane A., Mattei B., De Lorenzo G., Cervone F., Bellincampi D., FEBS Lett. 2004, 557, 199–203.
  • 21. 
  • 21a. Sterling J. D., Quigley H. F., Orellana A., Mohnen D., Plant Physiol. 2001, 127, 360–371;
  • 21b. Christiaens S., Van Buggenhout S., Houben K., Jamsazzadeh Kermani Z., Moelants K. R., Ngouemazong E. D., Van Loey A., Hendrickx M. E., Crit. Rev. Food Sci. Nutr. 2016, 56, 1021–1042.
  • 22. 
  • 22a. Bou Daher F., Chen Y., Bozorg B., Clough J., Jonsson H., Braybrook S. A., eLife 2018, 7, e38161;
  • 22b. Hocq L., Pelloux J., Lefebvre V., Trends Plant Sci. 2017, 22, 20–29.
  • 23. 
  • 23a. Hothorn M., Wolf S., Aloy P., Greiner S., Scheffzek K., Plant Cell 2004, 16, 3437–3447;
  • 23b. Hocq L., Senechal F., Lefebvre V., Lehner A., Domon J. M., Mollet J. C., Dehors J., Pageau K., Marcelo P., Guerineau F., Kolsek K., Mercadante D., Pelloux J., Plant Physiol. 2017, 173, 1075–1093;
  • 23c. Sénéchal F., Habrylo O., Hocq L., Domon J. M., Marcelo P., Lefebvre V., Pelloux J., Mercadante D., J. Biol. Chem. 2017, 292, 21538–21547.
  • 24. Balestrieri C., Servillo L., Quagiuolo L., Giovane A., Castaldo D., US5053232A, 1989.
  • 25. Lante A., Zocca F., Spettoli P., Lomolino G., Raiola A., Bellincampi D., Lionetti V., Giovane A., Camardella L., WO2008104555A1, 2008.
  • 26. Ekeberg M., Lövkvist C., Lan Y., Weigt M., Aurell E., Phys. Rev. E 2013, 87, 012707.
  • 27. 
  • 27a. Lapedes A., Giraud B., Jarzynski C., arXiv 2012, DOI: 10.48550/arXiv.1207.2484;
  • 27b. Figliuzzi M., Barrat-Charlaix P., Weigt M., Mol. Biol. Evol. 2018, 35, 1018–1027.
  • 28. McInnes L., Healy J., Melville J., arXiv 2020, DOI: 10.48550/arXiv.1802.03426.
  • 29. 
  • 29a. Lim S. A., Hart K. M., Harms M. J., Marqusee S., Proc. Natl. Acad. Sci. USA 2016, 113, 13045–13050;
  • 29b. Okada J., Okamoto T., Mukaiyama A., Tadokoro T., You D. J., Chon H., Koga Y., Takano K., Kanaya S., BMC Evol. Biol. 2010, 10, 207.
  • 30. Razban R. M., Dasmeh P., Serohijos A. W. R., Shakhnovich E. I., Biophys. J. 2021, 120, 2413–2424.
  • 31. Makhatadze G. I., Loladze V. V., Ermolenko D. N., Chen X., Thomas S. T., J. Mol. Biol. 2003, 327, 1135–1148.
  • 32. Schmitz S., Ertelt M., Merkl R., Meiler J., PLoS Comput. Biol. 2021, 17, e1008568.
  • 33. Cheung N. J., Yu W., BMC Bioinf. 2019, 20, 455.


Articles from Angewandte Chemie (International Ed. in English) are provided here courtesy of Wiley
