ACS Central Science. 2024 Feb 5;10(2):367–373. doi: 10.1021/acscentsci.3c01284

Machine Learning to Develop Peptide Catalysts—Successes, Limitations, and Opportunities

Tobias Schnitzer, Martin Schnurr, Andrew F. Zahrt, Nader Sakhaee, Scott E. Denmark ‡,*, Helma Wennemers †,*
PMCID: PMC10906243  PMID: 38435528

Abstract


Peptides have been established as modular catalysts for various transformations. Still, the vast number of potential amino acid building blocks renders the identification of peptides with desired catalytic activity challenging. Here, we develop a machine-learning workflow for the optimization of peptide catalysts. First—in a hypothetical competition—we challenged our workflow to identify peptide catalysts for the conjugate addition reaction of aldehydes to nitroolefins and compared the performance of the predicted structures with those optimized in our laboratory. On the basis of the positive results, we established a universal training set (UTS) containing 161 catalysts to sample an in silico library of ∼30,000 tripeptide members. Finally, we challenged our machine learning strategy to identify a member of the library as a stereoselective catalyst for an annulation reaction that has not been catalyzed by a peptide thus far. We conclude with a comparison of data-driven versus expert-knowledge-guided peptide catalyst optimization.

Short abstract

Statistical learning methods were challenged to identify enantioselective peptide catalysts from a 30,000-member in silico library and compared with expert-knowledge-guided methods.

Introduction

The development of efficient catalytic methods is central in chemical research. An often tedious and time-consuming step is the optimization of the catalyst structure. Linking a catalyst structure to its activity and stereoselectivity is difficult, as capturing the dynamics of a molecule is inherently challenging.1 Methods to describe the dynamic behavior of a catalyst alongside a strategy to implement the conformational properties in a systematic catalyst optimization are hence highly desirable.

Recent advances in data-driven approaches and mathematical modeling have succeeded in predicting the reactivity and stereoselectivity of small-molecule catalysts.2–6 One key aspect of the success of these endeavors is the numerical representation of the molecules of interest. In enantioselective catalysis, this means capturing the important structural elements responsible for stereoinduction. A critical molecular property for catalyst development is conformational flexibility.1,7,8 Although challenging, a number of studies have indicated that incorporating conformational flexibility into the numerical representation improves the predictive power of mathematical models.9–11

So far, such mathematical models have provided impressive success in improving small-molecule catalysts.12 In this study, we interrogated whether conformer-dependent representations can assist in catalyst design for tripeptide catalysts. Such organocatalysts have a similar molecular weight but many more degrees of conformational freedom compared to conventional small-molecule catalysts (Figure 1). This dynamic behavior makes this catalyst family a unique challenge and allows for probing the limits of current machine-learning approaches.

Figure 1. Degrees of conformational freedom of peptide catalysts in the context of other small-molecule catalysts.

Over the past two decades, peptides have been established as potent organocatalysts.13,14 Their structure is modular, and automated solid-phase synthesis allows for straightforward access to a diverse array of peptides. This ease of access to many structurally and functionally different peptides makes this class of catalysts ideal for machine learning and other computer-aided tools for catalyst optimization, provided that their many conformational states with different functions can be represented successfully.

Previously, Sigman and Miller successfully constructed linear regression models to predict reaction outcomes for peptide-catalyzed reactions.15 These studies used modeling retrospectively to interrogate the reaction mechanism and identify those catalyst properties responsible for stereoinduction. In this work, we were primarily interested in: (1) using unsupervised learning to create an ideal screening set of tripeptide catalysts and (2) making a priori predictions and experimental validation of superior catalyst structures. Namely, we planned to use mathematical modeling to select a superior-performing catalyst from a large pool of in silico catalyst candidates. This approach has successfully been applied to predict synthetic catalyst performance through the application of 4D- and 3.5-QSAR concepts.3 These conformer-dependent, grid-based representations, known as Average Steric Occupancy (ASO) and Average Electronic Indicator Field (AEIF) descriptors, have been used to represent cinchona alkaloids, amino alcohol-transition metal complexes, and chiral Brønsted acids.16,17

Here, we extended these descriptors to peptide catalysis and evaluated whether they are capable of representing catalysts with greater conformational flexibility.

Results and Discussion

As a testing ground for machine-learning-guided peptide catalyst optimization, we used Pro-Pro-Xaa-type peptides (Figure 2a). These tripeptides are powerful catalysts for stereoselective C–C bond formations that proceed through an enamine intermediate.13,14 Compared to other chiral amine-based catalysts, Pro-Pro-Xaa peptides stand out for their modularity. This feature allows for structural tuning and, thus, stereoselective reactions adjusted to the steric and stereoelectronic properties of the substrates of interest. As a result, human-guided catalyst optimization methods established Pro-Pro-Xaa catalysts that provide stereoselective access to a range of different addition products at catalyst loadings as low as 0.05 mol %.18 Examples include aldol19 and conjugate addition reactions with a diverse range of electrophiles, including nitroolefins,18,20–25 maleimides,26,27 dicyanoolefins,28 allenamides,29 and vinyl triflones.30

Figure 2. (a) Pro-Pro-Xaa-type peptide catalysts and (b) ground-state structure of H-dPro-Pro-Glu-NH2 (ref (7)).

These conventional catalyst development studies were guided by detailed conformational and mechanistic studies that revealed, for example, the importance of a balance between flexibility and rigidity,7 the trans/cis ratio of the tertiary amide,18,31 endo-N-pyramidalization of the enamine intermediate,32,33 and the role of an internal proton donor for the performance of the tripeptide catalysts.34 Among the many Pro-Pro-Xaa catalysts identified for different reactions, H-dPro-Pro-Glu-NH2 features a rigid ground state,7 but all others (e.g., H-Pro-Pro-Asp-NH2 and H-dPro-Pro-Gln-OH) are more flexible.

Can Machine Learning Predict an Experimentally Optimized Peptide Catalyst?

At the outset of the study, we asked whether a machine-learning-guided catalyst optimization procedure could identify the same peptide catalysts that were previously obtained by traditional, human-guided optimization methods.18,31 To answer this question, we used the conjugate addition of aldehydes to β-nitroolefins as a model reaction (Scheme 1a). We chose this reaction since the best catalysts identified so far, H-dPro-Pro-Glu-NH2, H-dPro-Pip-Glu-NH2 (Pip = L-piperidine-2-carboxylic acid), and H-dPro-MePro-Glu-NH2 (MePro = α-methyl-proline), and analogues thereof, adopt a single, well-defined conformation in the ground state (Figure 2b).7,18,31 This rigid structure arises from a β-turn, an intramolecular salt bridge, and two additional H-bonds.7 The structure becomes more flexible upon enamine formation and toggles back and forth between rigid and flexible conformations throughout the catalytic cycle.7 Thus, the ground-state rigidity of these tripeptides should ease the challenge of the machine-learning approach.

Scheme 1. (a) Conjugate Addition Reactions and Catalysts Used for the Machine-Learning Study and (b) Distribution of Conversion to Product (left), Diastereomeric Ratio of the syn Diastereomers (middle), and Enantiomeric Excess of the syn Enantiomers in the 200 Catalytic Reactions (right).

Specifically, we used the addition reactions of butanal to (E)-β-nitrostyrene (I), (E)-4-methoxy-β-nitrostyrene (III), and (E)-2-bromo-β-nitrostyrene (IV) and propanal to (E)-β-nitrostyrene (II), performed under previously optimized conditions (1 mol % peptide TFA salt, 1 mol % N-methylmorpholine (NMM), CHCl3/i-PrOH, 9:1, 20 °C, 18 h), as trial reactions.18 Fifty peptides that were available in our laboratories and that contained an N-terminal dPro residue were used to establish a data set encompassing conversion to product and dia- and enantioselectivity for each reaction. Forty-one of those peptides contain an N-terminal dPro-Pro motif, and nine contain an amino acid other than Pro in the middle position. Of note, one of the optimal peptides, H-dPro-Pip-Glu-NH2, was not included in the set of catalysts.18

The 200 experiments (50 peptides × 4 reactions) provided a spread of both conversions and product stereoselectivities (Scheme 1b). Half of the experiments proceeded with conversions between 10% and 90%. In approximately 25% of the reactions, less than 10% conversion to product was observed (mainly catalysts without a Pro-Pro motif), and in 25% of the reactions, the γ-nitroaldehyde formed almost quantitatively (mostly catalysts with a Pro-Pro motif and an internal acid). The enantioselectivity ranged from 10% ee to 98% ee and the diastereoselectivity from 58:42 to 98:2 syn/anti.

With the data set in hand, we evaluated whether our previously established ASO and AEIF descriptors can be used for peptide catalysts.16 These descriptors seek to capture the dynamic, steric, and electronic properties of the catalysts in 3D-space with respect to a common structural motif (in this case, the first proline residue). Because tripeptides contain many more rotational degrees of freedom compared to previously modeled small-molecule catalysts,3,16,17,35 they represent a more rigorous test of how well the ASO and AEIF descriptors capture the dynamic nature of catalyst scaffolds (see the Supporting Information for descriptor calculations). To validate the representation, ten catalysts (with each of their associated reactions) were left out of the training set to ensure out-of-sample catalysts in the test set (Figure 3). Nine of these catalysts were selected randomly—as the tenth catalyst, we chose the peptide which provided the highest conversion and stereoselectivity.
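To make the idea of a conformer-averaged, grid-based steric descriptor concrete, the following is a minimal sketch of an occupancy average in the spirit of ASO. It is not the authors' implementation (see the Supporting Information and refs 3 and 16 for that). It assumes that all conformers have already been aligned on the common motif, that each atom is a sphere given as (x, y, z, radius), and that a grid point counts as occupied when it lies inside any sphere. The function names, the grid spacing, and the toy coordinates are all hypothetical.

```python
from itertools import product


def make_grid(lo, hi, step):
    """Return a list of (x, y, z) points spanning [lo, hi] on each axis."""
    n = int(round((hi - lo) / step)) + 1
    axis = [lo + i * step for i in range(n)]
    return list(product(axis, axis, axis))


def occupancy(conformer, grid):
    """Return 1 for each grid point inside any atom sphere, else 0."""
    occ = []
    for gx, gy, gz in grid:
        inside = any(
            (gx - x) ** 2 + (gy - y) ** 2 + (gz - z) ** 2 <= r ** 2
            for x, y, z, r in conformer
        )
        occ.append(1 if inside else 0)
    return occ


def average_steric_occupancy(conformers, grid):
    """Average the 0/1 occupancy of every grid point over all conformers."""
    totals = [0] * len(grid)
    for conformer in conformers:
        for i, value in enumerate(occupancy(conformer, grid)):
            totals[i] += value
    return [t / len(conformers) for t in totals]


# Two toy "conformers" of a one-atom molecule (made-up coordinates in angstrom).
# Real input would be an aligned conformer ensemble of a tripeptide.
conformers = [
    [(0.0, 0.0, 0.0, 1.0)],
    [(2.0, 0.0, 0.0, 1.0)],
]
grid = make_grid(-1.0, 3.0, 1.0)  # 5 points per axis, 125 points in total
aso = average_steric_occupancy(conformers, grid)
```

Each entry of `aso` lies between 0 and 1: a point occupied in every conformer scores 1, and a point occupied in only half of them scores 0.5. A flexible molecule therefore smears its occupancy over many partially filled grid points, which is one way to see why tripeptides are a harder test for this kind of representation than rigid scaffolds.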

Figure 3. Enantioselectivity and diastereoselectivity predicted vs observed. Black points are training data; red points are the test set.

Two separate modeling tasks were undertaken, and Projection to Latent Structure (PLS) models were used to predict both diastereo- and enantioselectivity. These methods constructed excellent models for both enantioselectivity (MAEtest = 0.22 kcal/mol) and diastereoselectivity (MAEtest = 0.10 kcal/mol). Further, the two most stereoselective catalysts, H-dPro-Pip-Glu-NH2 and H-dPro-MePro-Glu-NH2, were predicted to be, on average, the most enantioselective catalysts (predicted 97% ee, observed 97% ee on average for the top three reactions with these catalysts). This outcome is remarkable, as these two tripeptides are the best-performing catalysts for conjugate addition reactions between aldehydes and β-nitroolefins.18,31 This excellent match between the traditional, mechanism-guided and modeling-guided results indicated that the descriptors used in this case study constituted a valid representation for tripeptide catalysts that feature a well-defined ground-state conformation.
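Because the model errors are quoted in kcal/mol, the selectivities were presumably modeled as free-energy differences rather than as raw ee or dr values. The sketch below shows the standard conversion, ΔΔG‡ = RT ln(er) with er = (100 + ee)/(100 − ee), together with a mean absolute error helper. The temperature of 293.15 K is taken from the stated reaction conditions (20 °C); whether the authors used exactly this convention is an assumption, and the function names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)
T_KELVIN = 293.15     # 20 °C, the temperature given for the conjugate additions


def ee_to_ddg(ee_percent, temperature=T_KELVIN):
    """Convert an enantiomeric excess (in %) to ddG (kcal/mol) via RT ln(er)."""
    er = (100.0 + ee_percent) / (100.0 - ee_percent)
    return R_KCAL * temperature * math.log(er)


def dr_to_ddg(major, minor, temperature=T_KELVIN):
    """Convert a diastereomeric ratio major:minor to ddG (kcal/mol)."""
    return R_KCAL * temperature * math.log(major / minor)


def ddg_to_ee(ddg, temperature=T_KELVIN):
    """Inverse of ee_to_ddg: ee (%) = 100 * tanh(ddG / (2RT))."""
    return 100.0 * math.tanh(ddg / (2.0 * R_KCAL * temperature))


def mean_absolute_error(observed, predicted):
    """Mean absolute deviation between two equally long sequences."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)
```

On this scale, 97% ee corresponds to about 2.44 kcal/mol at 20 °C, so an error of 0.22 kcal/mol is small relative to the span of the data. The energy scale also stretches the high-selectivity end: the step from 97% to 98% ee costs more in ΔΔG‡ than the step from 50% to 51% ee.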

Can Machine Learning Predict an As-Yet-Unknown Peptide Catalyst?

Next, we asked whether a machine-learning-guided procedure could identify a highly stereoselective peptide catalyst for a new reaction. This endeavor is a significantly greater challenge since most Pro-Pro-Xaa and Pro-Xaa-Yaa peptides other than H-dPro-Pro-Glu-NH2, and analogues thereof, cannot form an intramolecular salt bridge and feature greater conformational space.

This next stage required the construction of a large in silico library of tripeptide catalysts. To ensure accessibility of all potential catalysts, a database of 174 commercially available amino acids was compiled. To limit the domain space, the first residue was limited to dPro. By generating tripeptides in which any of the 174 amino acids can be present at the middle or C-terminal positions, a 30,276-member in silico library of potential tripeptide catalysts was constructed. A critical element of the ML workflow is to then select from this library a representative set of molecules, called a Universal Training Set (UTS).3 For this study, the UTS was identified by the use of unsupervised learning methods that have recently been featured in data science applied to catalysis.17,6,36 To achieve the most rapid avenue for the construction of a large training set, we clustered the 174 amino acids and selected representatives for peptide synthesis. This process involved calculating ASO and AEIF descriptors for the 174 individual amino acids. These descriptors were flattened, concatenated, and scaled, and the dimensionality of the resultant feature vector was reduced by removing dimensions with zero variance or with high correlation. Finally, this space was reduced to the first 20 principal components (∼87% explained variance).
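The feature-pruning step described above (dropping zero-variance dimensions and then highly correlated ones before the principal component analysis) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the correlation cutoff of 0.95 is hypothetical, correlated features are removed greedily in column order, and the subsequent scaling and projection onto principal components are omitted.

```python
from math import sqrt


def column(matrix, j):
    """Return column j of a row-major list-of-lists matrix."""
    return [row[j] for row in matrix]


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def pearson(a, b):
    """Pearson correlation coefficient of two non-constant lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return cov / norm


def filter_features(matrix, max_abs_r=0.95):
    """Return indices of columns kept after variance and correlation filtering."""
    n_cols = len(matrix[0])
    nonconstant = [j for j in range(n_cols) if variance(column(matrix, j)) > 0.0]
    kept = []
    for j in nonconstant:
        cj = column(matrix, j)
        # Keep a feature only if it is not nearly collinear with one already kept.
        if all(abs(pearson(cj, column(matrix, k))) <= max_abs_r for k in kept):
            kept.append(j)
    return kept


# Toy feature matrix: rows are amino acids, columns are flattened grid features.
X = [
    [1.0, 0.0, 2.0, 0.3],
    [1.0, 1.0, 4.0, 0.9],
    [1.0, 2.0, 6.0, 0.1],
    [1.0, 3.0, 8.0, 0.5],
]
kept_columns = filter_features(X)
```

In the toy matrix, column 0 is constant and column 2 is a linear function of column 1, so only columns 1 and 3 survive. For grid descriptors this pruning matters because many grid points are never occupied by any molecule, and neighboring grid points tend to carry nearly the same information.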

The amino acid space thus created was then clustered using the K-means algorithm with the number of clusters set to 1–50 as informed by the elbow method.37 The distortion of each cluster was calculated and plotted to produce an elbow plot (Figure 4). In this analysis, distortion is a measure used to determine how much the variation between clusters changes as more clusters are selected. Small changes in distortion per cluster indicate that the variation between clusters changes very little when a new cluster is added. This outcome can be interpreted as “diminishing returns” for each new compound added to the set. Although the plot lacks distinctive elbows (indicating a clear cutoff), a sharp decline is noted until six clusters are identified. We selected a range of 6–15 to explore. Prospective training sets for this range were selected by identifying the amino acids nearest to the cluster centroids and using them as exemplars for each cluster. These sets were then qualitatively evaluated by considering at which point the cluster exemplars seemed redundant from a chemical perspective. Using this qualitative analysis, ten cluster exemplars were identified as optimal.
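The clustering and exemplar selection can be illustrated with a small, dependency-free K-means sketch. Several choices here are assumptions made for illustration only: distortion is taken as the within-cluster sum of squared distances, centroids are initialized deterministically by a farthest-point rule, and the 2D toy points stand in for the 20-dimensional principal component scores of the amino acids.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Deterministic farthest-point initialization (an illustrative choice)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        for p in points
    ]


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns (centroids, labels, distortion)."""
    centroids = init_centroids(points, k)
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[i])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    distortion = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, distortion


def exemplars(points, centroids):
    """Index of the data point nearest to each centroid."""
    return [
        min(range(len(points)), key=lambda j: sq_dist(points[j], c))
        for c in centroids
    ]


# Toy data: three well-separated groups of three points each.
points = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
    (5.0, 5.0), (5.2, 4.9), (4.9, 5.3),
    (10.0, 0.0), (10.1, 0.2), (9.8, 0.1),
]
distortions = {k: kmeans(points, k)[2] for k in range(1, 6)}
centroids, labels, _ = kmeans(points, 3)
exemplar_indices = exemplars(points, centroids)
```

Plotting `distortions` against k gives the elbow plot. For the toy data the distortion collapses once k reaches 3 and changes little afterwards, which is the "diminishing returns" behavior described above. Real descriptor spaces rarely show such a clean break, which is why the final choice of ten exemplars relied on a chemical judgment of redundancy.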

Figure 4. Elbow plot informing number of clusters and selected amino acids (exemplars of each cluster) for combinatorial construction of the tripeptide UTS.

In addition to these ten algorithmically selected representatives, three more amino acids were added manually. The rationale for this manual addition is twofold: (1) from expert knowledge, we added amino acids that conferred high reactivity and stereoselectivity in previous studies,18–34 and (2) the three additional amino acids were expected to be similar to some of the algorithmically selected compounds (according to raw ASO/AEIF, dimensionality-reduced space) but may behave differently in chemical systems. For example, the algorithmically chosen glutamine (Gln) residues are related to manually chosen glutamic acid (Glu) residues, but the amide and carboxylic acid groups have different chemical properties. As such, these additions would be important for supervised methods to learn the difference between these related functionalities (see the Supporting Information).
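The sizes of the in silico library and of the UTS follow from simple combinatorics, which the short sketch below reproduces. The residue identifiers are placeholders, since the actual 174 amino acids and the 13 selected residues are not listed in the text; the only facts taken from the source are the fixed N-terminal dPro, the 174 candidate residues for the two variable positions, and the 10 + 3 residues chosen for the UTS.

```python
from itertools import product


def enumerate_tripeptides(middle, c_terminal, n_terminal="dPro"):
    """Build every (N-terminal, middle, C-terminal) combination."""
    return [(n_terminal, m, c) for m, c in product(middle, c_terminal)]


# Placeholder names standing in for the 174 commercially available amino acids.
amino_acids = [f"AA{i:03d}" for i in range(1, 175)]
library = enumerate_tripeptides(amino_acids, amino_acids)

# Stand-in for the 10 cluster exemplars plus the 3 manually added residues.
uts_residues = amino_acids[:13]
uts = enumerate_tripeptides(uts_residues, uts_residues)

print(len(library), len(uts))
```

The counts are 174 × 174 = 30,276 library members and 13 × 13 = 169 UTS members, so the training set covers roughly 0.6% of the library.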

In view of the demonstrated success of the tripeptide catalysts in nitroolefin conjugate addition reactions, the chemoinformatic workflow was challenged to predict a stereoselective catalyst for a different chemical transformation. So far, the developed tripeptide catalysts focused on reactions that proceed via an enamine intermediate. Now, we focused on a reaction that relies on carbonyl activation via a dienamine species from an α,β-unsaturated aldehyde. Specifically, we selected the annulation of senecialdehyde and 2,6-dimethylquinone as a model, a reaction that can be catalyzed with chiral amines (Scheme 2).38 This reaction was chosen because the annulation proceeds via a dienamine, not an enamine, intermediate and therefore may require a peptide catalyst with a reactivity that has so far not been explored. Thus, this annulation reaction poses a greater challenge to the ML workflow. Furthermore, the reaction takes place at ambient conditions and yields stable, UV-active enantiomeric products that can be easily analyzed by supercritical fluid chromatography on a chiral stationary phase. These practical considerations are important since the workflow requires high-throughput reaction analysis.

Scheme 2. Preoptimization of the Peptide-Catalyzed Annulation Reaction.

In an initial test reaction in MeOH, equimolar amounts of senecialdehyde and 2,6-dimethylquinone reacted in the presence of 5 mol % H-dPro-Pro-Glu-NH2 to form the annulation product quantitatively with 57% ee. Variations of the reaction parameters (15 different solvents, concentration, and reagent stoichiometries; see the Supporting Information for details) identified CHCl3/MeOH (9:1), a concentration of 0.25 M, and a slight excess of aldehyde (1.5 equiv) as optimal reaction conditions. Under these conditions, 84% conversion to the product was observed at room temperature within 22 h with 86% ee.39

Building on this base, we prepared 161 out of the proposed 169 peptides of the UTS by solid-phase peptide synthesis. Owing to repeated, incomplete amino acid couplings, we excluded the eight remaining peptides from the final training set. The catalytic properties of the 161-member UTS were then tested in the model reaction. The most stereoselective peptide, H-dPro-(4S)Flp-Glu-NH2,40 provided the product with an improved enantioselectivity of 91% ee (Δ = +5% ee). Significantly, the mechanism-agnostic selection protocols sample a wide range of reactivity space, providing a normal distribution of selectivity data (Kolmogorov–Smirnov p-value = 0.07315). This distribution, along with the identification of a good lead catalyst, facilitates subsequent modeling endeavors for catalyst structure refinement. These results also support the hypothesis that this set of molecules is an ideal starting point for any reaction which can be catalyzed by tripeptides, owing to the reaction- and mechanism-agnostic selection protocol. A summary of the distribution of stereoselectivities is depicted in Figure 5.

Figure 5. Graph depicting the distribution of selectivity values gathered from the UTS (skew = −0.08, kurtosis = −0.49).
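The normality diagnostics quoted for the UTS selectivities (skew, kurtosis, and a Kolmogorov–Smirnov test) can be computed as sketched below. The conventions are assumptions, since the text does not state them: moments use the population (biased) definitions, kurtosis is reported as excess kurtosis, the reference normal distribution is fitted to the sample itself, and the p-value uses the asymptotic Kolmogorov series. Because the distribution parameters are estimated from the same data, that p-value is only approximate. The sample values are made up.

```python
import math


def moments(values):
    """Return mean, population standard deviation, skewness, excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return mean, math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def ks_statistic(values, mean, sd):
    """Largest gap between the empirical CDF and the fitted normal CDF."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mean, sd)
        d = max(d, f - i / n, (i + 1) / n - f)
    return d


def ks_pvalue_asymptotic(d, n, terms=100):
    """Asymptotic Kolmogorov p-value: 2 * sum((-1)^(k-1) * exp(-2 k^2 n d^2))."""
    lam = math.sqrt(n) * d
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


# Made-up, symmetric sample standing in for the measured selectivities.
sample = [-2.0, -1.0, -1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 2.0]
mean, sd, skew, excess_kurtosis = moments(sample)
d_stat = ks_statistic(sample, mean, sd)
p_value = ks_pvalue_asymptotic(d_stat, len(sample))
```

A p-value above the chosen significance level, such as the 0.07 reported for the UTS, means that the test does not reject normality; it does not prove that the data are normally distributed. Skew and excess kurtosis near zero, as in the figure caption, point in the same direction.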

With a data set exhibiting a wide range of selectivities and relatively normal distribution, a variety of machine learning methods were tested in an effort to discover a more selective catalyst (Figure 6; for a full description of the methods, see the Supporting Information). Three rounds of predictions were made. In the first round of predictions, neural networks were constructed, and catalysts were selected only on the basis of their predicted selectivity without regard to prediction confidence. These predictions can be thought of as “high-risk, high-reward” targets. Notably, the first round of predictions was poor (predicted, ∼80% ee; experimental, 62% ee). Better predictive accuracy in later iterations was achieved by using simpler models and including prediction certainty as a selection criterion. This modification resulted in iterative, predictive improvement over the subsequent rounds (for an in-depth discussion, see the Supporting Information). In two additional iterations, a catalyst bearing an azide instead of a fluorine substituent, H-dPro-(4S)Azp-Glu-NH2,40 with marginal improvement (+1% ee) over the previous best, was identified.
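The difference between selecting candidates on predicted selectivity alone and selecting with prediction certainty can be illustrated with a small ranking sketch. The certainty metric actually used is described in the Supporting Information and is not reproduced here. As a stand-in, this example assumes an ensemble of models whose spread serves as the uncertainty, and ranks by a lower confidence bound (mean minus k standard deviations). The candidate names and predicted values are invented.

```python
from statistics import mean, pstdev


def rank_by_prediction(ensemble_preds):
    """Rank candidates by mean predicted selectivity only (round 1 style)."""
    return sorted(
        ensemble_preds, key=lambda name: mean(ensemble_preds[name]), reverse=True
    )


def rank_by_lower_bound(ensemble_preds, k=1.0):
    """Rank candidates by mean minus k standard deviations of the ensemble."""
    def lower_bound(name):
        values = ensemble_preds[name]
        return mean(values) - k * pstdev(values)

    return sorted(ensemble_preds, key=lower_bound, reverse=True)


# Invented % ee predictions from four hypothetical models per candidate.
preds = {
    "candidate_A": [95.0, 60.0, 99.0, 70.0],  # high mean, models disagree
    "candidate_B": [80.0, 78.0, 79.0, 81.0],  # slightly lower mean, models agree
    "candidate_C": [50.0, 52.0, 49.0, 51.0],
}
greedy_order = rank_by_prediction(preds)
cautious_order = rank_by_lower_bound(preds)
```

The greedy ranking puts candidate_A first, whereas the lower-bound ranking prefers candidate_B, whose prediction the hypothetical models agree on. This mirrors the trade-off described above: "high-risk, high-reward" targets often lie in sparsely sampled regions where the predictions are least reliable.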

Figure 6. Iterative optimization process with three rounds of predictions.

Reflecting on the results of the optimization workflow, an obvious strength is in the construction of the ideal training set of tripeptide catalysts. This set of catalysts covered a diverse array of structures that produced a selectivity range from −78% to 91% ee. Most importantly, this workflow identified a superior catalyst in just one round of optimization. Further, we were able to make reasonable models, namely, those with good cross-validation scores. When attempting to implement these models in our optimization workflow, we found challenges in making accurate predictions into sparsely populated regions of chemical space. That prompted us to incorporate prediction certainty metrics into our catalyst selection process, substantially increasing the accuracy of a priori predictions (as evidenced by the increased predictive performance from round 1 to round 2 and round 3 targets). Even then, only modest improvements were made over the original best hit.

This study clearly identified areas for improvement in our workflow. In fine-tuning the catalyst structure to improve selectivity beyond the original best catalyst, model generalizability was particularly challenging. This behavior is unsurprising given that the entire >30,000-member chemical space of tripeptides can be only sparsely sampled experimentally. Three possible explanations for achieving only marginal improvements over the best UTS catalyst include the following: (1) the dynamic nature of peptide catalysts results in indicator fields with many high-variance dimensions, causing challenges in dimensionality reduction which make the resulting models less generalizable (for a full explanation, see the Supporting Information), (2) the combinatorial nature of the training set resulted in some degree of simple “pattern matching”, despite the whole-molecule representation, rather than finding physically meaningful correlations, negatively impacting generalization, or (3) the algorithm did indeed converge on a local maximum given the structures present in the original in silico library.

Each of these explanations represents opportunities for future improvements of our workflow. For point one, unsupervised methods of compressing high-variance indicator fields could provide the same spatial information in a lower-dimensional form. For point two, altering the design of the original set such that it covers the same breadth of chemical space but breaks the combinatorial design would possibly result in more general models. For point three, a possible improvement to our workflow would be the implementation of generative models, effectively yielding a mutable in silico library.

An additional approach to constructing generalizable models would be to calculate the descriptors incorporating known reactive intermediates (e.g., a (di)enamine) rather than just the catalyst structure. This approach may enable the model to identify more direct physicochemical relationships, thereby improving predicted accuracy into sparsely populated regions of chemical space. However, using this type of representation would require recalculation of descriptors for each new reaction to optimize. The current descriptors and the UTS are agnostic to the reaction mechanism. As such, it is currently possible to apply the existing descriptors and UTS directly to any new reaction.

Coda: Comparison of Expert-Knowledge- versus Data-Science-Guided Approaches

It is impossible to directly compare these two approaches because an expert-knowledge-guided identification of a peptide catalyst was not carried out for the annulation reaction; nevertheless, a number of considerations can be highlighted.

  • (1) In the very first round of descriptor validation, PLS models gave excellent performance metrics and predicted for both diastereo- and enantioselectivity the highest-performing catalysts (H-dPro-Pip-Glu-NH2 and H-dPro-MePro-Glu-NH2) known to be the best catalysts from prior traditional optimization. These catalysts feature a single well-defined conformation. This excellent match between the mechanism-guided and modeling-guided results is striking.

  • (2) In prior catalyst optimization campaigns that used mechanism-based considerations, distinctly different Pro-Pro-Xaa catalysts emerged.19,23–27,30 These reactions include aldol and conjugate additions to vinyl triflones,30 maleimides,26,27 or disubstituted nitroolefins.23,24 These “manually” optimized catalysts afford the addition products with high stereoselectivities (typically 90–99% ee) and feature a residue other than Glu in the Xaa position, varied absolute configurations at the α-carbons of the amino acids, alternative coordinating moieties to COOH, substituted Pro residues, or a combination of those features. These expert-knowledge-guided catalyst optimizations built on the common Pro-Pro-Xaa scaffold. A similar approach would have likely led to the identification of the high-performing catalyst H-dPro-(4S)Azp-Glu-NH2 for the annulation reaction with less experimental overhead than the de novo campaign with a large training set.

  • (3) De novo approaches ideally identify a new, “out of the box” class of catalysts that is difficult to imagine by a knowledge-guided approach. In prior studies, a combinatorial split-and-mix library approach provided, for example, H-Pro-dAla-dAsp-NH2 as an alternative to the Pro-Pro-Xaa motif for aldol reactions.19 Similarly, machine learning is a powerful tool to guide to new catalysts. In this study, the data-driven approach identified potent catalysts within a “privileged” class. At the same time, the study provided a versatile algorithm and a UTS, ready for identifying tripeptide catalysts for other reactions.

Conclusions

In conclusion, using a machine-learning workflow, we generated predictive models for a nitroolefin conjugate addition reaction, validating our molecular representation for tripeptide catalysts that feature a rigid ground state. We next used this representation in an optimization campaign to identify a yet-unknown tripeptide catalyst for an annulation reaction that proceeds via a different mechanism. The respective products were obtained in >90% enantiomeric excess. The motifs identified are closely related to the “manually” optimized H-dPro-Pro-Glu-NH2 catalysts for reactions with β-nitroolefins. This outcome suggests that H-dPro-Pro-Glu-NH2-type catalysts represent either a global performance optimum of the training set or a local maximum with the global one undiscovered.

In this regard, it is encouraging that the algorithmic selection process was able to identify a high-performing catalyst motif in the first round of experimentation. Subsequent supervised modeling yielded marginal catalyst improvements but also identified multiple opportunities for improvement of our optimization workflow. Finally, this study also yielded a large in silico library of tripeptides, descriptor profiles for the constituent tripeptides, and a UTS of 161 tripeptide catalysts. The combination of these elements comprises a powerful starting point for future optimization campaigns.

Acknowledgments

T.S. thanks the Fonds der Chemischen Industrie (Germany) for a Kekulé Fellowship, and A.F.Z. is grateful to the University of Illinois for Graduate Fellowships. We thank the Swiss National Science Foundation (grant 200020_169423). We are grateful to the US National Science Foundation for financial support of the Molecule Maker Laboratory Institute (NSF CHE2019897) as well as from NSF CHE1900617.

Supporting Information Available

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acscentsci.3c01284.

  • Synthetic protocols and analytical data of peptides and catalysis products, and details on the computational methods (PDF)

Author Contributions

§ T.S., M.S., and A.F.Z. contributed equally.

The authors declare no competing financial interest.


References

  1. Crawford J. M.; Sigman M. S. Conformational Dynamics in Asymmetric Catalysis: Is Catalyst Flexibility a Design Element?. Synthesis 2019, 51, 1021–1036. 10.1055/s-0037-1611636. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Ahneman D. T.; Estrada J. G.; Lin S.; Dreher S. D.; Doyle A. G. Predicting Reaction Performance in C-N cross-Coupling Using Machine Learning. Science 2018, 360, 186–190. 10.1126/science.aar5169. [DOI] [PubMed] [Google Scholar]
  3. Zahrt A. F.; Henle J. J.; Rose B. T.; Wang Y.; Darrow W. T.; Denmark S. E. Prediction of Higher-selectivity Catalysts by Computer-driven Workflow and Machine Learning. Science 2019, 363, eaau5631 10.1126/science.aau5631. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Reid J. P.; Sigman M. S. Holistic Prediction of Enantioselectivity in Asymmetric Catalysis. Nature 2019, 571, 343–348. 10.1038/s41586-019-1384-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Sandfort F.; Strieth-Kalthoff F.; Kühnemund M.; Beecks C.; Glorius F. A Structure-Based Platform for Predicting Chemical Reactivity. Chem. 2020, 6, 1379–1390. 10.1016/j.chempr.2020.02.017. [DOI] [Google Scholar]
  6. Hueffel J. A.; Sperger T.; Funes-Ardoiz I.; Ward J. S.; Rissanen K.; Schoenebeck F. Accelerated Dinuclear Palladium Catalyst Identification Through Unsupervised Machine Learning. Science 2021, 374, 1134–1140. 10.1126/science.abj0999. [DOI] [PubMed] [Google Scholar]
  7. Rigling C.; Kisunzu J. K.; Duschmalé J.; Häussinger D.; Wiesner M.; Ebert M.-O.; Wennemers H. Conformational Properties of a Peptidic Catalyst: Insights from NMR Spectroscopic Studies. J. Am. Chem. Soc. 2018, 140, 10829–10838. 10.1021/jacs.8b05459.
  8. Metrano A. J.; Abascal N. C.; Mercado B. Q.; Paulson E. K.; Hurtley A. E.; Miller S. J. Diversity of Secondary Structure in Catalytic Peptides with Beta-Turn-Biased Sequences. J. Am. Chem. Soc. 2017, 139, 492–516. 10.1021/jacs.6b11348.
  9. Brethomé A. V.; Fletcher S. P.; Paton R. S. Conformational Effects on Physical-Organic Descriptors: The Case of Sterimol Steric Parameters. ACS Catal. 2019, 9, 2313–2323. 10.1021/acscatal.8b04043.
  10. Zahrt A. F.; Rinehart N. I.; Denmark S. E. A Conformer-Dependent, Quantitative Quadrant Model. Eur. J. Org. Chem. 2021, 2021, 2343–2354. 10.1002/ejoc.202100027.
  11. Zahrt A. F.; Denmark S. E. Evaluating Continuous Chirality Measure as a 3D Descriptor in Chemoinformatics Applied to Asymmetric Catalysis. Tetrahedron 2019, 75, 1841–1851. 10.1016/j.tet.2019.02.007.
  12. Zahrt A. F.; Athavale S. V.; Denmark S. E. Quantitative Structure-Selectivity Relationships in Enantioselective Catalysis: Past, Present, and Future. Chem. Rev. 2020, 120, 1620–1689. 10.1021/acs.chemrev.9b00425.
  13. Metrano A. J.; Chinn A. J.; Shugrue C. R.; Stone E. A.; Kim B.; Miller S. J. Asymmetric Catalysis Mediated by Synthetic Peptides, Version 2.0: Expansion of Scope and Mechanisms. Chem. Rev. 2020, 120, 11479–11615. 10.1021/acs.chemrev.0c00523.
  14. Wennemers H. Asymmetric catalysis with peptides. Chem. Commun. 2011, 47, 12036–12041. 10.1039/c1cc15237h.
  15. Crawford J. M.; Stone E. A.; Metrano A. J.; Miller S. J.; Sigman M. S. Parameterization and Analysis of Peptide-Based Catalysts for the Atroposelective Bromination of 3-Arylquinazolin-4(3H)-ones. J. Am. Chem. Soc. 2018, 140, 868–871. 10.1021/jacs.7b11303.
  16. Henle J. J.; Zahrt A. F.; Rose B. T.; Darrow W. T.; Wang Y.; Denmark S. E. Development of a Computer-Guided Workflow for Catalyst Optimization. Descriptor Validation, Subset Selection, and Training Set Analysis. J. Am. Chem. Soc. 2020, 142, 11578–11592. 10.1021/jacs.0c04715.
  17. Zahrt A. F.; Rose B. T.; Darrow W. T.; Henle J. J.; Denmark S. E. Computational Methods for Training Set Selection and Error Assessment Applied to Catalyst Design: Guidelines for Deciding which Reactions to Run First and which to Run Next. React. Chem. Eng. 2021, 6, 694–708. 10.1039/D1RE00013F.
  18. Schnitzer T.; Wennemers H. Influence of the Trans/Cis Conformer Ratio on the Stereoselectivity of Peptidic Catalysts. J. Am. Chem. Soc. 2017, 139, 15356–15362. 10.1021/jacs.7b06194.
  19. Krattiger P.; Kovasy R.; Revell J. D.; Ivan S.; Wennemers H. Increased Structural Complexity Leads to Higher Activity: Peptides as Efficient and Versatile Catalysts for Asymmetric Aldol Reactions. Org. Lett. 2005, 7, 1101–1103. 10.1021/ol0500259.
  20. Wiesner M.; Revell J. D.; Wennemers H. Tripeptides as Efficient Asymmetric Catalysts for 1,4-Addition Reactions of Aldehydes to Nitroolefins – A Rational Approach. Angew. Chem., Int. Ed. 2008, 47, 1871–1874. 10.1002/anie.200704972.
  21. Wiesner M.; Upert G.; Angelici G.; Wennemers H. Enamine Catalysis with Low Catalyst Loadings – High Efficiency via Kinetic Studies. J. Am. Chem. Soc. 2010, 132, 6–7. 10.1021/ja9068112.
  22. Schnitzer T.; Wennemers H. Deactivation of Secondary Amine Catalysts via Aldol Reaction–Amine Catalysis under Solvent-Free Conditions. J. Org. Chem. 2020, 85, 7633–7640. 10.1021/acs.joc.0c00665.
  23. Duschmalé J.; Wennemers H. Adapting to Substrate Challenges: Peptides as Catalysts for Conjugate Addition Reactions of Aldehydes to α,β-Disubstituted Nitroolefins. Chem.—Eur. J. 2012, 18, 1111–1120. 10.1002/chem.201102484.
  24. Kastl R.; Wennemers H. Peptide-Catalyzed Stereoselective Conjugate Addition Reactions Generating All-Carbon Quaternary Stereogenic Centers. Angew. Chem., Int. Ed. 2013, 52, 7228–7232. 10.1002/anie.201301583.
  25. Schnitzer T.; Budinská A.; Wennemers H. Organocatalysed Conjugate Addition Reactions of Aldehydes to Nitroolefins with anti Selectivity. Nat. Catal. 2020, 3, 143–147. 10.1038/s41929-019-0406-4.
  26. Grünenfelder C.; Kisunzu J.; Wennemers H. Peptide-Catalyzed Stereoselective Conjugate Addition Reactions of Aldehydes to Maleimide. Angew. Chem., Int. Ed. 2016, 55, 8571–8574. 10.1002/anie.201602230.
  27. Vastakaite G.; Grünenfelder C. E.; Wennemers H. Peptide-Catalyzed Stereoselective Conjugate Addition Reaction of Aldehydes to C-Substituted Maleimides. Chem.—Eur. J. 2022, 28, e202200215. 10.1002/chem.202200215.
  28. Schnitzer T.; Wennemers H. Thieme Chemistry Journals Awardees – Where Are They Now? A Stereoselective Tripeptide Catalyst for Conjugate Addition Reactions of Acetophenones to Dicyanoolefins. Synlett 2017, 28, 1282–1286. 10.1055/s-0036-1588964.
  29. Nicholls L. D. M.; Wennemers H. Synergistic Peptide and Gold Catalysis: Enantioselective Addition of Branched Aldehydes to Allenamides. Chem.—Eur. J. 2021, 27, 17559–17564. 10.1002/chem.202103197.
  30. Budinská A.; Wennemers H. Organocatalytic Synthesis of Triflones Bearing Two Non-adjacent Stereogenic Centers. Angew. Chem., Int. Ed. 2023, 62, e202300537. 10.1002/anie.202300537.
  31. Schnitzer T.; Rackl J. W.; Wennemers H. Stereoselective Peptide Catalysis in Complex Environments – From River Water to Cell Lysates. Chem. Sci. 2022, 13, 8963–8967. 10.1039/D2SC02044K.
  32. Schnitzer T.; Möhler J. S.; Wennemers H. Effect of the Enamine Pyramidalization Direction on the Reactivity of Secondary Amine Organocatalysts. Chem. Sci. 2020, 11, 1943–1947. 10.1039/C9SC05410C.
  33. Möhler J. S.; Schnitzer T.; Wennemers H. Amine Catalysis with Substrates Bearing N-Heterocyclic Moieties Enabled by Control over the Enamine Pyramidalization Direction. Chem.—Eur. J. 2020, 26, 15623–15628. 10.1002/chem.202002966.
  34. Duschmalé J.; Wahl J.; Wiesner M.; Wennemers H. Effects of Internal and External Carboxylic Acids on the Reaction Pathway of Organocatalytic 1,4-Addition Reactions between Aldehydes and Nitroolefins. Chem. Sci. 2013, 4, 1312–1318. 10.1039/C2SC21832A.
  35. Rinehart N. I.; Saunthwal R. K.; Wellauer J.; Zahrt A. F.; Schlemper L.; Shved A. S.; Bigler R.; Fantasia S. A Machine-learning Tool to Predict Substrate-adaptive Conditions for Pd-catalyzed C–N Couplings. Science 2023, 381 (6661), 965–972. 10.1126/science.adg2114.
  36. Gensch T.; Smith S. R.; Colacot T. J.; Timsina Y. N.; Xu G.; Glasspoole B. W.; Sigman M. S. Design and Application of a Screening Set for Monophosphine Ligands in Cross-Coupling. ACS Catal. 2022, 12, 7773–7780. 10.1021/acscatal.2c01970.
  37. Thorndike R. L. Who Belongs in the Family? Psychometrika 1953, 18, 267–276. 10.1007/BF02289263.
  38. Johansen T. K.; Villegas Gomez C.; Bak J. R.; Davis R. L.; Jørgensen K. A. Organocatalytic Enantioselective Cycloaddition Reactions of Dienamines with Quinones. Chem.—Eur. J. 2013, 19, 16518–16522. 10.1002/chem.201303526.
  39. A natural extension of our computational workflow would be to simultaneously optimize catalyst structure and reaction conditions. The solvent dependence is, for example, a factor that has a significant effect on the reaction outcome, in particular in the case of catalysts with large conformer space, and is not captured in the ML workflow.
  40. For a previous report with this peptide catalyst, see: Schnitzer T.; Wennemers H. Effect of γ-Substituted Proline Derivatives on the Performance of the Peptidic Catalyst H-dPro-Pro-Glu-NH2. Synthesis 2018, 50, 4377–4382. 10.1055/s-0037-1609547.

Supplementary Materials

oc3c01284_si_001.pdf (16.3MB, pdf)

Articles from ACS Central Science are provided here courtesy of American Chemical Society