Abstract
The application of modern machine learning to challenges in atomistic simulation is gaining traction. We present new machine learning models that can predict the energy of the oxidative addition process between a transition metal complex and a substrate for C–C cross-coupling reactions. In turn, this quantity can be used as a descriptor to estimate the activity of homogeneous catalysts using molecular volcano plots. The versatility of this approach is illustrated for vast libraries of organometallic catalysts based on Pt, Pd, Ni, Cu, Ag, and Au combined with 91 ligands. Out-of-sample machine learning predictions were made on a total of 18 062 compounds, leading to 557 catalyst candidates falling into the ideal thermodynamic window. This number was further refined by searching for candidates with an estimated price lower than 10 US$ per mmol. The 37 catalyst finalists are dominated by palladium phosphine ligand combinations but also include the earth-abundant transition metal Cu with less common ligands. Our results indicate that modern statistical learning techniques can be applied to the computational discovery of readily available and promising catalyst candidates.
1. Introduction
Chemists constantly pursue new molecular systems that provide ever higher yields and better control of selectivity. Rather than searching blindly for promising catalysts, chemists can draw on numerous tools that have been developed to help identify the most appropriate species. These range from high-throughput screening1,2 (including combinatorial methods3,4), which quickly evaluates reaction conditions and the structures of catalysts, to multidimensional modeling based on a design of experiments (DoE),5 which relates steric and structural descriptors (e.g., Charton values and Sterimol parameters) to enantioselectivity. Such methods have found broad application in asymmetric homogeneous catalysis.6–14 On the other hand, the tremendous increase in computer power accompanied by methodological advancements has also made computational studies of catalytic processes commonplace.15 While virtually any catalytic system can be subjected to computational analysis, often the conclusions reached are not transferable and provide little insight into the best way to develop more active and selective catalysts. Thus, a tool that assesses the properties of untested catalysts based on a simple energetic or structural criterion would rapidly accelerate the discovery pace of new species. Indeed, similar concepts involving the mapping of a difficult-to-determine quantity onto an easily obtained variable have been a central pillar of catalysis and physical organic chemistry for more than 80 years, and are at the core of familiar concepts such as the Bell–Evans–Polanyi principle,16,17 the Hammett equation,18–21 or the Brønsted catalysis equation.22 Today, volcano plots,23,24 which relate easily accessible descriptor variables directly to catalytic performance, accomplish this objective and find regular use in the fields of heterogeneous catalysis25–27 and electrocatalysis.28–34
Based on knowledge of a chosen descriptor variable, volcano plots function by discriminating catalytic performance using Sabatier's principle.35 Sabatier conceived the notion of an ideal catalyst that should not bind a substrate too strongly or too weakly. The unique volcano shape facilitates rapid discrimination of catalytic activity. Species positioned highest on the plot (generally on or near the volcano plateau or peak) have the best profiles and fulfill Sabatier's principle. Species located along the left- and right-slopes have less ideal profiles and can be characterized as having either overly strong (left) or overly weak (right) substrate/catalyst interactions. While being commonly used in heterogeneous and electrocatalysis, and frequently invoked for homogeneous systems,36–38 only recently have these appealing tools been concretely realized for molecular catalysts.39 Corminboeuf and co-workers have pioneered the use of molecular volcano plots to study various aspects of prototypical C–C cross-coupling reactions in order to gauge the feasibility of using these tools to identify attractive homogeneous catalysts.39–42 Subsequent work has also focused on adapting volcano plots for applications in homogeneous catalysis via the inclusion of kinetics (as opposed to the typically used thermodynamic based criteria) of the catalytic cycle.43
The use of molecular volcano plots involves establishing linear scaling relationships that relate the quantitative value of a descriptor to the thermodynamic or kinetic performance of the catalyst. As such, this tool has clear utility in high-throughput screening applications that search for prospective catalysts by computing the value of this descriptor for any species desired. However, currently both the geometries and energies associated with multiple forms of each catalyst must be determined through a relatively slow process involving density functional theory computations. Clearly, increasing the speed at which the descriptor variable can be determined would result in an overall increase in the discovery pace of new catalysts because prospective species could be screened more rapidly. One route with the potential to provide virtually instantaneous access to the descriptor involves the application of quantum machine learning (ML) models, i.e., ML models which can be trained on, and used to predict, quantum properties.44–46 The application of ML models to estimate volcano plot energy descriptors offers increased speed for two reasons: first, the energy-based value can be immediately accessed for any desired species, and second, the need to establish a precise geometry of the catalyst can be circumvented by also including this task in the ML model, as already demonstrated within the Δ-ML approach.47 As such, the ML model can predict an accurate descriptor value from an approximate 3D structure of a catalyst.
While, generally speaking, applications of machine learning methods in chemistry are still in their infancy, their use has begun to appear in the fields of materials science48–53 and catalysis.54–61 For example, a gradient-boosting regression method62 has been used to predict the d-band center of mono and bimetallic surfaces63 and to estimate CO adsorption energies on Pt nanoparticles,64 while a local similarity kernel could predict the catalytic activity of nanoparticles.65 Moreover, applications of support vector machines (SVMs)66 were able to anticipate CO2 uptake in metal organic frameworks (MOFs)67 by developing an atomic property-weighted radial distribution function (AP-RDF) based descriptor68 that captures geometric and chemical features of periodic systems. Predictive structure–reactivity models have identified promising Pt-based electrocatalysts for the oxygen reduction reaction,69 while artificial neural networks (ANNs) have recognized multimetallic alloys possessing high selectivity for electrochemical CO2 reduction to C2 species.70,71 Recently, Nørskov investigated various machine learning based approaches72 to systematically search for the active sites of bimetallic (nickel gallium) nanoparticles,73 to construct Pourbaix surface phase diagrams,74 and to identify probable mechanisms of hydrocarbon–syngas reactions on rhodium(111).75 Rappe and co-workers also exploited the regularized random forest machine learning algorithm,76 and discovered the key role played by structure and charge descriptors (namely the Ni–Ni bond length and the Ni residual charge) in the hydrogen evolution reaction activity of Ni2P(0001).
Despite the considerable amount of progress in applying ML models to chemical problems, each of the aforementioned contributions tackled issues surrounding heterogeneous catalysis, while ML applications to homogeneous catalysis remain exceedingly rare.54,77 Significant advances with ML models to obtain fundamental molecular electronic properties (e.g., atomization or total energies of molecules) have been made;78–85 however, to the best of our knowledge, the prediction of the energies of catalytic cycle intermediates has never been attempted. The purpose of this work is to demonstrate how ML models can be used to accelerate the screening of prospective homogeneous catalyst candidates, thereby enabling the computational discovery of novel catalytic materials. To this end, we selected the problem of finding catalysts for the Suzuki–Miyaura C–C cross-coupling reaction (Fig. 1).86–88 Specifically, we trained and applied ML models using the reaction energy associated with oxidative addition (eqn (1)), which has previously been shown to be a descriptor variable for analyzing the catalytic cycle thermodynamics using volcano plots.39 Although kinetic profiles are obviously important for obtaining a full and accurate description of catalytic performance, here we rely on a simplified thermodynamic picture (Fig. 1), which can be exploited to rapidly discriminate between catalysts with promising or inadequate energy profiles.39–42
L1ML2 + (C=C)Br → L1M(Br)(C=C)L2, ΔE(Rxn A)    (1)
Using machine learning models of this quantity, along with previously constructed molecular volcano plots, it is possible to screen thousands of potential catalysts with controlled accuracy (by virtue of learning curves) and at a negligible computational overhead.
2. Computational details
The initial set of Cartesian coordinates for each catalyst was obtained by converting Simplified Molecular Input Line Entry System (SMILES) strings (i.e., a line notation for entering and representing molecules and reactions)89,90 into three-dimensional structures with the 3D structure generator operation (i.e., gen3d operation) of the OpenBabel software (see the ESI† for details).91 To generate target energy values for the training and test complexes, we proceeded as follows: computations involving geometry optimization and electronic energies were generated and executed via the AiiDA automated platform.92 Gas-phase geometry optimizations were performed at the B3LYP93–95-D3 (ref. 96 and 97) level with the 3-21G basis set (for Ni, Pd, Cu and Ag complexes)98–101 or the def2-SVP102 basis set (for Pt and Au complexes) in Gaussian09.103 Single point energies were computed at the B3LYP-D3/def2-TZVP level.104 The oxidation states of the catalysts were adjusted to comply with the dominant 14e–/16e– nature of the complexes in the Suzuki cross-coupling reaction. The reaction electronic energies (eqn (1)) were computed and used as a descriptor (see a volcano plot in Fig. 2) for training the machine learning models. The ML models were trained and applied using the Quantum Machine Learning toolkit QMLcode.105
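As a minimal sketch of how the descriptor of eqn (1) follows from the computed electronic energies, the reaction energy can be written as the energy of the oxidative addition product minus the energies of the starting complex and the substrate. The function name, the argument names, and the example energies below are hypothetical and chosen only to illustrate the arithmetic; the conversion factor from hartree to kcal mol–1 is the usual approximate value.

```python
# Approximate conversion factor from hartree to kcal/mol.
HARTREE_TO_KCAL_PER_MOL = 627.509


def reaction_energy_kcal(e_reactant_complex, e_substrate, e_product_complex):
    """Return dE(Rxn A) in kcal/mol from electronic energies given in hartree.

    dE = E(product complex) - [E(reactant complex) + E(substrate)]
    """
    delta_hartree = e_product_complex - (e_reactant_complex + e_substrate)
    return delta_hartree * HARTREE_TO_KCAL_PER_MOL


# Hypothetical energies (hartree), not taken from the paper or its ESI.
example_delta_e = reaction_energy_kcal(
    e_reactant_complex=-100.00,
    e_substrate=-50.00,
    e_product_complex=-150.04,
)
print(f"dE(Rxn A) = {example_delta_e:.2f} kcal/mol")
```

A negative value corresponds to an exothermic oxidative addition under this sign convention.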
The reference volcano plot associated with the catalytic cycle of Fig. 1 was constructed according to the procedure outlined in our previous work39,43 (detailed description of the procedure can be found in the ESI†) using the same theory level as for the descriptors of the machine learning training set. Note that despite the relatively modest level of theory used herein (engendered by the large computational effort associated with generating the training set for the ML model), the geometries and key energetic properties are in line with those previously computed (see Table S1†).39,40 Similarly, we previously demonstrated that the same set of linear free-energy scaling relationships capably describe variations in the number of coordinated ligands (i.e., bis vs. monoligated), as well as different oxidation or spin states of the catalyst.39,42,43 Rather than predicting the entire volcano plot, the most essential property is the descriptor [ΔE(Rxn A)] (eqn (1)), which can be machine learned, as well as knowledge about its target value, i.e., the energy range corresponding to the ideal plateau region (extending from –32.1 to –23.0 kcal mol–1, Fig. 2).
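The role of the target window can be illustrated with a short sketch that assigns a descriptor value to one of the three volcano regions. The plateau bounds are those quoted above; the function name and the region labels are illustrative, and the sketch assumes that more negative descriptor values correspond to the strong-binding (left) slope.

```python
# Plateau bounds in kcal/mol, as quoted in the text for Fig. 2.
PLATEAU_MIN = -32.1
PLATEAU_MAX = -23.0


def classify_descriptor(delta_e):
    """Assign a dE(Rxn A) value (kcal/mol) to a volcano region.

    Assumption: values below the plateau lie on the strong-binding slope,
    values above it on the weak-binding slope.
    """
    if delta_e < PLATEAU_MIN:
        return "strong-binding"
    if delta_e > PLATEAU_MAX:
        return "weak-binding"
    return "plateau"


# Illustrative values only, not results from the paper.
for value in (-40.0, -27.0, -5.0):
    print(value, classify_descriptor(value))
```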
3. Methods
3.1. Database
The training procedure relies upon constructing a large database of catalysts that are obtained through combining various ligands and metals. These species are then used for training and testing the ML models which, in turn, are used to predict descriptor values so rapidly and with such accuracy that large libraries can be scanned in order to identify acceptable catalyst candidates. Ninety-one ligands including CO, phosphines, N-heterocyclic carbenes and pyridines were combined with six transition metals (Ni, Pd, Pt, Cu, Ag, and Au) to form the database. All possible metal/ligand combinations (i.e., L1 and L2, where L1ML2 is equivalent to L2ML1) of catalytic cycle intermediates 1 and 2 (Fig. 1) lead to a total library consisting of 25 116 species for each intermediate (see the ESI† for a complete list of ligands used). Rather than providing the optimized structures for each ligand to build the catalysts, the geometries of catalytic cycle intermediates 1 and 2 for each database entry were created by converting SMILES strings (Fig. 3)89,90,106 to Cartesian coordinates using the OpenBabel implementation91 of the Merck Molecular Force Field method (MMFF94).107–111 This database was divided into two subsets: (i) the training/test set used within cross-validated learning curves (see details on the cross-validation procedure in Section 3.2) for which the computed descriptor values [ΔE(Rxn A)] were used as a reference and (ii) the prediction set on which the model was applied to screen candidates based on their ML predicted descriptor values. Since collecting reference data for the training and test sets involves costly DFT geometry relaxations, we proceeded in two steps:112 first, an initial training set was assembled from complexes involving a diverse set of ligands (72 in total) with Pd (2595 complexes).113 Second, a small subset of illustrative ligands (12) was combined with each of the five other metals (Pt, Au, Ag, Cu, Ni) (390 complexes).
The final set consisted of a total of 7054 reaction energy values corresponding to our descriptor. All DFT optimized geometries and computed electronic energies of each intermediate 1 and 2 as well as the associated ΔE(Rxn A) values are provided in the ESI.† ML models were trained on this set (vide infra), and out-of-sample predictions were then made on the prediction set that consisted of all the other complexes (18 062 in total). Note that included in this set are 19 realistic ligands that have already been employed in experimental settings (i.e., ligand no. 72–90 in Fig. 3).114–116
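The size of the library follows from counting unordered ligand pairs (L1ML2 being equivalent to L2ML1, with L1 = L2 allowed) for each metal. The sketch below reproduces this count; integer indices stand in for the actual ligands, and the function name is illustrative.

```python
from itertools import combinations_with_replacement

METALS = ["Ni", "Pd", "Pt", "Cu", "Ag", "Au"]
N_LIGANDS = 91
N_TRAINING = 7054  # size of the training/test set quoted in the text


def enumerate_library(metals, n_ligands):
    """List all (L1, metal, L2) entries, treating L1-M-L2 and L2-M-L1 as one.

    Ligands are represented by placeholder integer indices.
    """
    ligand_ids = range(n_ligands)
    return [
        (l1, metal, l2)
        for metal in metals
        for l1, l2 in combinations_with_replacement(ligand_ids, 2)
    ]


library = enumerate_library(METALS, N_LIGANDS)
n_prediction = len(library) - N_TRAINING
print(len(library), n_prediction)
```

For 91 ligands there are 91 × 92 / 2 = 4186 unordered pairs per metal, hence 6 × 4186 = 25 116 entries, and removing the 7054 training/test entries leaves the 18 062 complexes of the prediction set.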
3.2. Training
To begin the machine learning process, information intrinsically contained within each three-dimensional structure must be transformed into a suitable representation. The approach selected to represent a molecule has a crucial impact on the learning curve (for a recent example of a study discussing the role of the representation, see ref. 83). It is of particular importance to construct a meaningful relationship between the representation and the catalyst candidate, that will be learned by the machine learning algorithm. For this reason, all the relevant variables for computing the target properties (in our case ΔE(Rxn A)) should be represented in the chosen machine learning representation of the molecule. Over the last few years, increasingly improved representations44,78,82,117,118 that progressively encode increasing amounts of physical information have been proposed. Here, we focus on the sorted Coulomb Matrix (CM), the first representation introduced for ML models trained throughout chemical space and used to predict quantum properties,44 a two-body bagged variant of the CM with superior performance, the Bag of Bonds (BoB),78 and the recently proposed Spectrum of London and Axilrod–Teller–Muto potential (SLATM).119 The CM representation consists of a square atom by atom matrix, where the diagonal elements model the potential energies of free atoms while the off-diagonal elements correspond to the Coulomb nuclear repulsion between atom pairs. In the BoB representation, CM elements are bagged (e.g., C–C, C–N, C–H, etc. are accounted for in separate bags.). SLATM is based on the dissociative limits of intermolecular dispersion contributions between unpolarized moieties. They account for interatomic two-body terms through London's dispersion curve (rather than Coulomb), and for the three-body terms according to Axilrod–Teller–Muto.120,121
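To make the first two representations concrete, the following sketch builds a Coulomb matrix using the commonly quoted form (diagonal elements 0.5 Z^2.4, off-diagonal elements Z_i Z_j / |R_i − R_j|), sorts it by row norm, and groups its elements into bags. It is a simplified illustration and not the QMLcode implementation: the padding to a common size that is needed to compare molecules of different sizes is omitted, the function names are illustrative, and the linear triatomic geometry is a toy input in arbitrary length units.

```python
import math
from collections import defaultdict


def coulomb_matrix(charges, coords):
    """Unsorted Coulomb matrix for nuclear charges and Cartesian coordinates."""
    n = len(charges)
    cm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                cm[i][j] = 0.5 * charges[i] ** 2.4
            else:
                cm[i][j] = charges[i] * charges[j] / math.dist(coords[i], coords[j])
    return cm


def sort_by_row_norm(cm):
    """Reorder rows and columns by descending row norm (sorted CM)."""
    order = sorted(
        range(len(cm)),
        key=lambda i: -math.sqrt(sum(v * v for v in cm[i])),
    )
    return [[cm[i][j] for j in order] for i in order]


def bag_of_bonds(symbols, cm):
    """Group CM elements into per-element and per-element-pair bags."""
    bags = defaultdict(list)
    n = len(symbols)
    for i in range(n):
        bags[symbols[i]].append(cm[i][i])  # one-body bag
        for j in range(i + 1, n):
            key = "-".join(sorted((symbols[i], symbols[j])))
            bags[key].append(cm[i][j])  # two-body bag
    # Each bag is sorted so that the vector does not depend on atom order.
    return {key: sorted(values, reverse=True) for key, values in bags.items()}


# Toy linear O-C-O arrangement with hypothetical coordinates.
symbols = ["O", "C", "O"]
charges = [8, 6, 8]
coords = [(0.0, 0.0, -2.0), (0.0, 0.0, 0.0), (0.0, 0.0, 2.0)]

cm = coulomb_matrix(charges, coords)
sorted_cm = sort_by_row_norm(cm)
bags = bag_of_bonds(symbols, cm)
print(sorted(bags))
```

Concatenating the bags in a fixed order yields the BoB vector, so pairs involving the metal occupy their own dedicated block of the representation.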
We stress that our principal objective is to describe the oxidative addition step directly from rough-coordinate estimates obtained from the SMILES structure (i.e., without providing accurate geometry as an input). After conversion from SMILES to coordinates, we map our input representation onto the corresponding continuous label value (here ΔE(Rxn A)) using kernel ridge regression (KRR),122 which solves nonlinear problems by mapping data from the input space into a high-dimensional linear feature space (kernel trick). A Laplacian kernel function is used for the CM and BoB representations, and a Gaussian kernel for the SLATM representation (more details in the ESI†). The quality of the models is evaluated by reporting test errors, which can be obtained by separating the dataset into training and test frames and calculating the average error (typically a mean absolute error (MAE)) for the predictions on the out-of-sample test set. This random sub-sampling cross-validation procedure46 was used to shuffle the dataset randomly into different training sets. For every shuffling step the MAE for the model was calculated and the procedure repeated ten times for every training set size N. Afterwards, the errors for the different models were averaged into a single cross-validated error. Note that this error remains a random variable that is dependent on the initial splitting of the training/test datasets. When plotted on a log–log scale, successful learning is indicated by linearly decaying behavior for large training set sizes, as already suggested by Vapnik and others in the nineties.123,124
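The regression and validation steps described above can be sketched in a few lines of pure Python. The example below fits kernel ridge regression with a Laplacian kernel, k(x, y) = exp(−‖x − y‖1/σ), by solving (K + λI)α = y, and estimates the test MAE by repeated random sub-sampling. It is a toy illustration under stated assumptions: the one-dimensional sine data, the values of σ and λ, and all function names are hypothetical, and the sketch does not reproduce the QMLcode workflow or the hyperparameters used in this work.

```python
import math
import random


def laplacian_kernel(x, y, sigma):
    """k(x, y) = exp(-||x - y||_1 / sigma)."""
    return math.exp(-sum(abs(a - b) for a, b in zip(x, y)) / sigma)


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def krr_fit(train_xs, train_ys, sigma, lam):
    """Return regression weights alpha = (K + lam * I)^-1 y."""
    n = len(train_xs)
    k = [
        [laplacian_kernel(train_xs[i], train_xs[j], sigma) + (lam if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    return solve_linear(k, train_ys)


def krr_predict(x, train_xs, alphas, sigma):
    """Predict the label of x as a kernel-weighted sum over training points."""
    return sum(a * laplacian_kernel(x, t, sigma) for a, t in zip(alphas, train_xs))


def subsampled_mae(xs, ys, n_train, sigma, lam, repeats=10, seed=0):
    """Average test MAE over repeated random training/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    maes = []
    for _ in range(repeats):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        train_xs = [xs[i] for i in train]
        alphas = krr_fit(train_xs, [ys[i] for i in train], sigma, lam)
        errors = [abs(krr_predict(xs[i], train_xs, alphas, sigma) - ys[i]) for i in test]
        maes.append(sum(errors) / len(errors))
    return sum(maes) / len(maes)


# Toy one-dimensional data standing in for (representation, descriptor) pairs.
xs = [(0.1 * i,) for i in range(40)]
ys = [math.sin(x[0]) for x in xs]

mae_small = subsampled_mae(xs, ys, n_train=4, sigma=1.0, lam=1e-8)
mae_large = subsampled_mae(xs, ys, n_train=30, sigma=1.0, lam=1e-8)
print(f"MAE (N=4): {mae_small:.3f}  MAE (N=30): {mae_large:.3f}")
```

Evaluating `subsampled_mae` for a series of training set sizes N gives the points of a learning curve of the kind discussed in the text.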
4. Results and discussion
4.1. Machine learning
In order to verify the performance and validity of our ansatz, we have trained and tested machine learning models for various training set sizes. The resulting learning curves, depicted in Fig. 4, demonstrate the efficiency and accuracy of the learning process in terms of a near-linear decay of test error with training set size. While learning is observed for all representations, the learning curves illustrate the impact of the molecular representation on the off-set and slope. Overall, the performances of the ML models based on the SLATM and BoB are very similar (for the largest training set, the MAE is 2.61 kcal mol–1 and 2.73 kcal mol–1 respectively) and superior to CM (largest training set MAE = 3.05 kcal mol–1). Despite these small variations, it is obvious that efficient learning is achieved by all three representations. This result contrasts with findings in ref. 51 where the CM was claimed to be of little use when constructing ML models for transition metal complexes. The poor performance of CM is more likely due to inappropriate choice of properties (electronic spin-states) than to the molecular systems themselves. It seems intuitive that any purely structure and composition based representation will struggle to account for various electronic states. When it comes to simple electronic ground state properties, such as the oxidative addition step studied here, Fig. 4 clearly demonstrates that the CM is very applicable to the machine learning modeling of properties of transition metal complexes. We also note that the BoB representation performs surprisingly well for this problem. We ascribe this behavior to the bagging which allows the model to place appropriate weights to bonds involving the transition metal.
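The offset and slope of a learning curve can be quantified by a straight-line fit on the log–log scale, which corresponds to a power law MAE ≈ a·N^b with b < 0. The sketch below performs this fit by ordinary least squares. The input points are synthetic, generated from an exact power law with an arbitrary prefactor and exponent; they are not the values plotted in Fig. 4, and the function names are illustrative.

```python
import math


def fit_power_law(sizes, errors):
    """Least-squares fit of log(error) = log(a) + b * log(N); returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    prefactor = math.exp(mean_y - slope * mean_x)
    return prefactor, slope


def size_for_target(prefactor, slope, target_error):
    """Training set size at which the fitted power law reaches target_error."""
    return (target_error / prefactor) ** (1.0 / slope)


# Synthetic learning curve: MAE = 50 * N^-0.3 (arbitrary, for illustration).
sizes = [100, 1000, 10000]
errors = [50.0 * n ** -0.3 for n in sizes]

prefactor, slope = fit_power_law(sizes, errors)
n_required = size_for_target(prefactor, slope, target_error=3.0)
print(f"a = {prefactor:.2f}, b = {slope:.2f}, N(3 kcal/mol) = {n_required:.0f}")
```

In this picture the representation affects both the prefactor (offset) and the exponent (slope), and extrapolating the fitted line gives a rough estimate of the training set size needed to reach a target error.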
The energy range for the descriptors of the training set (corresponding to the x-axis of the molecular volcano plot) is ≈120 kcal mol–1 (Fig. 2). We therefore considered the ML model to be sufficiently well converged for the task of picking catalysts, once the learning curve dropped to less than 3 kcal mol–1 (i.e., about 2.5% of the descriptor range). The most efficient representations, SLATM and BoB, reached this threshold with a training set of 7054 binding energies. The following discussion will thus be based on the less sophisticated representation, BoB. All the predictions associated with the other two representations are presented in the ESI.† It is important to reiterate that while the machine learning models were trained on DFT reaction energies obtained for DFT optimized geometries, the molecular representations in the test set were constructed solely from the coordinates directly obtained from SMILES conversion.
The heterogeneity of the training set112 (i.e., unequal representation of the six transition metals) was examined by evaluating the individual predictions of the BoB based machine learning model on each metal separately. The resulting learning curves depicted in the inset of Fig. 4 demonstrate that learning is attained for all metals. For the largest training set size, the target MAE of 3 kcal mol–1 is achieved for Pd, Pt, Ag and Au, while the Ni and Cu metal complexes are less accurately described (best MAE = 3.74 and 4.04 kcal mol–1, respectively). These larger errors certainly originate from the smaller sample of Ni complexes and from copper-ligand combinations featuring ligands that are less frequent in the rest of the training set. This leads to a larger energy range in the descriptor variables which can be seen as a broader distribution/width (see the histograms (Fig. S2 and S3) in the ESI†). Overall, however, the ML performance for Ni and Cu-based complexes is still useful as it is not more than 5% of the descriptor's energy range (i.e., below 5 kcal mol–1).
4.2. Catalyst prediction
The trained ML models were subsequently exploited to predict the energy based descriptor of 18 062 potential out-of-sample catalysts with negligible computational cost (vide supra). At this point, it is worth noting that out-of-sample predictions that involve ligands not previously seen by the models should be considered with more care. Additionally, the predictive power of the model would be limited for catalysts that would suffer from a convergence problem in an actual computation.113 Because we are interested only in the catalysts predicted to have the best thermodynamic profile for the Suzuki–Miyaura reaction, emphasis was placed on a narrow range of descriptor energy values (from –32.1 to –23.0 kcal mol–1) corresponding roughly to the plateau of the volcano. However, the same ML models would be relevant to the analysis of other cross-coupling reaction variants differing only by the width of the plateau region.40 Using the BoB model, 557 catalysts were identified that fell into this region. A brief examination of the metal distribution (Fig. 5) yields expected results, namely that catalysts incorporating group 10 metals (Ni, Pd, Pt) appear more frequently than their group 11 (Cu, Ag, Au) counterparts. This finding is in line with our earlier DFT-based molecular volcano plot analysis of the same reaction.39–41
A prevalent metal identified by the ML model is palladium, which has 265 species that appear on or near the volcano plateau (Fig. 5). The large number of Pd catalysts attests to the accuracy of the ML models, as these species have a rich history in catalyzing cross-coupling reactions.125–128 On the other hand, Pt catalysts are virtually unknown experimentally129 and those that have been tested tend to show only moderate catalytic ability.130 Nonetheless, their significant presence on the volcano plateau does align with our earlier DFT-based evaluations.39–41 Indeed, we previously postulated that the presence of Pt based catalysts on top of the volcano may indicate that the problem with these species is less thermodynamic and more kinetic in nature.40 In addition, others have speculated that an enhanced M–R bond strength causes transmetallation in these species to be sluggish.131 Despite being well-known cross-coupling catalysts,132 only a handful of Ni based species are predicted by the ML model to appear near the volcano plateau. However, in their current state, the ML models consider only a single oxidation state, which for Ni corresponds to a Ni(0)/Ni(II) based catalytic cycle. Thus, the more catalytically active Ni(I) oxidation state, which is accessed via a one-electron redox process133 and generally shifts Ni catalysts from the strong-binding side of the volcano onto the plateau,42 is currently not assessed by the ML models (vide supra), but incorporation of alternative catalytic oxidation states represents an appealing future improvement of the current model. The volcano plot also reveals the influence of ligand type on the thermodynamics of the catalytic cycle. For example, Fig. 6 clearly indicates that phosphine ligands generally outperform N-heterocyclic carbene and pyridine ligands when combined with group 10 metal (Ni, Pd, and Pt) complexes. More interesting is the presence of oxazole ligands for Pd metals.
While the use of the monodentate variant (e.g., ligands no. 78–80) appears elusive in the literature, the chemistry associated with the use of bidentate bis(oxazole) ligands for cross-coupling reactions is relatively well established.134
By far, the vast majority of the coinage metal (group 11) catalysts have very weak binding energies and, correspondingly, lie on the right (weak-binding) slope of the volcano. Indeed, no Au or Ag based catalyst has sufficiently strong binding energy to appear on the volcano plateau (Fig. 5). This finding directly agrees with experimental and computational studies that have found Ag and Au catalysts to have unfavorable free energies associated with oxidative addition.135 On the other hand, a handful (20) of Cu based catalysts are found to have nearly ideal thermodynamic profiles. While instances of Cu-based Suzuki coupling have appeared in the literature,136,137 these catalysts tend to employ bidentate acetylacetone (acac) or acetate/triflate ligands.138,139 Thus, it is interesting to note that each of the thermodynamically most appealing Cu catalysts involves either a tris(dimethylamino)phosphine or bulky N-heterocyclic carbene (Fig. 6). These findings represent a potentially interesting research direction that should be explored in more depth and that has been revealed solely through the application of ML models coupled with molecular volcano plots.
Finally, a more refined selection of catalysts was obtained based on their estimated price per mmol (Fig. 7). Among the 557 catalysts with promising thermodynamic profiles, 37 complexes have an estimated price less than 10 US$ per mmol. These species include earth abundant metals (copper with tris(dimethylamino)phosphine) and a multitude of more standard palladium phosphine combinations.
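The two-stage selection (thermodynamic window first, price second) amounts to two successive filters over the predicted library. The sketch below illustrates this with the plateau bounds and the price threshold quoted in the text; the `Candidate` class, its field names, and the four entries are hypothetical placeholders, not catalysts or prices from this work.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    predicted_delta_e: float   # ML-predicted dE(Rxn A), kcal/mol
    price_usd_per_mmol: float  # estimated price of metal plus ligands


def shortlist(candidates, e_min=-32.1, e_max=-23.0, max_price=10.0):
    """Return (candidates on the plateau, plateau candidates below max_price)."""
    on_plateau = [c for c in candidates if e_min <= c.predicted_delta_e <= e_max]
    affordable = [c for c in on_plateau if c.price_usd_per_mmol < max_price]
    return on_plateau, affordable


# Placeholder entries with made-up values, for illustration only.
candidates = [
    Candidate("cat-A", -27.5, 4.0),
    Candidate("cat-B", -27.5, 25.0),
    Candidate("cat-C", -45.0, 2.0),
    Candidate("cat-D", -10.0, 1.0),
]

on_plateau, affordable = shortlist(candidates)
print([c.name for c in on_plateau], [c.name for c in affordable])
```

Applying the price filter only after the thermodynamic one keeps the two criteria independent, so the price threshold can be changed without repeating the ML predictions.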
5. Conclusions
We have trained and used machine learning models to dramatically accelerate the descriptor screening of 18 062 homogeneous catalysts for the Suzuki–Miyaura C–C cross-coupling reaction. The model was based on the capability of molecular volcano plots to identify thermodynamically attractive candidates with respect to a simple energy descriptor. Overall, we have identified 37 promising low-cost complexes featuring palladium and copper combined with both standard and less expected ligands. Our findings also indicate that machine learning can be used to screen thousands of catalysts, and that previously introduced machine learning representations can be used for property predictions of transition-metal complexes. Exploitation of a Δ-machine learning approach represents an appealing future improvement of the proposed ML models.47,117
Conflicts of interest
There are no conflicts to declare.
Acknowledgments
The National Centre of Competence in Research (NCCR) Materials Revolution: Computational Design and Discovery of Novel Materials (MARVEL) of the Swiss National Science Foundation (SNSF) and the EPFL are acknowledged for financial support. C. C. thanks Dr Matthew D. Wodrich and Prof. Konrad Patkowski for extensive discussion and critical reading of the manuscript. Prof. Jerome Waser is acknowledged for helpful discussions.
Footnotes
†Electronic supplementary information (ESI) available: Details of the ligand dataset, machine learning predictions, price of metals and ligands and all Cartesian coordinates (.xyz) are included in the file SuppInfo.tar.bz2. See DOI: 10.1039/c8sc01949e
References
- Collins K. D., Gensch T., Glorius F. Nat. Chem. 2014;6:859–871. doi: 10.1038/nchem.2062.
- Jakel C., Paciello R. Chem. Rev. 2006;106:2912–2942. doi: 10.1021/cr040675a.
- Reetz M. T. Angew. Chem., Int. Ed. 2001;40:284–310.
- Senkan S. Angew. Chem., Int. Ed. 2001;40:312–329.
- Sigman M. S., Harper K. C., Bess E. N., Milo A. Acc. Chem. Res. 2016;49:1292–1301. doi: 10.1021/acs.accounts.6b00194.
- Santanilla A. B., Regalado E. L., Pereira T., Shevlin M., Bateman K., Campeau L.-C., Schneeweis J., Berritt S., Shi Z.-C., Nantermet P., Liu Y., Helmy R., Welch C. J., Vachal P., Davies I. W., Cernak T., Dreher S. D. Science. 2015;347:49–53. doi: 10.1126/science.1259203.
- Friedfeld M. R., Shevlin M., Hoyt J. M., Krska S. W., Tudge M. T., Chirik P. J. Science. 2013;342:1076–1080. doi: 10.1126/science.1243550.
- Robbins D. W., Hartwig J. F. Science. 2011;333:1423–1427. doi: 10.1126/science.1207922.
- Sigman M. S., Jacobsen E. N. J. Am. Chem. Soc. 1998;120:4901–4902.
- Reetz M. T. Angew. Chem., Int. Ed. 2002;41:1335–1338. doi: 10.1002/1521-3773(20020415)41:8<1335::aid-anie1335>3.0.co;2-a.
- Chen Z.-M., Hilton M. J., Sigman M. S. J. Am. Chem. Soc. 2016;138:11461–11464. doi: 10.1021/jacs.6b06994.
- Niemeyer Z. L., Milo A., Hickey D. P., Sigman M. S. Nat. Chem. 2016;8:610–617. doi: 10.1038/nchem.2501.
- Guo J.-Y., Minko Y., Santiago C. B., Sigman M. S. ACS Catal. 2017;7:4144–4151.
- Harper K. C., Sigman M. S. Science. 2011;333:1875–1878. doi: 10.1126/science.1206997.
- Sperger T., Sanhueza I. A., Kalvet I., Schoenebeck F. Chem. Rev. 2015;115:9532–9586. doi: 10.1021/acs.chemrev.5b00163.
- Evans M. G., Polanyi M. Trans. Faraday Soc. 1938;34:11–24.
- Bell R. P. Proc. R. Soc. London, Ser. A. 1936;154:414–429.
- Hammett L. P. J. Am. Chem. Soc. 1937;59:96–103.
- Hammett L. P. Chem. Rev. 1935;17:125–136.
- Hammett L. P. Trans. Faraday Soc. 1938;34:156–165.
- Santiago C. B., Milo A., Sigman M. S. J. Am. Chem. Soc. 2016;138:13424–13430. doi: 10.1021/jacs.6b08799.
- Brønsted J. N., Pedersen K. J. Z. Phys. Chem. 1924;108:185–235.
- Parsons R. Trans. Faraday Soc. 1958;54:1053–1063.
- Gerischer H. Bull. Soc. Chim. Belg. 1958;67:506–527.
- Calle-Vallejo F., Loffreda D., Koper M. T. M., Sautet P. Nat. Chem. 2015;7:403–410. doi: 10.1038/nchem.2226.
- Man I. C., Su H.-Y., Calle-Vallejo F., Hansen H. A., Martinez J. I., Inoglu N. G., Kitchin J., Jaramillo T. F., Nørskov J. K., Rossmeisl J. ChemCatChem. 2011;3:1159–1165.
- Dau H., Limberg C., Reier T., Risch M., Roggan S., Strasser P. ChemCatChem. 2010;2:724–761.
- Vorotnikov V., Vlachos D. G. J. Phys. Chem. C. 2015;119:10417–10426.
- Kiss I. Z., Kazsu Z., Gaspar V. Phys. Chem. Chem. Phys. 2009;11:7669–7677. doi: 10.1039/b905295j.
- Bockris J. O., Otagawa T. J. Electrochem. Soc. 1984;131:290–302.
- Trasatti S. Electrochim. Acta. 1984;29:1503–1512.
- Greeley J., Markovic N. M. Energy Environ. Sci. 2012;5:9246–9256.
- Nørskov J. K., Bligaard T., Logadottir A., Kitchin J. R., Chen J. G., Pandelov S., Stimming U. J. Electrochem. Soc. 2005;152:J23–J26.
- Seh Z. W., Kibsgaard J., Dickens C. F., Chorkendorff I., Nørskov J. K., Jaramillo T. F. Science. 2017;355:eaad4998. doi: 10.1126/science.aad4998.
- Sabatier P., La catalyse en chimie organique, Librairie polytechnique, 1913.
- Kozuch S., Shaik S. Acc. Chem. Res. 2011;44:101–110. doi: 10.1021/ar1000956.
- Ananikov V. P., Understanding Organometallic Reaction Mechanisms and Catalysis: Computational and Experimental Tools, Wiley, 2014.
- Swiegers G., Mechanical Catalysis: Methods of Enzymatic, Homogeneous, and Heterogeneous Catalysis, Wiley, 2008.
- Busch M., Wodrich M. D., Corminboeuf C. Chem. Sci. 2015;6:6754–6761. doi: 10.1039/c5sc02910d.
- Busch M., Wodrich M. D., Corminboeuf C. ACS Catal. 2017;7:5643–5653.
- Busch M., Wodrich M. D., Corminboeuf C. ChemCatChem. 2018;10:1592–1597.
- Wodrich M. D., Sawatlon B., Busch M., Corminboeuf C. ChemCatChem. 2018;10:1586–1591.
- Wodrich M. D., Busch M., Corminboeuf C. Chem. Sci. 2016;7:5723–5735. doi: 10.1039/c6sc01660j.
- Rupp M., Tkatchenko A., Müller K.-R., von Lilienfeld O. A. Phys. Rev. Lett. 2012;108:058301. doi: 10.1103/PhysRevLett.108.058301.
- Montavon G., Rupp M., Gobre V., Vazquez-Mayagoitia A., Hansen K., Tkatchenko A., Müller K.-R., von Lilienfeld O. A. New J. Phys. 2013;15:095003.
- Hansen K., Montavon G., Biegler F., Fazli S., Rupp M., Scheffler M., von Lilienfeld O. A., Tkatchenko A., Müller K.-R. J. Chem. Theory Comput. 2013;9:3404–3419. doi: 10.1021/ct400195d.
- Ramakrishnan R., Dral P. O., Rupp M., von Lilienfeld O. A. J. Chem. Theory Comput. 2015;11:2087–2096. doi: 10.1021/acs.jctc.5b00099.
- Le T., Epa V. C., Burden F. R., Winkler D. A. Chem. Rev. 2012;112:2889–2919. doi: 10.1021/cr200066h.
- Raccuglia P., Elbert K. C., Adler P. D. F., Falk C., Wenny M. B., Mollo A., Zeller M., Friedler S. A., Schrier J., Norquist A. J. Nature. 2016;533:73–76. doi: 10.1038/nature17439.
- von Lilienfeld O. A. Angew. Chem., Int. Ed. 2018;57:4164–4169. doi: 10.1002/anie.201709686.
- Janet J. P., Kulik H. J. J. Phys. Chem. A. 2017;121:8939–8954. doi: 10.1021/acs.jpca.7b08750.
- Janet J. P., Chan L., Kulik H. J. J. Phys. Chem. Lett. 2018;9:1064–1071. doi: 10.1021/acs.jpclett.8b00170.
- Janet J. P., Kulik H. J. Chem. Sci. 2017;8:5137–5152. doi: 10.1039/c7sc01247k.
- Maldonado A. G., Rothenberg G. Chem. Soc. Rev. 2010;39:1891–1902. doi: 10.1039/b921393g.
- Ras E.-J., Louwerse M. J., Rothenberg G. Catal. Sci. Technol. 2012;2:2456–2464.
- Ras E.-J., Rothenberg G. RSC Adv. 2014;4:5963–5974.
- Madaan N., Shiju N. R., Rothenberg G. Catal. Sci. Technol. 2016;6:125–133.
- Vignola E., Steinmann S. N., Vandegehuchte B. D., Curulla D., Stamatakis M., Sautet P. J. Chem. Phys. 2017;147:054106. doi: 10.1063/1.4985890.
- Kitchin J. R. Nature Catalysis. 2018;1:230–232.
- Timoshenko J., Lu D., Lin Y., Frenkel A. I. J. Phys. Chem. Lett. 2017;8:5091–5098. doi: 10.1021/acs.jpclett.7b02364.
- Noh J., Back S., Kim J., Jung Y. Chem. Sci. 2018;9:5152–5159. doi: 10.1039/c7sc03422a.
- Friedman J. H. Comput. Stat. Data Anal. 1999;38:367–378.
- Takigawa I., Shimizu K.-i., Tsuda K., Takakusagi S. RSC Adv. 2016;6:52587–52595.
- Gasper R., Shi H., Ramasubramaniam A. J. Phys. Chem. C. 2017;121:5612–5619.
- Jinnouchi R., Asahi R. J. Phys. Chem. Lett. 2017;8:4279–4283. doi: 10.1021/acs.jpclett.7b02010.
- Cortes C., Vapnik V. Mach. Learn. 1995;20:273–297.
- Fernandez M., Boyd P. G., Daff T. D., Aghaji M. Z., Woo T. K. J. Phys. Chem. Lett. 2014;5:3056–3060. doi: 10.1021/jz501331m.
- Fernandez M., Trefiak N. R., Woo T. K. J. Phys. Chem. C. 2013;117:14095–14105.
- Xin H., Holewinski A., Linic S. ACS Catal. 2012;2:12–16.
- Ma X., Li Z., Achenie L. E. K., Xin H. J. Phys. Chem. Lett. 2015;6:3528–3533. doi: 10.1021/acs.jpclett.5b01660.
- Li Z., Wang S., Chin W. S., Achenie L. E., Xin H. J. Mater. Chem. A. 2017;5:24131–24138.
- Rasmussen C. E. and Williams C. K. I., Gaussian Processes for Machine Learning, MIT Press, Cambridge, MA, 2006.
- Ulissi Z. W., Tang M. T., Xiao J., Liu X., Torelli D. A., Karamad M., Cummins K., Hahn C., Lewis N. S., Jaramillo T. F., Chan K., Nørskov J. K. ACS Catal. 2017;7:6600–6608.
- Ulissi Z. W., Singh A. R., Tsai C., Nørskov J. K. J. Phys. Chem. Lett. 2016;7:3931–3935. doi: 10.1021/acs.jpclett.6b01254.
- Ulissi Z. W., Medford A. J., Bligaard T., Nørskov J. K. Nat. Commun. 2017;8:14621. doi: 10.1038/ncomms14621.
- Wexler R. B., Martirez J. M. P., Rappe A. M. J. Am. Chem. Soc. 2018;140:4678–4683. doi: 10.1021/jacs.8b00947.
- Landrum G. A., Penzotti J. E., Putta S. Meas. Sci. Technol. 2005;16:270–277.
- Hansen K., Biegler F., Ramakrishnan R., Pronobis W., von Lilienfeld O. A., Müller K.-R., Tkatchenko A. J. Phys. Chem. Lett. 2015;6:2326–2331. doi: 10.1021/acs.jpclett.5b00831.
- Rupp M., Ramakrishnan R., von Lilienfeld O. A. J. Phys. Chem. Lett. 2015;6:3309–3313.
- Ramakrishnan R., Dral P. O., Rupp M., von Lilienfeld O. A. Sci. Data. 2014;1:140022. doi: 10.1038/sdata.2014.22.
- Faber F., Lindmaa A., von Lilienfeld O. A., Armiento R. Int. J. Quantum Chem. 2015;115:1094–1101.
- Faber F. A., Hutchison L., Huang B., Gilmer J., Schoenholz S. S., Dahl G. E., Vinyals O., Kearnes S., Riley P. F., von Lilienfeld O. A. J. Chem. Theory Comput. 2017;13:5255–5264. doi: 10.1021/acs.jctc.7b00577.
- Huang B., von Lilienfeld O. A. J. Chem. Phys. 2016;145:161102. doi: 10.1063/1.4964627.
- Bereau T., Andrienko D., von Lilienfeld O. A. J. Chem. Theory Comput. 2015;11:3225–3233. doi: 10.1021/acs.jctc.5b00301.
- Browning N. J., Ramakrishnan R., von Lilienfeld O. A., Roethlisberger U. J. Phys. Chem. Lett. 2017;8:1351–1359. doi: 10.1021/acs.jpclett.7b00038.
- Miyaura N., Yamada K., Suzuki A. Tetrahedron Lett. 1979;20:3437–3440.
- Miyaura N., Suzuki A. Chem. Rev. 1995;95:2457–2483.
- Suzuki A. Angew. Chem., Int. Ed. 2011;50:6722–6737. doi: 10.1002/anie.201101379.
- Weininger D., Weininger A., Weininger J. L. J. Chem. Inf. Model. 1989;29:97–101.
- Weininger D. Proc. Edinb. Math. Soc. 1970:1–14.
- O'Boyle N. M., Banck M., James C. A., Morley C., Vandermeersch T., Hutchison G. R. J. Cheminf. 2011;3:33. doi: 10.1186/1758-2946-3-33.
- Pizzi G., Cepellotti A., Sabatini R., Marzari N., Kozinsky B. Comput. Mater. Sci. 2016;111:218–230.
- Becke A. D. J. Chem. Phys. 1993;98:5648–5652.
- Lee C., Yang W., Parr R. G. Phys. Rev. B: Condens. Matter Mater. Phys. 1988;37:785–789. doi: 10.1103/physrevb.37.785.
- Stephens P. J., Devlin F. J., Chabalowski C. F., Frisch M. J. J. Phys. Chem. 1994;98:11623–11627.
- Grimme S., Antony J., Ehrlich S., Krieg H. J. Chem. Phys. 2010;132:154104. doi: 10.1063/1.3382344.
- Grimme S., Ehrlich S., Goerigk L. J. Comput. Chem. 2011;32:1456–1465. doi: 10.1002/jcc.21759.
- Ditchfield R., Hehre W. J., Pople J. A. J. Chem. Phys. 1971;54:724–728.
- Binkley J. S., Pople J. A., Hehre W. J. J. Am. Chem. Soc. 1980;102:939–947.
- Gordon M. S., Binkley J. S., Pople J. A., Pietro W. J., Hehre W. J. J. Am. Chem. Soc. 1982;104:2797–2803.
- Pietro W. J., Francl M. M., Hehre W. J., DeFrees D. J., Pople J. A., Binkley J. S. J. Am. Chem. Soc. 1982;104:5039–5048.
- Weigend F., Ahlrichs R. Phys. Chem. Chem. Phys. 2005;7:3297–3305. doi: 10.1039/b508541a.
- Frisch M. J., Trucks G. W., Schlegel H. B., Scuseria G. E., Robb M. A., Cheeseman J. R., Scalmani G., Barone V., Petersson G. A., Nakatsuji H., Li X., Caricato M., Marenich A., Bloino J., Janesko B. G., Gomperts R., Mennucci B., Hratchian H. P., Ortiz J. V., Izmaylov A. F., Sonnenberg J. L., Williams-Young D., Ding F., Lipparini F., Egidi F., Goings J., Peng B., Petrone A., Henderson T., Ranasinghe D., Zakrzewski V. G., Gao J., Rega N., Zheng G., Liang W., Hada M., Ehara M., Toyota K., Fukuda R., Hasegawa J., Ishida M., Nakajima T., Honda Y., Kitao O., Nakai H., Vreven T., Throssell K., Montgomery Jr J. A., Peralta J. E., Ogliaro F., Bearpark M., Heyd J. J., Brothers E., Kudin K. N., Staroverov V. N., Keith T., Kobayashi R., Normand J., Raghavachari K., Rendell A., Burant J. C., Iyengar S. S., Tomasi J., Cossi M., Millam J. M., Klene M., Adamo C., Cammi R., Ochterski J. W., Martin R. L., Morokuma K., Farkas O., Foresman J. B. and Fox D. J., Gaussian 09, Revision D.01, Gaussian, Inc., Wallingford CT, 2016.
- Weigend F., Ahlrichs R. Phys. Chem. Chem. Phys. 2005;7:3297–3305. doi: 10.1039/b508541a.
- Christensen A. S., Faber F. A., Huang B., Bratholm L. A., Tkatchenko A., Müller K. R. and von Lilienfeld O. A., QML: A Python Toolkit for Quantum Machine Learning, v0.3.1, 2017, doi: 10.5281/zenodo.817332.
- The trans isomerism constraint was imposed using the general chiral specification syntax of the SMILES notation (i.e., the @SP square-planar class symbol) as depicted (on the top right-hand corner) in Fig. 3
- Halgren T. A. J. Comput. Chem. 1996;17:490–519.
- Halgren T. A. J. Comput. Chem. 1996;17:520–552.
- Halgren T. A. J. Comput. Chem. 1996;17:553–586.
- Halgren T. A., Nachbar R. B. J. Comput. Chem. 1996;17:587–615.
- Halgren T. A. J. Comput. Chem. 1996;17:616–641.
- To refine the accuracy of the ML model in the targeted descriptor energy range, i.e., the top of the volcano, we exploited the trained model to predict the binding energies on a subset of complexes combining the 5 metals (Pt, Au, Ag, Cu, and Ni) and 72 ligands (from no. 0 to 71) and selected the molecules for which the ML predicted reaction energy was in the selected range (as opposed to randomly selecting additional candidates to extend the training set)
- Due to convergence problems, exactly 2595 binding energies from Pd complexes were used in the training set
- Lei P., Meng G., Ling Y., An J., Szostak M. J. Org. Chem. 2017;82:6638–6646. doi: 10.1021/acs.joc.7b00749.
- Martin R., Buchwald S. L. Acc. Chem. Res. 2008;41:1461–1473. doi: 10.1021/ar800036s.
- Surry D. S., Buchwald S. L. Angew. Chem., Int. Ed. 2008;47:6338–6361.
- Bartok A. P., De S., Poelking C., Bernstein N., Kermode J. R., Csanyi G., Ceriotti M. Sci. Adv. 2017;3:e1701816. doi: 10.1126/sciadv.1701816.
- Faber F. A., Christensen A. S., Huang B., von Lilienfeld O. A. J. Chem. Phys. 2018;148:241717. doi: 10.1063/1.5020710.
- Huang B. and von Lilienfeld O. A., arXiv e-prints, arXiv:1707.04146, 2017.
- Axilrod B. M., Teller E. J. Chem. Phys. 1943;11:299–300.
- Muto Y. J. Phys. Soc. Jpn. 1943;17:629.
- Hastie T., Tibshirani R. and Friedman J., The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2001.
- Vapnik V., The Nature of Statistical Learning Theory, Springer Science & Business Media, 2013.
- Cortes C., Jackel L. D., Solla S. A., Vapnik V. and Denker J. S., Advances in Neural Information Processing Systems, 1994, pp. 327–334.
- de Meijere A., Bräse S. and Oestreich M., Metal-Catalyzed Cross-Coupling Reactions and More, Wiley-VCH, Weinheim, 2014.
- Colacot T., New Trends in Cross-Coupling: Theory and Applications, The Royal Society of Chemistry, Cambridge, 2015.
- Nishihara Y., Applied Cross-Coupling Reactions, Springer-Verlag, Berlin, 2013.
- Molander G. A., Cross-Coupling and Heck-Type Reactions, Thieme, Stuttgart, 2013.
- Bedford R. B., Hazelwood S. L., Albisson D. A. Organometallics. 2002;21:2599–2600.
- Mateo C., Fernandez-Rivas C., Cardenas D. J., Echavarren A. M. Organometallics. 1998;17:3661–3669.
- Ananikov V. P., Musaev D. G., Morokuma K. Organometallics. 2005;24:715–723. doi: 10.1021/om050255r.
- Han F.-S. Chem. Soc. Rev. 2013;42:5270–5298. doi: 10.1039/c3cs35521g.
- Tasker S. Z., Standley E. A., Jamison T. F. Nature. 2014;509:299–309. doi: 10.1038/nature13274.
- Zhang D., Wang Q. Coord. Chem. Rev. 2015;286:1–16.
- Livendahl M., Goehry C., Maseras F., Echavarren A. M. Chem. Commun. 2014;50:1533–1536. doi: 10.1039/c3cc48914k.
- Maaliki C., Thiery E., Thibonnet J. Eur. J. Org. Chem. 2017;2:209–228.
- Thapa S., Shrestha B., Gurung S. K., Giri R. Org. Biomol. Chem. 2015;13:4816–4827. doi: 10.1039/c5ob00200a.
- Rao H. S. P., Rao A. V. B. J. Org. Chem. 2015;80:1506–1516. doi: 10.1021/jo502446k.
- Hoshi M., Kawamura N., Shirakawa K. Synthesis. 2006;12:1961–1970.