2025 Sep 17;65(19):9856–9864. doi: 10.1021/acs.jcim.5c01449

Integrating Machine Learning into Free Energy Perturbation Workflows

Donald J M van Pinxteren 1, Willem Jespers 1,*
PMCID: PMC12529772  PMID: 40958764

Abstract

Free energy perturbation (FEP) methods are among the most accurate tools in structure-based drug design for predicting protein–ligand binding affinities. However, their adoption remains limited due to high computational demands and complex setup procedures. This review explores how integrating machine learning (ML), especially active learning (AL) and deep learning (DL), can enhance the efficiency, accessibility, accuracy, and precision of FEP workflows. It examines three key areas where ML has been successfully applied: sampling strategies, protocol optimization, and force field development. AL algorithms can significantly reduce the number of FEP calculations needed during virtual screening by guiding the molecule selection. DL-based protein–ligand cofolding methods such as AlphaFold, NeuralPLexer, and DragonFold enable the automated generation of accurate complex structures for FEP, bypassing traditional docking and preparation steps. Additionally, ML-derived neural network potentials (NNPs), trained on quantum mechanical data, offer improved force field accuracy, although at the cost of higher computational expenses. This review emphasizes a hybrid approach combining human expertise with ML tools as the most promising strategy for accelerating and democratizing FEP-based drug discovery. Future developments in this interdisciplinary space are expected to expand the scope and impact of computer-aided drug design across pharmaceutical and materials science applications.



1. Introduction

In the field of drug discovery and optimization, computational methods have been evolving rapidly. Owing to increasing computational capacity, open-source chemical databases and protein structures, and improvements in the efficiency, reliability, and accuracy of methods, computer-aided drug discovery (CADD) has become an important tool in drug discovery and optimization. CADD includes ligand-based drug design (LBDD), where compounds are typically screened based on chemical similarity to the structures of known ligands, and structure-based drug design (SBDD), where a new drug is designed based on the structure of a protein. The latter includes methods such as molecular docking, molecular dynamics (MD) simulations, and free energy perturbation (FEP) methods. With the introduction and constant improvement of machine learning (ML)-based protein structure prediction models like AlphaFold, more high-resolution protein structures are becoming available, and SBDD has been fully adopted as a ligand discovery and optimization method in the traditional drug discovery process.

While molecular docking and MD simulations can give great insights into the interactions ligands have with a protein’s binding site, FEP methods can be used to predict binding affinities quantitatively. In this review, we will focus on two applications of FEP, namely, absolute free energy perturbation (ABFE) and relative free energy perturbation (RBFE). Both methods scale the potential energy between two states through a series of small steps, popularly referred to as alchemical changes. During each step, MD simulations are used to calculate the change in free energy by using, for instance, FEP, thermodynamic integration (TI), or the (multistate) Bennett acceptance ratio ((M)BAR). By summing these free energy differences across all steps, FEP yields the overall binding free energy difference between two states in a thermodynamic cycle. In RBFE, the two states are two ligands, transformed into one another in a solvent leg and a protein leg; in ABFE, they are the original ligand and its noninteracting dummy atoms, using the same legs.
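The bookkeeping behind the RBFE thermodynamic cycle can be illustrated with a minimal sketch. It assumes that the per-window free energy changes have already been estimated (for instance with FEP, TI, or (M)BAR); the function names and the numerical values are hypothetical and serve only to show how the windows and the two legs are combined.

```python
from typing import Sequence


def leg_free_energy(window_dgs: Sequence[float]) -> float:
    """Sum the per-window free energy changes (kcal/mol) along one alchemical leg."""
    return sum(window_dgs)


def relative_binding_free_energy(
    protein_leg: Sequence[float], solvent_leg: Sequence[float]
) -> float:
    """Cycle closure: ddG_bind(A->B) = dG_protein(A->B) - dG_solvent(A->B)."""
    return leg_free_energy(protein_leg) - leg_free_energy(solvent_leg)


# Hypothetical per-window estimates for transforming ligand A into ligand B.
protein_windows = [0.5, 0.25, -0.75, -1.0, -0.5]
solvent_windows = [0.25, 0.25, -0.25, -0.5, -0.25]

ddg = relative_binding_free_energy(protein_windows, solvent_windows)
print(f"ddG_bind(A->B) = {ddg:+.2f} kcal/mol")
```

With this sign convention, a negative value means that ligand B is predicted to bind more strongly than ligand A.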

RBFE is the most commonly used method in drug discovery campaigns. It is used in lead optimization processes and is less expensive and more accurate than ABFE. However, RBFE requires that the alchemical changes between the two ligands are minor, and a reference ligand is needed to calculate the actual binding affinity. This can potentially be overcome using ABFE, although modeling the relevant (pseudo) apo state of the protein is a key challenge. In drug discovery campaigns, residue FEP is commonly used to calculate the effect of protein mutations on ligand binding or, alternatively, to predict changes in the affinity of a peptide for a protein.

Since the systematic investigation of the performance of FEP in retrospective analyses, the prospective application of FEP methods has been reported in various cases. For instance, Bayer utilized FEP to develop inhibitors targeting the Kirsten rat sarcoma (KRAS) gene with the G12C mutation, enabling the identification of a new scaffold with a novel binding mode that enhanced affinity. Schrödinger applied FEP potency predictors to guide the design of a tyrosine kinase 2 (TYK2) inhibitor with improved binding characteristics. Janssen leveraged FEP to discover new [1,2,4]triazolo[1,5-a]pyrimidine phosphodiesterase 2A (PDE2A) inhibitors, including compounds with unexpectedly high activity. Similarly, AstraZeneca employed FEP in the lead optimization of an ATP-competitive c-MET inhibitor. Additionally, two studies by Tandaric et al. and Majellaro et al. used QligFEP and QresFEP to elucidate the binding mode of the adenosine A2B receptor (A2BAR).

FEP methods have demonstrated significant value and accuracy in the drug discovery and optimization process. However, they are still hindered by challenges such as high computational demands and the significant effort required for system preparation. Thus, developing more efficient methods that require less computational power and data input could greatly enhance the ability to screen larger molecular libraries and explore the chemical space more comprehensively. This advancement would accelerate early-stage drug discovery and potentially lower the cost and time associated with identifying promising drug candidates. Such improvements could also make these tools more accessible to research groups with limited computational resources.

Machine learning (ML) algorithms, particularly deep learning (DL) and active learning (AL), hold significant potential to enhance the efficiency and accuracy of data-driven processes across various scientific domains. Their integration into FEP methodologies presents a promising opportunity to improve both the reliability and the computational performance of these techniques. Recent studies have demonstrated the application of ML in optimizing force fields, automating FEP protocol development, and improving sampling strategies. Furthermore, a comprehensive review by Qian et al. provides a detailed overview of recent advances in alchemical transformation methods, and a paper by Mey et al. elaborates on the best practices for alchemical free energy calculations, allowing a systematic application of FEP. In this review, we aim to provide further insights into how the convergence of ML and FEP may further empower CADD within both academic and industrial pharmaceutical research.

2. Machine Learning Implementations in Free Energy Perturbation Methods

This section addresses three critical components of FEP workflows that are being enhanced through the integration of ML: sampling strategies, protocol setup, and force field development. These areas of advancement aim to reduce the number of FEP calculations required for comprehensive compound library screening, improve the accuracy and robustness of FEP setup procedures, and increase the accuracy of FEP and underlying MD simulations. Each subsection provides an in-depth exploration of the relevant methodologies and ML-driven innovations that contribute to more efficient and reliable FEP computations, summarized in Table 1.

Table 1. Overview of the Three Sections of the FEP Workflow Discussed in This Review and the Use of ML Therein.

Subsection | Section of the FEP workflow
2.1 AL for FEP Sample Selection | screening efficiency
2.2 FEP Protocol and Protein Setup | FEP preparation
2.3 Force Field Enhancements | MD accuracy and efficiency

2.1. AL for FEP Sample Selection

AL has emerged as a promising strategy for improving sampling efficiency in the screening of chemical libraries using FEP. Although still relatively underexplored, several studies have demonstrated the application of AL frameworks in which quantitative structure–activity relationship (QSAR) models are trained on FEP-generated data to prioritize the selection of new candidate molecules. The primary objective of this approach is to maximize the identification of high-affinity ligands while minimizing the number of costly FEP simulations required.

Scheme 1 shows a graphical overview of the general AL framework. A chemical library of interest is selected. This set is iteratively split into subsets: a training set for the ML model, a subset for the FEP calculations, and an independent test set. After FEP calculations are performed on the subset, the results are used to retrain the QSAR model, and its performance is assessed on the test set. In the next iteration, a new subset of the test set is acquired for FEP calculations and added to the training set for the QSAR model. Generally, two types of selection criteria can be used. An exploitative or greedy acquisition focuses on the compounds most likely to have the highest binding affinity. Alternatively, an explorative acquisition focuses on compounds that have the highest uncertainty in the binding affinity predicted by the machine learning model.
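The two selection criteria can be sketched in a few lines. This is an illustration only: the compound names, predicted free energies, and uncertainties are hypothetical, and the predictions are assumed to come from an already trained QSAR model that also reports an uncertainty estimate.

```python
from typing import Dict, List, Tuple

# Each prediction is (predicted binding free energy in kcal/mol, model uncertainty).
Prediction = Tuple[float, float]


def greedy_acquisition(preds: Dict[str, Prediction], batch_size: int) -> List[str]:
    """Exploit: pick the compounds with the most favorable (most negative) predicted dG."""
    return sorted(preds, key=lambda name: preds[name][0])[:batch_size]


def uncertainty_acquisition(preds: Dict[str, Prediction], batch_size: int) -> List[str]:
    """Explore: pick the compounds whose predictions are the most uncertain."""
    return sorted(preds, key=lambda name: preds[name][1], reverse=True)[:batch_size]


# Hypothetical pool of not-yet-simulated compounds.
pool = {
    "cpd_a": (-9.5, 0.3),
    "cpd_b": (-7.0, 1.2),
    "cpd_c": (-8.2, 0.9),
    "cpd_d": (-6.1, 1.5),
}

print(greedy_acquisition(pool, 2))       # ['cpd_a', 'cpd_c']
print(uncertainty_acquisition(pool, 2))  # ['cpd_d', 'cpd_b']
```

In a full AL loop, the selected compounds would be submitted to FEP, and the resulting free energies would be added to the training set before the model is retrained.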

Scheme 1. Schematic Overview of Active Learning Enhanced Sampling Techniques for Library Screening with FEP.

A typically used metric to assess the performance of an AL-driven screen is the recall, which is defined as the number of high-affinity compounds found divided by the total number of high-affinity compounds present in the data set. By focusing on this metric, the model does not miss potent candidates while spending resources on the most informative or promising molecules. Key parameters investigated in AL-FEP workflows include the choice of ML algorithm for QSAR modeling, molecular descriptors used as input features, initial training set composition, batch size, acquisition function, and the number of iterative selection rounds, which we discuss in more detail in the next paragraphs.
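As a minimal sketch of this metric, the function below computes the recall of the top binders in a retrospective setting, where reference free energies are assumed to be known for the whole library. The function name, compound labels, and values are hypothetical.

```python
from typing import Dict, Iterable


def recall_of_top_binders(
    selected: Iterable[str], true_dg: Dict[str, float], top_n: int
) -> float:
    """Fraction of the top_n strongest binders (most negative dG) that were selected."""
    if top_n < 1 or top_n > len(true_dg):
        raise ValueError("top_n must be between 1 and the library size")
    top = set(sorted(true_dg, key=true_dg.get)[:top_n])
    return len(top & set(selected)) / top_n


# Hypothetical reference free energies (kcal/mol) for a six-compound library.
reference = {"l1": -10.2, "l2": -9.8, "l3": -9.1, "l4": -7.5, "l5": -6.9, "l6": -6.0}
sampled = ["l1", "l3", "l5"]

recall = recall_of_top_binders(sampled, reference, top_n=3)
print(f"recall of top 3: {recall:.2f}")
```

Here two of the three strongest binders were sampled, so the recall is 2/3.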

To assess the efficiency of AL-FEP in exploring the chemical space, Khalak et al. tested and optimized multiple molecular descriptors and acquisition methods. Three types of descriptors were tested: RDKit-generated molecular fingerprints; MedusaNet descriptors, representing physics-based descriptors; and protein–ligand interaction fingerprints, named PLEC. In addition, Khalak et al. tested five acquisition methods in the AL loop: random selection (selecting random compounds at each iteration); greedy selection (choosing the top predicted binders at each iteration); uncertainty selection (picking compounds with the highest uncertainty in the binding affinity prediction); a mixed strategy (first selecting the top predicted binders for three iterations and then the most uncertain); and a narrowing strategy (combining broad selection in the first three iterations with a subsequent switch to a greedy approach). During the initial iterations, several models were trained, each using different sets of the ligand descriptors described earlier, and the five models with the lowest cross-validation root-mean-square error (RMSE) were identified. From each of these models, the 20 best predicted binders were then selected. Regarding the acquisition method, uncertainty-based and random ligand selection sampled the chemical library broadly and offered a better overall description of the chemical space. Among the molecular descriptors, RDKit’s molecular fingerprints outperformed the interaction fingerprints as well as the physics-based descriptors. However, to efficiently identify the most potent binders, other strategies, such as a greedy or narrowing approach, were recommended by the authors.
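The mixed and narrowing strategies differ from the single-rule strategies only in that the selection rule changes after the third iteration. The sketch below encodes that schedule under the assumption of 1-based iteration numbers; the function name, the strategy labels, and the `switch_after` parameter are hypothetical and are not taken from the authors' implementation.

```python
def acquisition_mode(strategy: str, iteration: int, switch_after: int = 3) -> str:
    """Return the selection rule to apply in a given (1-based) AL iteration."""
    if iteration < 1:
        raise ValueError("iteration numbers start at 1")
    early = iteration <= switch_after
    if strategy == "mixed":
        # Top predicted binders first, then the most uncertain compounds.
        return "greedy" if early else "uncertainty"
    if strategy == "narrowing":
        # Broad selection first, then a switch to a greedy approach.
        return "broad" if early else "greedy"
    if strategy in ("random", "greedy", "uncertainty"):
        return strategy
    raise ValueError(f"unknown strategy: {strategy}")


for it in range(1, 6):
    print(it, acquisition_mode("mixed", it), acquisition_mode("narrowing", it))
```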

Thompson et al. tested different ML algorithms, acquisition methods, initial subset selection methods, and batch sizes of 20, 40, 60, 80, or 100 molecules per iteration. To systematically assess the effect of several optimization strategies for AL, they created a data set of 10,000 RBFE calculations on a congeneric series. Batch size per iteration had the biggest impact on performance, whereas the initial subset selection method and the choice of ML algorithm made almost no difference. They observed a slight improvement in performance with greedy acquisition methods. More importantly, they found that using batch sizes of 60 or more molecules per iteration was sufficient to identify at least 50% of the top-binding compounds within five iterations (sampling a total of 300 molecules). In contrast, smaller batch sizes required more iterations to achieve the same level of high-affinity compound identification across all active learning setup configurations. For example, 10 iterations with a batch size of 20, sampling a total of 200 compounds, would recall around 40% of the 100 highest affinity compounds with the same AL setups. In conclusion, they identified 50% of the top molecules while sampling 5% of the 10,000-compound data set. Notably, Thompson et al. state in their discussion that the optimal AL-FEP configuration depends on the data set and target under study. For this data set, they concluded that a simple random initial subset with a greedy acquisition function performs as well as the more complex methods tested in previous studies, which try to balance exploration and exploitation.

This is supported by a study by Gorantla et al., who tested multiple setup configurations for four different targets. These setups used Chemprop or a Gaussian process regression model as the ML algorithm; a random, uncertainty-based, greedy, or mixed acquisition method; initial batch sizes of 60 or 120 molecules for an exploration phase; and iterative batch sizes of 20, 30, 60, or 120 molecules per iteration for an exploitation phase. For more chemically diverse and bigger data sets, recall benefits from a bigger initial batch size. The bigger the initial batch size, the higher the recall for all data sets, but experiments with smaller data sets could benefit from the trade-off of saving on computational costs with a smaller number of samples overall. For iterative batch sizes, smaller batch sizes resulted in higher recall, indicating that these batch sizes favor precision and recall. Bigger training sets for the ML model in the exploitation phase could introduce noise and worsen predictive performance.

Gusev et al. suggest combining an exploring strategy in the first iterations with a narrowing (exploiting) technique afterward. After selecting a diverse initial sample set, five cycles with a balanced selection method, one cycle with a randomly selected sample set, and one cycle with an exploitative selection method were used. Gusev et al. used AutoML to select an ML model algorithm, hyperparameters, and data representation for every iteration to generate the best ML model configuration for each cycle. AutoML was presented with the molecular descriptors PLEC, Morgan fingerprints, RDKit 3D molecular fingerprints, and RDKit molecular fingerprints, and with the ML algorithms linear regression, Gaussian process regression, and Gaussian process regression with a Tanimoto kernel. This resulted in the configuration of five different ML models for seven iterative cycles. This led to identifying 133 potential new inhibitors for severe acute respiratory syndrome coronavirus 2 papain-like protease while sampling 253 compounds out of a data set of 8175.
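For readers unfamiliar with the Tanimoto kernel mentioned above, the following sketch shows how the similarity between binary fingerprints is computed and assembled into a kernel matrix that a Gaussian process could use. Fingerprints are represented here as sets of "on" bit indices; this representation, the function names, and the example bits are assumptions for illustration and do not reproduce the AutoML setup of the study.

```python
from typing import FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity: shared on-bits divided by the union of on-bits."""
    if not a and not b:
        return 1.0  # convention for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def tanimoto_kernel(fps: List[FrozenSet[int]]) -> List[List[float]]:
    """Pairwise similarity matrix usable as a Gaussian process covariance."""
    return [[tanimoto(x, y) for y in fps] for x in fps]


# Hypothetical fingerprints given as sets of on-bit indices.
fingerprints = [
    frozenset({1, 2, 3, 4}),
    frozenset({3, 4, 5, 6}),
    frozenset({1, 2, 3, 4}),
]

for row in tanimoto_kernel(fingerprints):
    print([round(value, 3) for value in row])
```

Identical fingerprints give a similarity of 1, and the first two fingerprints share two of six distinct bits, giving 1/3.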

Recently, a study by Lonsdale et al. presented a systematic benchmarking of AL-FEP for ligand potency prediction in lead optimization. A distinctive focus of their study is the role of chemical diversity, comparing AL-FEP performance on two bromodomain inhibitor series: one with a constant core scaffold and another allowing for core modifications. They demonstrate that while AL-FEP can efficiently generate predictive ML models within a few rounds, the effectiveness is highly context-dependent. With a low scaffold diversity, models rapidly achieve high accuracy and effectively enrich potent binders. However, pools with diverse scaffolds challenge the ML model’s performance, leading to reduced prediction quality for underrepresented scaffolds. Lonsdale et al. test a range of selection strategies, including explore–exploit mixture strategies, a grid-diversified selection method, and a greedy-diversified selection method. They show that protocol choice determines whether the model excels at finding top binders or provides broad accuracy across the potency space, especially when looking at diverse scaffolds. While a greedy selection strategy may be effective when maximizing binding potency alone, for optimizing multiple properties or exploring multiple end points, an exploratory model that maintains high-potency predictive accuracy across multiple scaffolds is preferable.

Collectively, these five studies underline the potential of AL strategies to enhance the efficiency of FEP screening, with reported speed improvements of up to 20-fold compared to conventional brute-force methods. The effectiveness of AL can be further optimized by adjusting various parameters, such as acquisition strategies, batch sizes, and initial training sets, based on the specific objectives of a given experiment. Caution, however, is warranted, as the optimal combination of these parameters is target and data set dependent, making it challenging to draw general conclusions. Efforts to build community benchmarks, such as those by Thompson et al., are needed to advance the field. However, even in the large (10,000 ligands) data set reported in their work, the selected compounds were similar to the reference compound, which may reduce the predictive accuracy of AL-FEP methods when trying to find chemically diverse new compounds. It is additionally worth mentioning that these five papers are retrospective studies, and whether the optimization strategies truly improve screening effectiveness remains to be studied prospectively.

2.2. FEP Protocol and Protein Setup

With recent advancements in the efficiency of FEP calculations, the primary bottleneck has shifted from computation to the meticulous setup required for accurate simulations. Each protein system necessitates a customized approach, and ligand data sets must be precisely configured to ensure reliable binding affinity predictions. Critical factors such as the protonation states of titratable residues (particularly histidine) and the correct inclusion of relevant (buried) water molecules within the binding site can significantly affect the accuracy of FEP outcomes. These system-specific variables demand deep insight into and understanding of the protein–ligand complex, as well as careful experimental design. ML and AL protocols could streamline these processes with less human interaction and fewer preparation steps. Scheme 2a illustrates the current process used to prepare the FEP calculations. Scheme 2b provides a graphical overview of the preparation steps using recently developed modeling (ML) tools. In this section, we will discuss these tools in more detail.

Scheme 2. Comparison of the Schematic Workflow of Protein Setup for FEP Methods. (A) The traditional workflow includes protein structure acquisition, preparation, and binding pose creation before FEP. (B) An alternative workflow using cofolding techniques.

To streamline the process of FEP protocol optimization, de Oliveira et al. developed the FEP Protocol Builder (FEP-PB), an AL protocol that automates the creation of FEP+ protocols. This protocol enables optimization across multiple aspects of the FEP setup, including protein structure selection, custom core definition, equilibration time scale, water model and sampling strategies, lambda scheme, replica exchange settings for both protein and ligand, enhanced sampling settings for ligands, reference ligand selection, force field choice, and the tautomeric, protonation, and orientation states of residues. First, it generates random protocols for a series of compounds from a training set, which are subjected to a short 1 ns FEP simulation. The performance is assessed using the RMSE of the FEP predicted and experimental binding free energies, and an AutoML model is trained to predict and select top protocols, iteratively refining them with extended simulations for convergence before testing final protocols on the test set to identify the best-performing and robust model. In some cases, the optimized model outperformed protocols designed by experienced computational chemists in both speed and predictive accuracy.
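The scoring step of such a protocol search can be sketched as follows: each candidate protocol is scored by the RMSE between its predicted and the experimental binding free energies, and the candidates are ranked accordingly. The protocol names and all numbers are hypothetical, and the sketch covers only the scoring and ranking, not the AutoML model or the FEP+ interface.

```python
import math
from typing import Dict, List, Sequence


def rmse(predicted: Sequence[float], experimental: Sequence[float]) -> float:
    """Root-mean-square error between predicted and experimental values."""
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


def rank_protocols(
    results: Dict[str, Sequence[float]], experimental: Sequence[float]
) -> List[str]:
    """Order candidate protocols from lowest to highest RMSE."""
    return sorted(results, key=lambda name: rmse(results[name], experimental))


# Hypothetical binding free energies (kcal/mol) for three training ligands.
experimental_dg = [-9.0, -8.0, -7.0]
candidate_results = {
    "protocol_b": [-10.0, -7.0, -8.0],
    "protocol_a": [-9.5, -8.5, -7.5],
}

for name in rank_protocols(candidate_results, experimental_dg):
    print(name, round(rmse(candidate_results[name], experimental_dg), 2))
```

In an iterative setting, the top-ranked protocols would then be rerun with extended simulations before the final evaluation on a held-out test set.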

Cofolding methods are emerging as promising alternatives to traditional/AI docking techniques. These approaches predict protein–ligand complexes directly through joint folding of the ligand and protein, potentially yielding more accurate representations of binding poses, protein conformations, and local protonation states. Notable advancements in this area include AlphaFold3 by DeepMind and HelixFold3 by Furui and Ohue, both reporting improved accuracy in predicting protein–ligand complexes compared to conventional docking approaches. NeuralPLexer3 leverages a multiscale deep learning architecture to predict protein–ligand complex structures, with a reported improvement in capturing ligand-induced conformational changes compared to AlphaFold3, and improves computational efficiency by being up to 15 times faster. In parallel, several open-source alternatives to AlphaFold3 have been developed, including Chai-1, Boltz-1, and Boltz-1x. While Boltz-1 primarily focuses on single-chain protein structure prediction in comparison to AlphaFold2, Chai-1 and Boltz-1x expand functionality toward multimolecular complex prediction using multiple sequence alignments (MSAs), chemical inputs, or even a single sequence.

Boitreaud et al. report that Chai-1 attains a performance comparable to AlphaFold3 across a range of targets, although its accuracy diminishes when wild-type residues are substituted with modified amino acids, a limitation likely rooted in its training on explicitly optimized structures and sequences. Additionally, Chai-1 performs well at predicting individual molecular structures but struggles with resolving correct relative orientations within complexes. In contrast, Boltz-1 and Boltz-1x exhibit comparable structural prediction performance and demonstrate superior physical plausibility, as measured by physical validity scores.

The latest advancement in this family, Boltz-2, extends beyond structural prediction by incorporating a binding affinity estimation into its modeling framework. This dual capability enables Boltz-2 to generate both high-quality protein–ligand complex structures and approximate free energy predictions in a single step. Such integration is particularly promising for use in AL-FEP workflows, where rapid prescreening and prioritization of candidate compounds are essential. By supplying both structural models and binding affinity estimates, Boltz-2 offers an efficient means of selecting the most informative compounds for full FEP simulations, thereby reducing the computational cost while maintaining predictive accuracy.

Charm Therapeutics combined cofolding with FEP using an algorithm named DragonFold. This tool aims to skip traditional docking procedures by predicting protein–ligand structures, increasing the accuracy of the protein structure and the binding position of the ligand in the binding pocket, while skipping a time-consuming preparation step. In addition, truncated representations of the system are introduced, similar to other methods such as MCPRO and QligFEP. A study on PFKFB3 showed that the cofolded system achieved performance similar to that of the X-ray and docking starting structures. In addition, it was shown that the performance on the truncated systems was equivalent (∼1 kcal/mol MUE on average) to the full systems but with an overall increase in performance of 2- to 4-fold.

Third-party benchmarking studies have recently provided further insights into the capabilities and limitations of cofolding and AI methods in protein–ligand complex prediction. Large-scale evaluations such as PoseX and Runs N’ Poses have compared AI cofolding approaches, including AlphaFold3, Chai-1, Boltz-1/1x, Protenix, and NeuralPLexer3, against both traditional docking and each other using independent data sets released after the methods’ training cutoffs. Collectively, these studies reveal that while cofolding models outperform classical docking in pose prediction and structural plausibility in favorable scenarios, their success remains strongly tied to the similarity of the test system to training data, raising concerns about overfitting and generalization to novel targets. The Runs N’ Poses benchmark demonstrates that cofolding models can struggle to correctly predict ligand binding modes for drug-like molecules not well-represented in their training sets, although they perform better with promiscuous ligands or cofactors encountered more frequently in experimental structures. Meanwhile, physical validity analyses confirm that recent advances (e.g., Boltz-1x) lead to improved stereochemistry and plausibility of generated complexes compared to prior iterations. Crucially, benchmarking also highlights persistent issues such as handling of chirality, ranking of alternative poses, and reduced accuracy when significant conformational rearrangements or post-translational modifications are present. These findings underline both the promise and the challenges of cofolding.

2.3. Force Field Enhancements

MD simulations and FEP calculations are critically dependent on the accuracy and reliability of the underlying force fields (FFs). Current FEP workflows predominantly rely on molecular mechanics (MM) force fields, which use physics-based models to simulate interatomic interactions such as bond stretching, angle bending, torsional rotation, and nonbonded forces. A variety of both open-source and commercial MM force fields have been developed, many of which offer comparable levels of accuracy across a broad range of chemical systems. Despite ongoing improvements, limitations in the transferability and parametrization of MM force fields remain a source of uncertainty in binding free energy predictions, highlighting the need for continued refinement and benchmarking across diverse molecular contexts.
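For orientation, a generic functional form of the kind used by many MM force fields is given below. It is a textbook-style illustration and not the definition of any specific force field named in this review; prefactor conventions (for example, a factor of 1/2 in the harmonic terms) and the treatment of 1–4 interactions differ between force field families.

```latex
E_{\mathrm{MM}} =
  \sum_{\mathrm{bonds}} k_b \,(r - r_0)^2
  + \sum_{\mathrm{angles}} k_\theta \,(\theta - \theta_0)^2
  + \sum_{\mathrm{torsions}} \sum_{n} \frac{V_n}{2}\,\bigl[1 + \cos(n\phi - \gamma_n)\bigr]
  + \sum_{i<j} \left\{ 4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12}
      - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right]
      + \frac{q_i q_j}{4\pi\varepsilon_0\, r_{ij}} \right\}
```

The first three sums are the bonded terms (bond stretching, angle bending, torsional rotation), and the last sum collects the nonbonded Lennard-Jones and Coulomb interactions between atom pairs.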

While molecular mechanics (MM) force fields offer the efficiency required for large-scale simulations, they are limited in their ability to capture complex electronic effects. In contrast, more accurate force fields can be derived from quantum mechanical (QM) or hybrid QM/MM methods, which directly model electronic interactions. However, the computational cost of these methods makes them impractical for routine use in FEP workflows. As a result, QM calculations are typically restricted to small, chemically diverse molecules and used in conjunction with MM force fields to improve the accuracy in specific contexts, for example, the use of density functional theory calculations done by Bui et al. for geometry optimization of potential drug molecules.

To bridge the gap between accuracy and efficiency, machine-learning-derived force fields, commonly termed neural network potentials (NNPs), have been developed. These models are trained on high-level QM data and aim to replicate QM-level accuracy with significantly reduced computational cost. Scheme 3 shows a graphical overview of the generation of MM (3a) and QM/MM or NNP (3b) force fields. Recent efforts have focused on enhancing the transferability of NNPs across different molecular states, as demonstrated by Kovács et al. in the development of MACE-OFF. In addition, attention has been directed toward improving data efficiency through innovative training strategies and increasing robustness against noisy input data.

Scheme 3. Schematic Workflow for Obtaining (A) MM Force Fields and (B) QM/NN Force Fields.

Despite encouraging results, several challenges remain in the practical integration of the NNP and MM/NNP hybrid force fields. For instance, widely used models such as ANI are currently limited in scope to small molecules. These methods typically remain significantly slower and more resource-intensive than conventional MM force fields, which limits their scalability in high-throughput settings or larger systems such as protein systems. Continued advancements in elemental coverage, solvent modeling, and computational efficiency will be essential for the broader adoption of NNP-based force fields in FEP applications.

In this context, significant progress has been made by Anstine et al. with the development of AIMNet2, a neural network potential designed to overcome several enduring limitations of previous machine learning potentials. AIMNet2 offers broad elemental coverage and supports both neutral and charged molecular states. Its architecture effectively integrates machine-learned short-range interactions with physics-based long-range terms such as explicit electrostatics and dispersion, enabling accurate representation of both local and nonlocal effects. Benchmark studies demonstrate that AIMNet2 achieves accuracy on par with hybrid density functional theory methods across a variety of molecular modeling challenges, including conformer generation, interaction energy calculations, geometry optimization, and molecular dynamics, while considerably outperforming traditional semiempirical approaches in both chemical space coverage and computational efficiency. Importantly, AIMNet2 models are made available as open-source tools and can be applied directly to diverse chemical systems without requiring retraining, representing a substantial advancement toward the integration of machine-learning-based force fields into scalable, automated molecular simulation workflows.

Zariquiey et al. conducted a systematic evaluation of the ANI-2x NNP for RBFE applications, benchmarking its performance against the GAFF2 force field and the commercial FEP+ platform across a range of protein–ligand systems. Their analysis revealed that ANI-2x did not produce substantially different conformational ensembles compared to traditional force fields, with only modest improvements in predictive accuracy observed. However, these gains came at a significant computational cost, approximately 8-fold higher than FEP+ when run on an NVIDIA RTX 4090 GPU. While NNP force fields like ANI-2x offer the potential for enhanced accuracy in challenging systems, they currently lack the efficiency and scalability of established MM force fields. Nonetheless, they may be valuable in cases where MM force fields fail to yield reliable results, serving as an intermediate between classical and fully QM-based methods.

A comprehensive study by Karwounopoulos et al. benchmarked the ANI-2x neural network potential, applied both for ML/MM end-state corrections and for reparametrizing torsion potentials within MM force fields across several drug discovery benchmark systems. Their findings show that ML/MM methods achieve comparable accuracy to well-parametrized MM force fields, with average errors of around 0.8–0.9 kcal/mol. However, ML/MM end-state corrections exhibited higher variance and computational costs in comparison to the torsion refitting approach, which provides a computationally efficient alternative without sacrificing accuracy. The limited improvement from ML/MM mechanical embedding is attributed to the unchanged MM treatment of protein–ligand interactions, underscoring the need for enhanced phase-space overlap and more advanced embedding schemes in future developments. These insights emphasize that ML-driven torsion parameter refinement can be a practical strategy for improving force fields in automated drug discovery pipelines with minimal computational overhead.

More recently, Zariquiey et al. introduced a hybrid NNP/MM approach, QuantumBind-RBFE, which was also benchmarked against GAFF2 and ANI-2x. QuantumBind-RBFE uses the NNP AceFF1.0, showing improved accuracy in binding free energy predictions and supporting larger integration timesteps (2 versus 1 fs) without compromising simulation stability on the same GPU hardware (AceFF1.0 model: https://huggingface.co/Acellera/AceFF-1.0). These enhancements mark progress toward more practical ML-augmented force fields for FEP workflows. However, AceFF1.0, like any ML model, is dependent on the diversity and representativeness of its training data.
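The practical effect of a larger timestep is simple arithmetic: for a fixed simulated time, doubling the timestep halves the number of integration steps, and each step requires a force evaluation. The helper below is a minimal sketch; the 5 ns window length is an arbitrary example and is not taken from the benchmarked protocols.

```python
def integration_steps(length_ns, timestep_fs):
    """Number of MD steps needed to cover length_ns (1 ns = 1e6 fs)."""
    return round(length_ns * 1_000_000 / timestep_fs)


for dt in (1.0, 2.0):
    print(f"{dt} fs timestep: {integration_steps(5.0, dt):,} steps for 5 ns")
```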

To further address the computational challenges of NNPs, Tkaczyk et al. developed an alternative strategy that reweights MM force field simulations using ANI-2x. This method offers a more computationally efficient pathway than direct NNP-based dynamics while retaining the accuracy benefits associated with neural network potentials. When evaluated for single-step FEP and nonequilibrium (NEQ) free energy calculations, ANI-2x reweighting yielded mixed results: the performance in NEQ calculations was promising, but single-step FEP results were found to be less accurate. These findings highlight the ongoing trade-offs between speed, accuracy, and methodological robustness in efforts to integrate ML-derived potentials into FEP simulations. Continued development of hybrid workflows, expanded training sets, and adaptive integration strategies will be crucial for realizing the full potential of NNPs in drug discovery pipelines.
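Single-step reweighting depends on how well the MM ensemble overlaps with the target potential. One common way to quantify this, shown here as a sketch and not necessarily the diagnostic used by Tkaczyk et al., is the Kish effective sample size of the Boltzmann weights computed from per-frame energy differences. The function names and example energies are illustrative.

```python
import math


def boltzmann_weights(delta_u, kt):
    """Normalized weights exp(-(U_target - U_MM)/kT) for MM-sampled frames."""
    exponents = [-du / kt for du in delta_u]
    shift = max(exponents)  # avoids overflow in the exponentials
    raw = [math.exp(x - shift) for x in exponents]
    total = sum(raw)
    return [w / total for w in raw]


def effective_sample_size(weights):
    """Kish estimate: len(weights) for uniform weights, close to 1 if one frame dominates."""
    return 1.0 / sum(w * w for w in weights)


def reweighted_average(values, weights):
    """Ensemble average of an observable under the target potential."""
    return sum(v * w for v, w in zip(values, weights))


kt = 0.596  # roughly kT in kcal/mol near 300 K
good_overlap = boltzmann_weights([0.1, 0.0, 0.2, 0.1], kt)
poor_overlap = boltzmann_weights([0.0, 6.0, 7.0, 8.0], kt)
print(effective_sample_size(good_overlap), effective_sample_size(poor_overlap))
```

A low effective sample size means that a handful of frames carry the estimate, which is the situation in which gradual nonequilibrium switching can be expected to be more robust than a single reweighting step.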

The OpenMM simulation environment has also integrated support for hybrid NNP/MM methodologies, in which neural network potentials are applied to small molecules while the surrounding environment is treated with conventional molecular mechanics force fields. Building on this capability, Ding et al. introduced a deep-learning-based potential model, known as OpenMM-DeepMD, which enables the use of high-precision machine learning force fields implemented in C++/CUDA and accessible through a Python interface. The DeepMD plugin was rigorously validated through a series of benchmarks, including energy conservation tests, thermodynamic ensemble consistency, structural and kinetic property evaluations, and hydration free energy calculations. In all cases, the results were consistent with the experimental reference data, confirming both the accuracy and robustness of the implementation.

Notably, the plugin supports both fixed- and adaptive-region deep potential/MM (DP/MM) simulations, allowing for dynamic partitioning of the system into machine learning and classical regions. Performance benchmarks demonstrated simulation speeds of up to 159 ns/day on an NVIDIA 1060 Ti GPU, showcasing the method’s practical efficiency. By integrating the deep potential model directly into OpenMM, this plugin enables accurate, flexible, and scalable ML-based molecular simulations, including advanced applications such as free energy calculations and hybrid modeling. This development represents a meaningful step toward the routine use of machine learning force fields in large-scale, physics-informed molecular simulations.
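The difference between fixed and adaptive regions can be pictured as follows. In this sketch, which assumes a simple distance criterion and is not the plugin's actual selection logic or API, a fixed region is a static list of atom indices, while an adaptive region is recomputed from the current coordinates as every atom within a cutoff of a chosen center.

```python
import math


def adaptive_region(coords, center, cutoff):
    """Indices of atoms within `cutoff` (Angstrom) of `center` for the current frame."""
    return [i for i, xyz in enumerate(coords) if math.dist(xyz, center) <= cutoff]


# Hypothetical fixed region: atoms chosen once at setup, e.g. the ligand.
FIXED_REGION = [0, 1]

# Two toy frames in which atoms 2 and 3 swap places relative to the center.
frame_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
frame_2 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (7.0, 0.0, 0.0), (4.0, 0.0, 0.0)]

for frame in (frame_1, frame_2):
    ml_atoms = adaptive_region(frame, center=(0.0, 0.0, 0.0), cutoff=5.0)
    print("fixed:", FIXED_REGION, "adaptive:", ml_atoms)
```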

3. Conclusion and Perspectives

In recent years, significant progress has been made in enhancing FEP methodologies through the integration of ML techniques. This review has outlined key developments across several fronts, including sampling strategies, automated protocol generation, protein structure prediction, and cofolding models, as well as advancements in force field design. While improvements in sampling and protocol automation are primarily aimed at increasing the efficiency and scalability of FEP workflows, innovations in protein modeling and force field development target higher accuracy and broader applicability.

One of the most promising directions involves the use of ML-enhanced or NNP-hybrid force fields. Although these approaches are currently more computationally intensive than classical MM force fields, they have the potential to become standard practice as computational resources continue to advance. Importantly, the development of high-accuracy force fields extends beyond drug discovery, with significant implications for materials science, chemical engineering, and other domains that rely on molecular dynamics simulations. This cross-disciplinary relevance is likely to accelerate innovation, particularly if research findings are openly shared across academia, industry, and related scientific fields.

Cofolding models such as Boltz-1x and AlphaFold3 also hold considerable potential to simplify and democratize FEP workflows. The traditional protein setup for FEP requires deep expertise in structural biology and computational biochemistry, often making it inaccessible to nonspecialists. In contrast, cofolding models generate plausible protein–ligand complex structures using only the primary sequence and ligand input, significantly lowering the technical barrier. However, the current models are not without limitations. First, their performance drops with novel drug-like molecules not represented in training sets. Advances like Boltz-1x improve stereochemistry and physical validity, yet challenges remain in chirality handling, pose ranking, accuracy with major conformational changes, and modeling of wild-type substitutions. Second, all cofolding approaches still require greater computational resources compared to standard docking and protein preparation methods. Recent developments such as DragonFold and its truncated models represent an important step forward by offering lower computational demands while maintaining compatibility with FEP requirements.

Overall, cofolding shows promise but also clear limitations in generalizability and pose prediction reliability. In parallel, automated protocol-building tools such as the FEP Protocol Builder (FEP-PB) further reduce the user burden, enabling both novice and experienced users to generate high-quality FEP protocols. While fully automated systems have shown promising results, a hybrid human–ML approach may offer the optimal balance of efficiency, flexibility, and scientific rigor. This paradigm could extend to other aspects of the FEP pipeline as well. For instance, in large-scale virtual screening campaigns, current ML models remain limited in their ability to assess synthesizability, off-target effects, and in vivo behavior. Integrating expert knowledge, particularly from medicinal chemists, into a “human-in-the-loop” framework could significantly enhance the downstream quality of candidate molecules.

Taken together, these developments suggest a future in which FEP workflows will be faster, more accurate, and more accessible. Continued investment in ML-guided automation, hybrid modeling strategies, and interdisciplinary collaboration will be essential to realize this vision and to extend the impact of FEP across a wider range of scientific and industrial applications.

Glossary

ABBREVIATIONS

CADD: computer-aided drug discovery
SBDD: structure-based drug design
MD: molecular dynamics
FEP: free energy perturbation
ML: machine learning
resFEP: residue free energy perturbation
ABFE: absolute binding free energy
RBFE: relative binding free energy
TI: thermodynamic integration
(M)BAR: (multistate) Bennett acceptance ratio
KRAS: Kirsten rat sarcoma
TYK2: tyrosine kinase 2
PDE2A: phosphodiesterase 2A
c-MET: mesenchymal–epithelial transition factor
A2BAR: A2b adenosine receptor
DL: deep learning
AL: active learning
QSAR: quantitative structure–activity relationship
MLP: multilayer perceptron
FEP-PB: FEP Protocol Builder
MM: molecular mechanics
QM: quantum mechanics
NNP: neural network potential
FF: force field
ANI: Accurate NeurAl networK engine
MACE-OFF: MACE-based neural potential-Optimization for Force Field
MUE: mean unsigned error
NEQ: nonequilibrium
DP: deep potential
DP/MM: deep potential/molecular mechanics
GPU: graphics processing unit
PFKFB3: 6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 3
X-ray: X-ray crystallography
MSA: multiple sequence alignment
OpenMM: Open Molecular Mechanics

D.v.P. and W.J. contributed equally to the conception, research, writing, and revision of this manuscript. Both authors were involved in the literature review, drafting of the main text, and creation of figures and tables. Both authors reviewed and approved the final manuscript.

This publication is part of the project Novel Oncological Targets-Inhibiting Cancer via Mutated G Proteins with file number VI.Veni.232.243 (partly) financed by the Dutch Research Council (NWO). The authors are grateful for the opportunities and support afforded by the Sector Plan Pharmaceutical Sciences, implemented in the overarching Sector Plan Beta II and put into action by the Ministry of Education, Culture, and Science (OCW) of The Netherlands.

The authors declare the following competing financial interest(s): Willem Jespers is CSO of MODSIM Pharma AI B.V.

References

  1. Wang K., Huang Y., Wang Y., You Q., Wang L. Recent advances from computer-aided drug design to artificial intelligence drug design. RSC Med. Chem. 2024;15(12):3978. doi: 10.1039/D4MD00522H.
  2. Mullard A. 2024 FDA approvals. Nat. Rev. Drug Discov. 2025;24(2):75. doi: 10.1038/d41573-025-00001-5.
  3. Jumper J., Evans R., Pritzel A., Green T., Figurnov M., Ronneberger O., Tunyasuvunakool K., Bates R., Žídek A., Potapenko A., et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–589. doi: 10.1038/s41586-021-03819-2.
  4. Abramson J., Adler J., Dunger J., Evans R., Green T., Pritzel A., Ronneberger O., Willmore L., Ballard A. J., Bambrick J., et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630:493–500. doi: 10.1038/s41586-024-07487-w.
  5. Mey A. S. J. S., Allen B. K., Macdonald H. E. B., Chodera J. D., Hahn D. F., Kuhn M., Michel J., Mobley D. L., Naden L. N., Prasad S., et al. Best Practices for Alchemical Free Energy Calculations [Article v1.0]. Living J. Comput. Mol. Sci. 2020;2(1):18378. doi: 10.33011/livecoms.2.1.18378.
  6. Khalak Y., Tresadern G., Aldeghi M., Baumann H. M., Mobley D. L., Groot B. L. d., Gapsys V. Alchemical absolute protein–ligand binding free energies for drug design. Chem. Sci. 2021;12(41):13958–13971. doi: 10.1039/D1SC03472C.
  7. Cournia Z., Allen B., Sherman W. Relative Binding Free Energy Calculations in Drug Discovery: Recent Advances and Practical Considerations. J. Chem. Inf. Model. 2017;57(12):2911. doi: 10.1021/acs.jcim.7b00564.
  8. Jespers W., Isaksen G. V., Andberg T. A. H., Vasile S., Veen A. v., Åqvist J., Brandsdal B. O., Gutiérrez-de-Terán H. QresFEP: An Automated Protocol for Free Energy Calculations of Protein Mutations in Q. J. Chem. Theory Comput. 2019;15(10):5461–5473. doi: 10.1021/acs.jctc.9b00538.
  9. Mortier J., Friberg A., Badock V., Moosmayer D., Schroeder J., Steigemann P., Siegel F., Gradl S., Bauser M., Hillig R. C., et al. Computationally Empowered Workflow Identifies Novel Covalent Allosteric Binders for KRASG12C. ChemMedChem. 2020;15(10):827. doi: 10.1002/cmdc.201900727.
  10. Leit S., Greenwood J., Carriero S., Mondal S., Abel R., Ashwell M., Blanchette H., Boyles N. A., Cartwright M., Collis A., et al. Discovery of a Potent and Selective Tyrosine Kinase 2 Inhibitor: TAK-279. J. Med. Chem. 2023;66(15):10473. doi: 10.1021/acs.jmedchem.3c00600.
  11. Tresadern G., Velter I., Trabanco A. A., Keybus F. V. d., Macdonald G. J., Somers M. V. F., Vanhoof G., Leonard P. M., Lamers M. B. A. C., Roosbroeck Y. E. M. V., Buijnsters P. J. J. A. [1,2,4]Triazolo[1,5-a]pyrimidine Phosphodiesterase 2A Inhibitors: Structure and Free-Energy Perturbation-Guided Exploration. J. Med. Chem. 2020;63(21):12887–12910. doi: 10.1021/acs.jmedchem.0c01272.
  12. Michaelides I. N., Collie G. W., Börjesson U., Vasalou C., Alkhatib O., Barlind L., Cheung T., Dale I. L., Embrey K. J., Hennessy E. J., et al. Discovery and Optimization of the First ATP Competitive Type-III c-MET Inhibitor. J. Med. Chem. 2023;66(13):8782. doi: 10.1021/acs.jmedchem.3c00401.
  13. Tandarić T., Gutiérrez-de-Terán H. Ligand and Residue Free Energy Perturbations Solve the Dual Binding Mode Proposal for an A2BAR Partial Agonist. J. Phys. Chem. B. 2025;129:886. doi: 10.1021/acs.jpcb.4c07391.
  14. Majellaro M., Jespers W., Crespo A., Núñez M. J., Novio S., Azuaje J., Prieto-Díaz R., Gioé C., Alispahic B., Brea J., et al. 3,4-Dihydropyrimidin-2(1H)-ones as Antagonists of the Human A2B Adenosine Receptor: Optimization, Structure–Activity Relationship Studies, and Enantiospecific Recognition. J. Med. Chem. 2021;64(1):458–480. doi: 10.1021/acs.jmedchem.0c01431.
  15. Chakraborty C., Bhattacharya M., Lee S.-S., Wen Z.-H., Lo Y.-H. The changing scenario of drug discovery using AI to deep learning: Recent advancement, success stories, collaborations, and challenges. Mol. Ther. Nucleic Acids. 2024;35(3):102295. doi: 10.1016/j.omtn.2024.102295.
  16. Ding Y., Yu K., Huang J. Data science techniques in biomolecular force field development. Curr. Opin. Struct. Biol. 2023;78:102502. doi: 10.1016/j.sbi.2022.102502.
  17. Qian R., Xue J., Xu Y., Huang J. Alchemical Transformations and Beyond: Recent Advances and Real-World Applications of Free Energy Calculations in Drug Discovery. J. Chem. Inf. Model. 2024;64(19):7214. doi: 10.1021/acs.jcim.4c01024.
  18. Khalak Y., Tresadern G., Hahn D. F., Groot B. L. d., Gapsys V. Chemical Space Exploration with Active Learning and Alchemical Free Energies. J. Chem. Theory Comput. 2022;18(10):6259–6270. doi: 10.1021/acs.jctc.2c00752.
  19. Thompson J., Walters W. P., Feng J. A., Pabon N. A., Xu H., Maser M., Goldman B. B., Moustakas D., Schmidt M., York F. Optimizing active learning for free energy calculations. Artif. Intell. Life Sci. 2022;2:100050. doi: 10.1016/j.ailsci.2022.100050.
  20. Gorantla R., Kubincová A., Suutari B., Cossins B. P., Mey A. S. J. S. Benchmarking Active Learning Protocols for Ligand-Binding Affinity Prediction. J. Chem. Inf. Model. 2024;64(6):1955. doi: 10.1021/acs.jcim.4c00220.
  21. Gusev F., Gutkin E., Kurnikova M. G., Isayev O. Active Learning Guided Drug Design Lead Optimization Based on Relative Binding Free Energy Modeling. J. Chem. Inf. Model. 2023;63(2):583. doi: 10.1021/acs.jcim.2c01052.
  22. Lonsdale R., Glancy J., Kalash L., Marcus D., Wall I. D. Active Learning FEP: Impact on Performance of AL Protocol and Chemical Diversity. J. Chem. Theory Comput. 2025;21(9):4867. doi: 10.1021/acs.jctc.5c00128.
  23. Landrum G., Tosco P., Kelley B., Rodriguez R., Cosgrove D., Vianello R., sriniker, Gedeck P., Jones G., Kawashima E., et al. rdkit/rdkit: 2025_03_5 (Q1 2025) Release. doi: 10.5281/zenodo.16439048.
  24. Jiang H., Fan M., Wang J., Sarma A., Mohanty S., Dokholyan N. V., Mahdavi M., Kandemir M. T. Guiding Conventional Protein–Ligand Docking Software with Convolutional Neural Networks. J. Chem. Inf. Model. 2020;60(10):4594. doi: 10.1021/acs.jcim.0c00542.
  25. Wójcikowski M., Kukiełka M., Stepniewska-Dziubinska M. M., Siedlecki P. Development of a protein–ligand extended connectivity (PLEC) fingerprint and its application for binding affinity predictions. Bioinformatics. 2019;35(8):1334. doi: 10.1093/bioinformatics/bty757.
  26. Feurer M., Klein A., Eggensperger K., Springenberg J. T., Blum M., Hutter F. Auto-sklearn: Efficient and Robust Automated Machine Learning. Autom. Mach. Learn. 2019:113. doi: 10.1007/978-3-030-05318-5_6.
  27. Zacarías N. V. O., Veldhoven J. P. D. v., Hollander L. S. d., Dogan B., Openy J., Hsiao Y.-Y., Lenselink E. B., Heitman L. H., IJzerman A. P. Synthesis and Pharmacological Evaluation of Triazolopyrimidinone Derivatives as Noncompetitive, Intracellular Antagonists for CC Chemokine Receptors 2 and 5. J. Med. Chem. 2019;62(24):11035–11053. doi: 10.1021/acs.jmedchem.9b00742.
  28. Yadav S., Kardam V., Tripathi A., G S. T., Dubey K. D. The Performance of Different Water Models on the Structure and Function of Cytochrome P450 Enzymes. J. Chem. Inf. Model. 2022;62(24):6679–6690. doi: 10.1021/acs.jcim.2c00505.
  29. Olson R. S., Urbanowicz R. J., Andrews P. C., Lavender N. A., Kidd L. C., Moore J. H. Automating Biomedical Data Science Through Tree-Based Pipeline Optimiz. Appl. Evol. Comput. 2016;9597:123. doi: 10.1007/978-3-319-31204-0_9.
  30. Oliveira C. d., Leswing K., Feng S., Kanters R., Abel R., Bhat S. FEP Protocol Builder: Optimization of Free Energy Perturbation Protocols Using Active Learning. J. Chem. Inf. Model. 2023;63(17):5592–5603. doi: 10.1021/acs.jcim.3c00681.
  31. Furui K., Ohue M. Benchmarking HelixFold3-Predicted Holo Structures for Relative Free Energy Perturbation Calculations. ACS Omega. 2025;10(11):11411. doi: 10.1021/acsomega.4c11413.
  32. Qiao Z., Ding F., Dresselhaus T., Rosenfeld M. A., Han X., Howell O., Iyengar A., Opalenski S., Christensen A. S., Sirumalla S. K., et al. NeuralPLexer3: Accurate Biomolecular Complex Structure Prediction with Flow Models. 2024. doi: 10.48550/arXiv.2412.10743.
  33. Hiranuma N., Park H., Baek M., Anishchenko I., Dauparas J., Baker D. Improved protein structure refinement guided by deep learning based accuracy estimation. Nat. Commun. 2021;12(1):1340. doi: 10.1038/s41467-021-21511-x.
  34. Team C. D., Boitreaud J., Dent J., McPartlon M., Meier J., Reis V., Rogozhonikov A., Wu K. Chai-1: Decoding the molecular interactions of life. bioRxiv. 2024. doi: 10.1101/2024.10.10.615955.
  35. Wohlwend J., Corso G., Passaro S., Getz N., Reveiz M., Leidal K., Swiderski W., Atkinson L., Portnoi T., Chinn I., et al. Boltz-1 Democratizing Biomolecular Interaction Modeling. bioRxiv. 2025. doi: 10.1101/2024.11.19.624167.
  36. Scheen J., Rianjongdee F., Macdonald H. B., Gowers R., Degorce S., Green A., Scully C., Duffy T., Howes J., Cordery C., et al. Leveraging Alchemical Free Energy Calculations with Accurate Protein Structure Prediction. 2025. doi: 10.26434/chemrxiv-2025-wv6z9-v2.
  37. Jespers W., Esguerra M., Åqvist J., Gutiérrez-de-Terán H. QligFEP: an automated workflow for small molecule free energy calculations in Q. J. Cheminf. 2019;11(1):26. doi: 10.1186/s13321-019-0348-5.
  38. Jorgensen W. L., Tirado-Rives J. Molecular modeling of organic and biomolecular systems using BOSS and MCPRO. J. Comput. Chem. 2005;26(16):1689. doi: 10.1002/jcc.20297.
  39. Jiang Y., Li X., Zhang Y., Han J., Xu Y., Pandit A., Zhang Z., Wang M., Wang M., Liu C., et al. PoseX: AI Defeats Physics Approaches on Protein-Ligand Cross Docking. 2025. doi: 10.48550/arXiv.2505.01700.
  40. Škrinjar P., Eberhardt J., Durairaj J., Schwede T. Have protein-ligand co-folding methods moved beyond memorisation? bioRxiv. 2025. doi: 10.1101/2025.02.03.636309.
  41. Nittinger E., Yoluk Ö., Tibo A., Olanders G., Tyrchan C. Co-folding, the future of docking – prediction of allosteric and orthosteric ligands. Artif. Intell. Life Sci. 2025;8:100136. doi: 10.1016/j.ailsci.2025.100136.
  42. Wang J., Wolf R. M., Caldwell J. W., Kollman P. A., Case D. A. Development and testing of a general amber force field. J. Comput. Chem. 2004;25(9):1157. doi: 10.1002/jcc.20035.
  43. Wang J., Wang W., Kollman P. A., Case D. A. Automatic atom type and bond type perception in molecular mechanical calculations. J. Mol. Graphics Modell. 2006;25(2):247. doi: 10.1016/j.jmgm.2005.12.005.
  44. Vanommeslaeghe K., Raman E. P., MacKerell J. A. D. Automation of the CHARMM General Force Field (CGenFF) II: Assignment of Bonded Parameters and Partial Atomic Charges. J. Chem. Inf. Model. 2012;52(12):3155. doi: 10.1021/ci3003649.
  45. Vanommeslaeghe K., Hatcher E., Acharya C., Kundu S., Zhong S., Shim J., Darian E., Guvench O., Lopes P., Vorobyov I., Mackerell A. D. CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields. J. Comput. Chem. 2010;31(4):671. doi: 10.1002/jcc.21367.
  46. Roos K., Wu C., Damm W., Reboul M., Stevenson J. M., Lu C., Dahlgren M. K., Mondal S., Chen W., Wang L., et al. OPLS3e: Extending Force Field Coverage for Drug-Like Small Molecules. J. Chem. Theory Comput. 2019;15(3):1863. doi: 10.1021/acs.jctc.8b01026.
  47. Bui T. Q., Loan H. T. P., My T. T. A., Quang D. T., Thuy B. T. P., et al. A density functional theory study on silver and bis-silver complexes with lighter tetrylene: are silver and bis-silver carbenes candidates for SARS-CoV-2 inhibition? Insight from molecular docking simulation. RSC Adv. 2020;10(51):30961. doi: 10.1039/D0RA05159D.
  48. Devereux C., Smith J. S., Huddleston K. K., Barros K., Zubatyuk R., Isayev O., Roitberg A. E. Extending the Applicability of the ANI Deep Learning Molecular Potential to Sulfur and Halogens. J. Chem. Theory Comput. 2020;16(7):4192. doi: 10.1021/acs.jctc.0c00121.
  49. Smith J. S., Isayev O., Roitberg A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 2017;8(4):3192–3203. doi: 10.1039/C6SC05720A.
  50. Unke O. T., Meuwly M. PhysNet: A Neural Network for Predicting Energies, Forces, Dipole Moments, and Partial Charges. J. Chem. Theory Comput. 2019;15(6):3678. doi: 10.1021/acs.jctc.9b00181.
  51. Schütt K. T., Sauceda H. E., Kindermans P.-J., Tkatchenko A., Müller K.-R. SchNet - A Deep Learning Architecture for Molecules and Materials. J. Chem. Phys. 2018;148(24):241722. doi: 10.1063/1.5019779.
  52. Musaelian A., Batzner S., Johansson A., Sun L., Owen C. J., Kornbluth M., Kozinsky B. Learning local equivariant representations for large-scale atomistic dynamics. Nat. Commun. 2023;14(1):579. doi: 10.1038/s41467-023-36329-y.
  53. Batzner S., Musaelian A., Sun L., Geiger M., Mailoa J. P., Kornbluth M., Molinari N., Smidt T. E., Kozinsky B. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nat. Commun. 2022;13(1):2453. doi: 10.1038/s41467-022-29939-5.
  54. Batatia I., Kovács D. P., Simm G. N. C., Ortner C., Csányi G. MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields. 2022. doi: 10.48550/arXiv.2206.07697.
  55. Kovács D. P., Moore J. H., Browning N. J., Batatia I., Horton J. T., Pu Y., Kapil V., Witt W. C., Magdău I.-B., Cole D. J., Csányi G. MACE-OFF: Transferable Short Range Machine Learning Force Fields for Organic Molecules. 2023. doi: 10.48550/arXiv.2312.15211.
  56. Gao X., Ramezanghorbani F., Isayev O., Smith J. S., Roitberg A. E. TorchANI: A Free and Open Source PyTorch-Based Deep Learning Implementation of the ANI Neural Network Potentials. J. Chem. Inf. Model. 2020;60(7):3408. doi: 10.1021/acs.jcim.0c00451.
  57. Anstine D. M., Zubatyuk R., Isayev O. AIMNet2: a neural network potential to meet your neutral, charged, organic, and elemental-organic needs. Chem. Sci. 2025;16(23):10228–10244. doi: 10.1039/D4SC08572H.
  58. Zariquiey F. S., Galvelis R., Gallicchio E., Chodera J. D., Markland T. E., Fabritiis G. D. Enhancing Protein–Ligand Binding Affinity Predictions Using Neural Network Potentials. J. Chem. Inf. Model. 2024;64(5):1481–1485. doi: 10.1021/acs.jcim.3c02031.
  59. Karwounopoulos J., Bieniek M., Wu Z., Baskerville A. L., König G., Cossins B. P., Wood G. P. F. Evaluation of Machine Learning/Molecular Mechanics End-State Corrections with Mechanical Embedding to Calculate Relative Protein–Ligand Binding Free Energies. J. Chem. Theory Comput. 2025;21(2):967. doi: 10.1021/acs.jctc.4c01427.
  60. Zariquiey F. S., Farr S. E., Doerr S., Fabritiis G. D. QuantumBind-RBFE: Accurate Relative Binding Free Energy Calculations Using Neural Network Potentials. J. Chem. Inf. Model. 2025;65(8):4081–4089. doi: 10.1021/acs.jcim.5c00033.
  61. Tkaczyk S., Karwounopoulos J., Schöller A., Woodcock H. L., Langer T., Boresch S., Wieder M. Reweighting from Molecular Mechanics Force Fields to the ANI-2x Neural Network Potential. J. Chem. Theory Comput. 2024;20(7):2719. doi: 10.1021/acs.jctc.3c01274.
  62. Galvelis R., Varela-Rial A., Doerr S., Fino R., Eastman P., Markland T. E., Chodera J. D., Fabritiis G. D. NNP/MM: Accelerating Molecular Dynamics Simulations with Machine Learning Potentials and Molecular Mechanics. J. Chem. Inf. Model. 2023;63(18):5701–5708. doi: 10.1021/acs.jcim.3c00773.
  63. Ding Y., Huang J. Implementation and Validation of an OpenMM Plugin for the Deep Potential Representation of Potential Energy. Int. J. Mol. Sci. 2024;25(3):1448. doi: 10.3390/ijms25031448.

Articles from Journal of Chemical Information and Modeling are provided here courtesy of American Chemical Society
