Abstract
The inhibition of β-site amyloid precursor protein-cleaving enzyme 1 (BACE1) presents a promising therapeutic strategy for treating Alzheimer’s disease (AD) by reducing amyloid-β (Aβ) production. This work employed a computational approach that combined machine learning (ML) and atomistic simulations to accelerate the discovery of potential BACE1 inhibitors. Our ML models, trained on a set of ligands with experimental binding affinity, showed high accuracy when tested on a holdout test set. The best model was used to screen more than two million compounds in the CHEMBL33 chemical library to obtain a short list of top-hit compounds, which were further analyzed using molecular docking and fast pulling of ligand (FPL) simulations. The insights into structure and binding energetics obtained from FPL simulations elucidate the stability and interaction mechanisms of the BACE1-ligand bound state, providing data useful for the rational design of novel AD therapeutics.


Introduction
β-site amyloid precursor protein-cleaving enzyme 1 (BACE1) is a critical factor associated with Alzheimer’s disease (AD). BACE1 is primarily known for its function in the amyloidogenic pathway, where it cleaves the amyloid precursor protein (APP) to produce amyloid-beta (Aβ) peptides, which aggregate to form plaques, a hallmark of AD. The accumulation of Aβ is believed to initiate a cascade of neurodegenerative events, leading to synaptic dysfunction and neuronal death, ultimately resulting in cognitive decline. Recent studies have highlighted the implications of BACE1 in various physiological and pathological contexts. For instance, BACE1 expression is not limited to neurons but is also found in astrocytes and endothelial cells, suggesting its involvement in neuroinflammatory processes and vascular dysfunction associated with AD. The expression of BACE1 in reactive astrocytes near Aβ plaques indicates a potential role in the local inflammatory response, which is hypothesized to exacerbate neurodegeneration.
Inhibition of BACE1 offers a promising strategy for reducing Aβ production and mitigating the progression of AD. Ruderisch et al. demonstrated that selective BACE1 peptide inhibitors could lower brain Aβ levels by more than 50%, which is associated with a rescue of cognitive decline. Gijsen et al. designed a novel series of 1,4-oxazine BACE1 inhibitors, which exhibit potent BACE1 inhibition at nanomolar concentrations and strong efficacy in reducing Aβ, achieving a 50–65% reduction in Aβ42 levels in cerebrospinal fluid (CSF). However, the journey from the initial discovery of BACE1 inhibitors to their clinical application has been challenging, with many inhibitors showing potent activity in preclinical studies but failing in clinical trials due to toxicity and a lack of efficacy. These setbacks emphasize the need for innovative approaches to accelerate the discovery and development of effective BACE1 inhibitors.
However, BACE1 also cleaves other substrates involved in myelination and synaptic plasticity, such as neuregulin-1. Consequently, complete inhibition may lead to adverse physiological effects, highlighting the importance of a careful balance between on-target efficacy and preservation of physiological processes.
Recent computational studies have applied machine learning algorithms and atomistic simulations to characterize BACE1 inhibitors. Ligand-based approaches, applying Deep Neural Networks (DNN) and Random Forest algorithms with 2D and 3D descriptors, were employed to model the binding affinities of diverse BACE1 inhibitors. In the context of the D3R Grand Challenge 4, Wang and Ng developed a target-specific DNN model that combined structure-based and ligand-based features to accurately predict BACE1-ligand affinities. A hierarchical virtual screening workflow involving molecular docking, ADMET prediction, and MD simulations was carried out to successfully identify potential BACE1 inhibitors from a large database of natural products. To obtain atomistic insights, multiple separate molecular dynamics simulations, alongside MM-GBSA and Solvated Interaction Energy (SIE) free energy calculations, were utilized to elucidate the binding mechanisms and key residue interactions of specific BACE1 inhibitors.
The integration of machine learning (ML) with atomistic simulations represents a powerful paradigm shift in the prediction of potent inhibitors that bind strongly to a biological target. ML models can predict the binding free energy of potential inhibitors with high accuracy, enabling the rapid screening of vast chemical libraries. Computational methods such as molecular docking and MD simulations can then be employed to refine the ML binding free-energy predictions and to provide structural insights into the binding processes. Indeed, by coupling ML predictions with detailed atomistic simulations such as molecular docking and steered molecular dynamics (SMD), profound insights into the interactions between drug targets and ligands can be gained. These computational strategies offer a cost-effective and efficient approach for screening and prioritizing potential drug candidates.
In this work, we exploit this synergy between ML and atomistic simulations to propose a set of candidate BACE1 inhibitors that exhibit strong ligand-binding affinities, favorable interaction profiles, and potential for further development as therapeutic agents for AD. A robust ML model trained on experimental binding-affinity data of 2537 compounds was constructed and applied to screen a vast chemical library, CHEMBL33, comprising ∼2.3 million compounds, to prioritize structures with high binding affinities. The top candidates were then subjected to molecular docking and MD simulations to elucidate their binding interaction mechanisms with BACE1.
Materials & Methods
Ligand Data Collection and Preparation
To construct ML models that predict the binding free energy between BACE1 and a set of ligands, we utilized a data set comprising 2537 ligands whose SMILES and experimentally determined inhibition constants (K_i) for BACE1 were obtained from the BindingDB database. The binding free energy, ΔG, was calculated from the formula ΔG = RT ln K_i, where R is the molar gas constant and T is set at 298 K. The experimental ΔG was utilized as the target variable for training the ML models. To ensure the robustness of the machine learning models and mitigate potential bias arising from a single random partition, the data set was randomly divided into a training set (2137 ligands) and a test set (400 ligands), and this random partitioning was performed three independent times (in triplicate). Consequently, all machine learning models were trained and evaluated on each of the three distinct data set splits. The distributions of ΔG for the training and the test sets in the first split are shown in Figure 1, which indicates a representative spread of binding affinities in both sets of data.
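The conversion from K_i to ΔG and the triplicate partitioning described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the random seed, and the assumption that K_i is expressed in mol/L are ours.

```python
import math
import random

R_KCAL = 1.987204e-3  # molar gas constant in kcal/(mol*K)
T_KELVIN = 298.0      # temperature stated in the text


def delta_g_from_ki(ki_molar, temperature=T_KELVIN):
    """Binding free energy (kcal/mol) from K_i given in mol/L (assumed unit)."""
    if ki_molar <= 0:
        raise ValueError("K_i must be positive")
    return R_KCAL * temperature * math.log(ki_molar)


def random_splits(n_total, n_test, n_repeats=3, seed=0):
    """Return n_repeats independent (train_idx, test_idx) random partitions."""
    rng = random.Random(seed)  # the seed is an illustrative choice
    splits = []
    for _ in range(n_repeats):
        idx = list(range(n_total))
        rng.shuffle(idx)
        splits.append((sorted(idx[n_test:]), sorted(idx[:n_test])))
    return splits


# A 1 nM inhibitor corresponds to roughly -12.3 kcal/mol at 298 K.
print(round(delta_g_from_ki(1e-9), 2))

# Three independent 2137/400 partitions of the 2537 ligands.
splits = random_splits(2537, 400)
print([(len(train), len(test)) for train, test in splits])
```

Because K_i is below 1 M for any useful inhibitor, the logarithm is negative and stronger binders receive more negative ΔG values.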
Figure 1. Distributions of the experimental ΔG for compounds in the training and test data sets.
ML Model Training
We trained several ML models, including linear regression (LR), random forest (RF), extreme gradient boosting (XGBoost), and convolutional networks on graphs (GraphConv). The LR model served as a baseline due to its simplicity, thereby minimizing the risk of overfitting. RF and XGBoost are both ensemble methods but differ in how they handle training samples and predictions. RF employs bootstrapping and averages predictions from individual tree learners, while XGBoost adjusts learners sequentially using a weighted sum of predictions for the final output.
For LR, RF, and XGBoost models, we extracted molecular descriptors using the RDKitDescriptors toolkit available in the DeepChem library. Initially, 200 physicochemical descriptors were computed, encompassing properties such as molecular weight, polar surface area, and hydrogen bond donors and acceptors. To enhance model robustness, we reduced the feature set by removing descriptors with predominantly zero values (>90%) and those highly correlated (correlation >0.95), resulting in a final set of 103 features. Missing values within the data set were imputed using the median of the respective feature. Prior to model input, all features were standardized to achieve a mean of zero and a standard deviation of one.
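The descriptor clean-up described above can be sketched in dependency-free Python. This is a toy illustration under stated assumptions: the descriptor names and values are made up, imputation is performed first so that correlations are defined for every column (the text lists it after the filtering), and the correlation filter is a simple greedy pass whose details the text does not specify.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def preprocess(columns, zero_frac=0.90, corr_cut=0.95):
    """columns: dict name -> list of floats (None marks a missing value)."""
    # 1) Median imputation (assumption: done first in this sketch).
    filled = {}
    for name, col in columns.items():
        med = statistics.median(v for v in col if v is not None)
        filled[name] = [med if v is None else v for v in col]
    # 2) Drop descriptors that are zero for more than zero_frac of molecules.
    kept = [n for n, c in filled.items()
            if sum(v == 0 for v in c) / len(c) <= zero_frac]
    # 3) Greedy filter: keep a descriptor only if |r| <= corr_cut
    #    with every descriptor already selected.
    selected = []
    for name in kept:
        if all(abs(pearson(filled[name], filled[s])) <= corr_cut
               for s in selected):
            selected.append(name)
    # 4) Standardize to zero mean and unit (population) standard deviation.
    out = {}
    for name in selected:
        col = filled[name]
        mu, sd = statistics.fmean(col), statistics.pstdev(col)
        out[name] = [(v - mu) / sd if sd > 0 else 0.0 for v in col]
    return out


# Hypothetical descriptor table for five molecules.
toy = {
    "MolWt": [300.0, 350.0, None, 420.0, 510.0],
    "MolWt_copy": [600.0, 700.0, 760.0, 840.0, 1020.0],  # nearly 2 x MolWt
    "TPSA": [90.0, 40.0, 75.0, 120.0, 60.0],
    "RareCount": [0.0, 0.0, 0.0, 0.0, 0.0],  # all zeros
}
result = preprocess(toy)
print(list(result))
```

In practice the same steps are usually written with pandas and scikit-learn, and the imputation and scaling statistics are fitted on the training split only.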
The GraphConv model, on the other hand, leverages graph-based representations of molecules, eliminating the need for manual feature extraction by learning features directly from molecular graphs. Each molecule is depicted as an undirected graph, where nodes represent atoms and edges represent chemical bonds. The model updates node feature vectors through convolutional layers that aggregate information from neighboring nodes. This process results in a fixed-length vector representation, which is fed into a densely connected layer for prediction.
Model hyperparameters were tuned using a 10-fold cross-validation approach, optimizing for minimal mean square error (MSE). The Hyperopt library facilitated the search for optimal hyperparameter configurations. For LR, we tuned the L2 regularization strength, while for RF and XGBoost, parameters such as the tree depth and learning rate were optimized. Optimal hyperparameters related to the GraphConv model’s architecture were searched by varying the number of units in convolutional and dense layers, as well as learning rates and dropout rates. Early stopping was employed, where training was halted once validation loss plateaued.
Performance metrics, including root-mean-square error (RMSE), Pearson’s correlation coefficient (R), and Spearman’s correlation coefficient (ρ), were employed to evaluate model efficacy on the test set. The best-performing model, with RMSE = 1.01 ± 0.05 kcal/mol and correlation coefficients Pearson’s R = 0.77 ± 0.02 and Spearman’s ρ = 0.78 ± 0.02 (measured for the test set in the first split), was subsequently used to predict binding affinities for ∼2.3 million compounds in the CHEMBL33 database.
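For reference, the three metrics and a bootstrap estimate of their standard errors can be sketched as below. The data are made-up ΔG values, the seed is arbitrary, and the function names are ours; only the use of 1000 resamples drawn with replacement follows the text.

```python
import math
import random


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman_rho(x, y):
    return pearson_r(ranks(x), ranks(y))


def bootstrap_se(metric, y_true, y_pred, n_boot=1000, seed=0):
    """Standard error of a metric from resampling pairs with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    mean = sum(stats) / n_boot
    return math.sqrt(sum((s - mean) ** 2 for s in stats) / (n_boot - 1))


# Hypothetical experimental and predicted ΔG values (kcal/mol).
y_exp = [-12.1, -10.4, -9.8, -9.0, -8.3, -7.5, -6.9, -6.0]
y_ml = [-11.5, -10.9, -9.1, -9.4, -8.0, -8.1, -6.5, -6.6]
for name, fn in [("RMSE", rmse), ("Pearson R", pearson_r), ("Spearman rho", spearman_rho)]:
    print(name, round(fn(y_exp, y_ml), 3), "+/-", round(bootstrap_se(fn, y_exp, y_ml), 3))
```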
The primary goal of screening the entire set of 2.3 million compounds was to leverage the speed of the ML model as a computational filter. Atomistic simulations, such as MD and FPL, are computationally demanding and cannot be applied to such a vast chemical space. By using the ML model to rank all compounds, we were able to efficiently filter the database, identifying only a small number of top candidates for subsequent, more resource-intensive analysis of structures and binding energetics.
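The filtering step amounts to keeping the k most negative predictions from a long stream of compounds. A minimal sketch, assuming predictions arrive as (identifier, predicted ΔG) pairs; the identifiers and values below are hypothetical, and `exclude` stands in for the training and test compounds that must not be reported as new hits.

```python
import heapq


def top_candidates(predictions, k=10, exclude=frozenset()):
    """Return the k compounds with the most negative predicted ΔG.

    predictions: iterable of (compound_id, predicted_dg) pairs; it may be a
    generator, so the full library never has to be held in memory at once.
    """
    eligible = ((dg, cid) for cid, dg in predictions if cid not in exclude)
    return [(cid, dg) for dg, cid in heapq.nsmallest(k, eligible)]


# Hypothetical predictions (kcal/mol).
preds = [
    ("CPD-001", -8.4),
    ("CPD-002", -13.1),
    ("CPD-003", -10.2),
    ("CPD-004", -12.7),
    ("CPD-005", -9.9),
]
shortlist = top_candidates(preds, k=2, exclude={"CPD-004"})
print(shortlist)
```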
Validation Data Sets
We used four data sets for validating and benchmarking the molecular docking calculations and the MD-based free energy calculations, as well as for predicting the highest-ranking ligands. (i) Validation Set 1 was used for benchmarking the structures and binding energies obtained by docking and binding free energy calculations. This set included 14 publicly available BACE1-ligand crystal structures with known experimental K_i and was used to validate our docking and free energy calculation protocols. The PDB identifiers were 4GID, 4B05, 4DJU, 4DJV, 4DJW, 6EJ3, 5HU0, 5HE5, 5HDX, 5HDV, 5HE7, 3LPI, 3LPK, and 2G94. (ii) Validation Set 2 was used to retrospectively validate the ML prediction accuracy and also to validate the accuracy of free energy calculations. It included the ML-predicted top 10 compounds from the ChEMBL database for which experimental binding affinity existed in the literature but which were not used in our training/test sets (known hits). (iii) Novel Set included the ML-predicted top 10 compounds from the ChEMBL database with no experimental binding affinity data (unknown hits). This set represents novel discovery outputs from ML prediction and was further refined by atomistic simulations. (iv) Reference Set included two known phase 3 clinical candidates (elenbecestat and lanabecestat) for benchmarking and comparison.
Molecular Docking Simulation
AutoDock Vina with modified empirical parameters was employed to redock the 14 complexes in Validation Set 1 to benchmark the accuracy of the docking protocol. The modified empirical parameters were Gauss1 (−0.049811), Gauss2 (−0.007218), Repulsion (0.756221), Hydrophobic (−0.031562), Hydrogen Bond (−0.469951), and Rotation (0.025722). Changing these parameters was found to significantly improve the correlation of docking scores with experiment compared with the default values. After the docking protocol was validated, the ligands in Validation Set 1, Reference Set, and Novel Set were docked into the binding site of BACE1, whose crystal structure was downloaded from the Protein Data Bank with identifier 5HE7, one of the 14 structures in Validation Set 1. 5HE7 was selected because its resolution of 1.71 Å (Table S1 in Supporting Information) is among the three best of the 14 complexes. Furthermore, the binding affinity of its cocrystallized ligand is also high, ranking among the top four (Table S1 in Supporting Information). Therefore, the structure provides an active conformation of the enzyme relevant for drug discovery targeting the active site. These docked ligand-5HE7 complexes were used as initial structures for subsequent molecular dynamics simulations. Note that the other 13 crystal structures in Validation Set 1 were not used to dock the top ligands predicted by ML because they served mainly to validate the docking and free energy protocols; running MD simulations on all 13 additional structures would be prohibitively expensive.
The receptor and ligands were parametrized using a force field provided by AutoDockTools, Open Babel, and the RDKit library. The docking grid was centered on Asp93, the box size was set to 28 × 28 × 28 Å³, and the spacing was left at the mvina default. For each ligand-5HE7 complex, the docked structure with the lowest docking energy was selected as the initial structure for molecular dynamics simulations.
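One way to apply the modified scoring weights is through a Vina configuration file. The sketch below only assembles such a file as text. The `weight_*` option names are those listed by stock AutoDock Vina under its advanced options, to the best of our knowledge, and should be checked against the installed version; whether the authors' modified build (mvina) is configured this way is not stated. The file names and the box center are placeholders, since the coordinates of Asp93 are not given in the text.

```python
# Modified empirical weights quoted in the text.
MODIFIED_WEIGHTS = {
    "weight_gauss1": -0.049811,
    "weight_gauss2": -0.007218,
    "weight_repulsion": 0.756221,
    "weight_hydrophobic": -0.031562,
    "weight_hydrogen": -0.469951,
    "weight_rot": 0.025722,
}


def vina_config(receptor, ligand, center, box=28.0, weights=MODIFIED_WEIGHTS):
    """Build the text of a Vina config file for a cubic search box (in Å)."""
    cx, cy, cz = center
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {box}",
        f"size_y = {box}",
        f"size_z = {box}",
    ]
    lines += [f"{key} = {value}" for key, value in weights.items()]
    return "\n".join(lines) + "\n"


# Placeholder inputs: replace the center with the actual Asp93 coordinates.
cfg = vina_config("5he7_receptor.pdbqt", "ligand.pdbqt", center=(0.0, 0.0, 0.0))
print(cfg)
```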
MD Simulations
Atomistic simulations in aqueous solution were carried out with GROMACS 2019.6 for 36 BACE1-ligand complexes: the 14 complexes in Validation Set 1, 10 complexes formed by docking the ligands in Validation Set 2 to 5HE7, 10 complexes formed by docking the ligands in Novel Set to 5HE7, and two complexes formed by docking the two compounds in Reference Set to 5HE7. Each complex was parametrized using the Amber99SB-iLDN force field for protein and ions, TIP3P for the water molecules, and the general Amber force field (GAFF) for ligands. Ligand information was obtained from quantum chemical calculations at the B3LYP/6-31G(d,p) level of theory, and AmberTools18 and ACPYPE were then used to parametrize the ligands. The electrostatic potential grids used for fitting point charges were obtained from quantum mechanical calculations at the MP2 (Møller–Plesset second-order perturbation theory) level with the 6-31G(d,p) basis set and implicit solvent (ε = 78.4). BACE1 and the ligand were placed in a rectangular periodic box with a size of approximately 6.53 × 7.93 × 11.48 nm³. This box size was chosen to ensure that the complex was at least 16 Å away from the box edges. All Cα atoms of BACE1 were restrained with a harmonic force during the MD simulations.
The simulations were performed using a nonbonded cutoff of 0.9 nm. The electrostatic and van der Waals interactions were calculated via the Particle Mesh Ewald (PME) method and a cutoff scheme, respectively. The LINCS algorithm with order set to 4 was used to constrain all bonds during the simulations. The parametrized complex was first energy-minimized using the steepest descent approach and then relaxed in 100 ps of NVT and 20 ns of NPT simulations.
The binding affinity of ligands to BACE1 was estimated using the fast pulling of ligand (FPL) scheme, which has been applied successfully in previous studies. The last snapshot of the NPT simulation was used as the initial structure of the SMD simulation, during which a harmonic force with a spring constant of k ≈ 143.5 kcal mol⁻¹ nm⁻² was applied to the ligand center of mass to rapidly dissociate it from the binding cavity. The pulling was performed at a constant velocity of v = 0.005 nm/ps. The recorded pulling work W was used as a measure of the ligand-binding affinity, since it can be related to the binding free energy via the isobaric–isothermal Jarzynski equality.
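At constant pulling velocity the work is W = v ∫ F(t) dt, so it can be obtained by numerically integrating the recorded force over time. The sketch below uses the trapezoidal rule on a made-up force trace; the function names, the units assumed for the force (kcal mol⁻¹ nm⁻¹), and the numbers are illustrative and are not taken from the simulations.

```python
def pulling_work(times_ps, forces, velocity=0.005):
    """Pulling work W = v * integral of F dt, by the trapezoidal rule.

    times_ps: sample times in ps; forces: pulling force at those times
    (assumed here in kcal/mol/nm); velocity: pulling speed in nm/ps.
    The result is then in kcal/mol.
    """
    if len(times_ps) != len(forces) or len(times_ps) < 2:
        raise ValueError("need at least two matching (time, force) samples")
    area = 0.0
    for i in range(1, len(times_ps)):
        dt = times_ps[i] - times_ps[i - 1]
        area += 0.5 * (forces[i] + forces[i - 1]) * dt
    return velocity * area


def max_rupture_force(forces):
    """Largest force recorded along the pulling trajectory (F_max)."""
    return max(forces)


# Hypothetical force trace: the force rises, peaks at rupture, then decays.
t = [0.0, 100.0, 200.0, 300.0, 400.0]
f = [0.0, 200.0, 400.0, 100.0, 0.0]
print(pulling_work(t, f), max_rupture_force(f))
```

With GROMACS, the force trace would typically come from the pull-force output file, whose force values are reported in kJ mol⁻¹ nm⁻¹ and would need converting to the units assumed here.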
Analysis Tools
In docking, PyMOL Education was used to view and save the docking pose for further analysis. The root-mean-square deviation (RMSD) between the best redocked pose (the pose with the lowest docking energy) and the crystal ligand was calculated with the obrms function from Open Babel. The correlation coefficient (Pearson’s R) between ΔG_dock (lowest docking energy) and the experimental ΔG_exp was calculated.
In MD simulations, the ligand protonation states were estimated by using the ChemAxon Web server (www.chemicalize.com). Statistical errors of the root-mean-square error (RMSE) and the correlation coefficient were computed via 1000 cycles of bootstrapping. The docking success rate p̂ was computed from the RMSD of non-hydrogen atoms between the experimental and docked poses. The RMSD value was determined using the GROMACS tool “gmx rms”. The diagram describing the interaction between BACE1 and the ligand was prepared with the free version of Maestro. The log(BB) was predicted using the PreADME web application.
Computational Workflow
Our multistage computational workflow combined the efficient ligand screening of the ML model with the atomic detail of MD simulations. The pipeline is shown in Figure 2. In stage 1, four ML models (LR, RF, XGBoost, and GraphConv) were trained on three random training splits, each containing 2137 randomly selected ligands, and tested on the three corresponding test splits, each containing 400 ligands. The model with the best overall performance on the three test splits (XGBoost) was then selected to predict the binding free energy for ∼2.3 million compounds in the CHEMBL database. In stage 2, the accuracy of the structures and binding energies obtained by the docking and free energy calculation (FPL) protocol was benchmarked using three data sets: Validation Set 1, Validation Set 2, and Reference Set. In stage 3, the validated docking and FPL protocols were applied to the Novel Set, a short list of 10 novel high-ranking candidates (compounds without prior experimental BACE1 activity), to prioritize the compounds with the strongest binding for future experimental investigation.
Figure 2. Computational workflow for the identification of potential BACE1 inhibitors.
Results and Discussion
ML Models
Our study focuses on integrating ML and atomistic simulation to predict the binding affinity of compounds in the CHEMBL33 database to BACE1. Four ML models, namely linear regression (LR), random forest (RF), extreme gradient boosting (XGBoost), and convolutional networks on graphs (GraphConv), were trained on the training set of 2137 ligands and tested on the test set of 400 ligands. Table 1 shows a test performance comparison of the four ML models. As expected, the LR model gave the poorest performance due to its inability to capture potentially complex nonlinear relationships between the input features (the molecular descriptors) and the target variable (the binding free energy). The XGBoost model gave the best performance in predicting the binding free energy of test ligands to BACE1, with RMSE = 1.01 ± 0.05 kcal/mol and correlation coefficients Pearson’s R = 0.77 ± 0.02 and Spearman’s ρ = 0.78 ± 0.02, although it is not significantly better than RF and GraphConv. A similar performance comparison for the two additional test sets, resulting from two additional independent train/test splits, is shown in Table S2 in the Supporting Information and confirms that XGBoost is the best model. A comparison of binding free energy between prediction and experiment for the 400 compounds in the test data set is shown in Figure 3. Based on this performance comparison, we selected XGBoost to make predictions for ∼2.3 million CHEMBL33 compounds. The top 10 most important features for the XGBoost model are shown in Table S3 in the Supporting Information, in which the number of heteroatoms is the most important feature.
Table 1. Comparison of Test Performance Metrics for Four Trained ML Models Calculated for 400 Test Ligands in the First Data Split.
| Model | RMSE (kcal/mol) | Pearson’s R | Spearman’s ρ |
|---|---|---|---|
| Linear Regression | 1.27 ± 0.08 | 0.60 ± 0.04 | 0.64 ± 0.03 |
| Random Forest | 1.03 ± 0.06 | 0.75 ± 0.02 | 0.76 ± 0.02 |
| XGBoost | 1.01 ± 0.05 | 0.77 ± 0.02 | 0.78 ± 0.02 |
| GraphConv | 1.03 ± 0.05 | 0.75 ± 0.03 | 0.75 ± 0.03 |
The standard errors were estimated using bootstrapping with 1000 samples drawn with replacement.
Figure 3. Comparison of binding free energy (ΔG) between experimental results and predictions made by the XGBoost model for 400 test compounds.
The inhibitory ability on BACE1 was predicted for over 2.3 million compounds from the CHEMBL33 database. The predicted ΔG_ML ranged from −13.5 to −4.3 kcal/mol, with an average of −8.4 ± 0.8 kcal/mol. Note that compounds in the training and test sets were removed from the CHEMBL33 database. The database contained two compounds that had reached phase 3 clinical trials and were used as reference compounds (Reference Set): CHEMBL4204869 (elenbecestat) with ΔG_ML = −10.4 kcal/mol and CHEMBL3989948 (lanabecestat) with ΔG_ML = −10.1 kcal/mol. There were 24,881 compounds predicted by ML to have higher binding affinity than both elenbecestat and lanabecestat. There were also 10 other compounds whose BACE1 inhibitory activity had been tested in recent studies and which showed good inhibitory effects (Validation Set 2). The chemical structures and the comparison between experimental binding free energy and ML prediction results for Validation Set 2 are shown in Table S4 in the Supporting Information. Our ML model gave accurate predictions of binding free energy for these 10 compounds, with a low RMSE of 0.73 kcal/mol. For further investigation at atomic detail using molecular docking and MD simulations, we selected the 10 top compounds whose activity against BACE1 had not been experimentally tested (Novel Set). Their chemical structures and predicted values of binding free energy are shown in Table S5 in the Supporting Information.
Molecular Docking
Molecular docking was employed to find the docking poses for the 10 compounds in the Novel Set predicted by ML to bind strongly to BACE1. These docking poses were used as initial structures for subsequent MD simulations. To validate the accuracy of molecular docking in finding the correct binding poses, we carried out redocking for the 14 complexes in Validation Set 1. The redocking results for these 14 complexes are shown in Table S6 in the Supporting Information. The RMSD of the redocked pose with respect to the native pose ranges from 0.37 to 2.49 Å, with a mean value of 0.89 Å. Twelve of the 14 redocked complexes have an RMSD below 1.5 Å, which indicates that the docking protocol predicts the binding structures of the BACE1-ligand complexes with high accuracy. The docking free energy was also compared with the experimental value, as shown in Figure 4. There is a strong correlation between the redocking binding free energy and the experimental binding free energy, with Pearson’s R = 0.83, although docking tends to overestimate the binding affinity. The validated docking protocol was then applied to the 10 compounds in the Novel Set predicted to be potential inhibitors, and the docking free energy and the ligand-BACE1 interactions are shown in Table S7 in the Supporting Information.
Figure 4. Comparison of binding free energy (ΔG) between experimental results and docking for 14 complexes in Validation Set 1.
Free energy calculations based on fast pulling of ligand (FPL) were used to estimate the binding free energy between the top 10 compounds in the Novel Set and BACE1. FPL is a type of steered molecular dynamics that requires nearly the same computational cost as normal unbiased MD simulations and scales linearly with the number of compounds studied. To validate the accuracy of this approach, we used the 14 complexes in Validation Set 1, which were also used to validate the docking calculations as discussed above. Each complex was equilibrated with unbiased MD simulations for 20 ns, and the simulations were repeated four times. Starting from the last snapshot of each of the four MD trajectories, the center of mass of the ligand was pulled at a constant pulling speed. The maximum pulling force (F_max) and pulling work (W) were recorded and are shown in Table S8 in the Supporting Information. The pulling work is highly correlated with the experimental binding free energy (Figure 5), with Pearson’s R = −0.9. The maximum rupture force is also strongly correlated with the experimental binding free energy (Figure S1 in the Supporting Information), with Pearson’s R = −0.9. The strong correlation between pulling work and the experimental binding free energy allows us to derive a simple linear relationship using a linear regression line as
where ΔG_FPL is the predicted binding free energy via FPL calculations and W is the work of pulling the ligand out of the binding pocket.
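A regression line of this kind can be obtained by ordinary least squares, as in the minimal sketch below. The work and free energy values are made up for illustration, so the resulting slope and intercept are not the coefficients used in this work; `fit_line` and `predict_dg` are hypothetical helper names.

```python
def fit_line(work, dg_exp):
    """Ordinary least-squares fit dg = a * work + b; returns (a, b)."""
    n = len(work)
    mean_w = sum(work) / n
    mean_g = sum(dg_exp) / n
    sxx = sum((w - mean_w) ** 2 for w in work)
    sxy = sum((w - mean_w) * (g - mean_g) for w, g in zip(work, dg_exp))
    a = sxy / sxx
    b = mean_g - a * mean_w
    return a, b


def predict_dg(work, a, b):
    """Estimate ΔG_FPL (kcal/mol) from a pulling work value."""
    return a * work + b


# Made-up calibration data: pulling work (kcal/mol) vs experimental ΔG.
w_cal = [40.0, 60.0, 80.0, 100.0]
g_cal = [-8.0, -9.0, -10.0, -11.0]
a, b = fit_line(w_cal, g_cal)
print(a, b, predict_dg(90.0, a, b))
```

The negative slope mirrors the negative correlation reported above: a larger pulling work maps to a more negative, that is stronger, binding free energy.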
Figure 5. Scatter plot of experimental binding free energy vs pulling work for 14 complexes in Validation Set 1.
Next, we applied the validated FPL simulation protocol to the top 10 compounds in the Novel Set and to the two reference compounds in the Reference Set, elenbecestat and lanabecestat, and used the linear function above to estimate the binding free energy. The pulling work and maximum rupture force for these 12 compounds are shown in Table S9 in the Supporting Information, and the estimated binding free energy is shown in Table 2. All 10 compounds in the Novel Set have a predicted binding free energy stronger than that of lanabecestat, and nine of them are stronger than elenbecestat. Two compounds, CHEMBL131805 and CHEMBL135529, have the strongest predicted binding free energy, ΔG_FPL = −12.8 kcal/mol and ΔG_FPL = −12.9 kcal/mol, respectively, suggesting that they are the most promising ligands. Furthermore, the log(BB) of these two compounds (−1.0) is close to, and slightly higher than, that of the two reference molecules (−1.1 and −1.4), suggesting that they can have appropriate blood–brain penetration for the inhibition of BACE1.
Table 2. Binding Free Energy Estimated Using the Fast Pulling of Ligand Approach for the Top 10 Compounds and the Two Reference Compounds, Elenbecestat and Lanabecestat.
| No. | CHEMBL ID | log(BB) | ΔG_FPL (kcal/mol) |
|---|---|---|---|
| 1 | CHEMBL4287439 | –1.0 | –12.1 |
| 2 | CHEMBL3640397 | –1.7 | –10.8 |
| 3 | CHEMBL3640406 | –1.7 | –11.5 |
| 4 | CHEMBL131805 | –1.0 | –12.8 |
| 5 | CHEMBL135529 | –1.0 | –12.9 |
| 6 | CHEMBL2180765 | –1.1 | –11.1 |
| 7 | CHEMBL4115452 | 2.0 | –10.4 |
| 8 | CHEMBL3640361 | –1.7 | –10.6 |
| 9 | CHEMBL4112361 | –2.0 | –10.7 |
| 10 | CHEMBL3691676 | –1.7 | –10.8 |
| 11 | elenbecestat | –1.1 | –10.5 |
| 12 | lanabecestat | –1.4 | –10.0 |
A closer look at the interactions of CHEMBL131805 and CHEMBL135529 with the BACE1 active site (Figure 6) suggests that their high binding affinity arises because both molecules form hydrogen bonds with Asp93, one residue of the catalytic aspartic dyad, and interact with Asp289. Breaking these interactions plausibly requires a higher rupture force and more pulling work, which is consistent with the stronger predicted binding affinity of these compounds to BACE1.
Figure 6. Interactions between the ligand and residues in the BACE1 binding site for the two predicted strongest-binding compounds. The last snapshots of 20 ns MD simulations were used for this interaction analysis.
Conclusions
In this study, we applied a computational approach based on ML and molecular simulations to screen the CHEMBL33 library, which contains over 2.3 million compounds, for potential BACE1 inhibitors. The best model, XGBoost, predicted the binding free energy for ligands in a test set of 400 compounds with low RMSE (1.01 ± 0.05 kcal/mol) and high correlation (Pearson’s R = 0.77 ± 0.02 and Spearman’s ρ = 0.78 ± 0.02). The 10 top compounds with the strongest predicted binding free energy were selected for further investigation using molecular docking, MD simulations, and log(BB) calculations. These calculations not only identified two candidate compounds with high binding affinity and low log(BB) values but also provided structural insights into the binding process in the binding site of BACE1. They indicated important hydrogen bonds and side-chain contacts between the top two ligands and residues in BACE1’s binding site. The identified compounds, CHEMBL131805 and CHEMBL135529, are top hits for future experimental validation in AD drug discovery. The computational pipeline we developed, which integrates machine learning and atomistic simulations, could also serve as a scalable and cost-effective approach for screening vast chemical libraries and prioritizing candidates for in-depth, resource-intensive experimental studies. Our findings on the BACE1 inhibitory ability of the two candidate compounds are based on computational predictions and should be validated by experimental assays in future work to confirm their binding affinity and inhibitory activity. Furthermore, the detailed molecular interactions that we observed can guide future work to optimize the lead compounds, such as designing and testing derivatives to improve binding affinity, selectivity, and other ADMET properties.
Supplementary Material
All relevant data necessary to reproduce all results in the paper are within the main text and the SI file.
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acsomega.5c07081.
Validation set PDB IDs, resolution, and binding affinities (Table S1); performance metrics comparisons for ML models across independent splits (Table S2); XGBoost model top ten feature importance (Table S3); predicted versus experimental binding affinity for Validation Set 2 (Table S4); top 10 novel compound predictions (Table S5); docking free energies and RMSD for Validation Set 1 (Table S6); docking results for Novel and Reference sets (Table S7); FPL pulling force and work data for Validation Set 1 (Table S8); FPL results for Novel and Reference set compounds (Table S9); scatter plot of experimental binding free energy versus maximum rupture force for Validation Set 1 (Figure S1) (PDF).
All authors designed the studies, collected and analyzed the data, and wrote the manuscript. Q.M.T. collected the database of available inhibitors of BACE1 from bindingdb.org. T.D.Q. and D.D.T.M. carried out molecular docking and unbiased MD simulations. T.D.Q. carried out FPL simulations and analyzed simulation results. T.H.N. trained and tested ML models. T.H.N. predicted potential inhibitors for BACE1 via ML calculations. Q.M.T. performed molecular docking and MD simulations. V.V.V. helped in discussing the results. T.H.N., P.T.T., and S.T.N. provided the concept, supervision, writing, and editing.
The authors declare no competing financial interest.
Data Availability Statement
All relevant data necessary to reproduce all results in the paper are within the main text and the SI file.
