Scientific Reports. 2025 Apr 14;15:12764. doi: 10.1038/s41598-025-97208-8

Integrating machine learning driven virtual screening and molecular dynamics simulations to identify potential inhibitors targeting PARP1 against prostate cancer

Fahad M Aldakheel 1, Shatha A Alduraywish 2, Khaled H Dabwan 1
PMCID: PMC11997099  PMID: 40229418

Abstract

Prostate cancer (PC) is one of the most common malignancies in men, with a noteworthy increase in newly diagnosed cases in recent years. PARP1 is a ubiquitous nuclear enzyme involved in DNA repair, nuclear transport, ribosome synthesis, and epigenetic bookmarking. In this study, a library of phytochemicals was screened, with a focus on those with high drug efficacy and potential PARP1 inhibition. Several machine learning models were generated and assessed using various statistical measures. The Random Forest (RF) model outperformed all other models in terms of accuracy (0.9489), specificity (0.9171), and area under the curve (AUC = 0.9846). Following this, a library of 9510 phytochemicals was screened, yielding 181 compounds predicted to be active. These compounds were subsequently assessed using Lipinski’s Rule of Five, yielding 40 promising candidates. Molecular docking experiments demonstrated that compounds ZINC2356684563, ZINC2356558598, and ZINC14584870 had strong affinity for the PARP1 active site. Further molecular dynamics simulations and MM-PBSA calculations validated the stability of the ligand–protein complexes, with ZINC14584870 and ZINC43120769 demonstrating the most stable interactions, as seen from low RMSD and RMSF values. Our findings emphasize the potential of these phytochemical inhibitors as novel therapeutic agents against PARP1 in prostate cancer treatment, paving the way for further experimental validation and clinical investigations. These results open new possibilities for developing treatments to benefit prostate cancer patients.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-97208-8.

Subject terms: Machine learning, Virtual drug screening, Drug discovery

Introduction

Prostate cancer (PC) is one of the most prevalent malignancies in males, with a significant increase in newly diagnosed cases in recent years [1]. Every year, this illness kills more than 350,000 people [2,3]. Poly(ADP-ribose) polymerase 1 (PARP1) is a ubiquitous nuclear enzyme that participates in a variety of nuclear functions, including DNA repair, ribosome synthesis, chromatin regulation, nuclear transport, and epigenetic bookmarking (Fig. 1). When activated, PARP1 captures NAD+ and assembles long polymers of poly(ADP-ribose), covalently modifying itself and the surrounding nuclear proteins [4–6]. One of PARP1’s most prevalent and well-studied functions is DNA repair. PARP1 inhibition can be lethal to mutant cells when paired with the loss of activity of DNA repair genes such as BRCA1, BRCA2, PTEN, ATM, CHEK2, and FANCA, which are often mutated in cancers [7–9]. Thus, PARP1 inhibitors can specifically target tumor cells that have defects in DNA repair mechanisms [10,11]. PARP1, with its specific domain design, plays an important role in DNA repair and other cellular functions. It is composed of three domains: an N-terminal DNA-binding domain, a central domain, and a C-terminal catalytic domain (CAT). The N-terminal domain consists of three zinc finger motifs required for recognizing DNA damage and assisting the binding of PARP1 to DNA. The central domain contains glutamate and lysine residues that function as acceptors for ADP-ribose moieties, and a BRCT domain that allows PARP1 to interact with other DNA damage response proteins [12]. The catalytic domain consists of two subdomains, the helical domain (HD) and the ADP-ribosyltransferase (ART) domain, with the active site located in the ART domain. The fold architecture of these domains is conserved across all PARP family members [13–15].
The catalytic domain is crucial for poly(ADP-ribosyl)ation (PARylation) of target proteins, which uses NAD+ to form poly(ADP-ribose) chains for DNA repair and cellular regulation.

Fig. 1. Domains of PARP1 (PDB ID 4DQY), excluding the BRCT domain, which is not present in this crystal structure. Subdomains of the catalytic domain are colored as WGR (yellow), helical domain (pink), and ART domain (green). The image was drawn using PyMOL 3.0.4 (https://www.pymol.org/) and the labeling was done in Microsoft PowerPoint.

Experimental studies have shown that certain residues, such as D766, E763, D770, Y907, R878, S864, H862, K903, and S904, are critical for PARP catalytic action. These residues are directly involved in the catalytic process and in the stability of the enzyme–substrate complex [16]. Talazoparib and Olaparib are examples of PARP1 inhibitors that selectively target the catalytic region of PARP1. These inhibitors cause cancer cells, especially those with BRCA mutations, to accumulate damaged DNA. In addition, multiple studies have investigated the effects of PARP1 inhibitors against PC [17–19]. Because homologous recombination is a critical step in DNA repair that is impaired by BRCA mutations, such cells depend heavily on PARP1 to repair DNA damage [20]. By inhibiting PARP1, these drugs prevent DNA damage from being repaired, which kills cancer cells. This synthetic lethality results in cell death and shows significant potential for treating BRCA-mutated malignancies, furthering the area of precision medicine [21]. According to current research and clinical studies, inhibitors targeting the ADP-ribosyltransferase (ART) domain of PARP1 are helpful for treating PC with DNA damage-repair gene mutations [11,22]. In PC, PARP1 is required for androgen receptor (AR) transcriptional activity. After recruitment to AR target sites, PARP1 enzymatic activity plays a critical role in AR-driven gene expression and the growth of PC cells. Inhibiting PARP1 reduces AR transcriptional activity, affecting both androgen-dependent and androgen-independent AR activation. The significance of PARP1 in prostate cancer, notably in AR signaling and DNA repair pathways, demonstrates its potential as a therapeutic target [11].
Computational biology has revolutionized drug discovery and development, with machine learning-driven virtual screening and molecular dynamics simulations emerging as powerful tools. Machine learning (ML) algorithms can efficiently process and analyze vast chemical libraries and predict potential inhibitors with high accuracy. These algorithms use patterns and features from known inhibitors to identify new compounds that bind effectively to target proteins [23,24]. ML also enables rapid screening and prioritization of compounds based on efficacy and drug-like properties, and combining it with molecular dynamics simulations significantly improves the precision and efficiency of inhibitor identification [25,26]. Integrating ML-driven virtual screening with molecular dynamics simulations therefore offers a promising strategy for identifying potential inhibitors targeting the ADP-ribosyltransferase (ART) domain of PARP1 in prostate cancer, potentially improving treatment outcomes and providing new avenues for combating this prevalent and deadly disease. In this study, ML-based virtual screening and molecular docking integrated with molecular dynamics simulations were used to identify potential PARP1 inhibitors. A library of phytochemicals was screened, focusing on those with high drug-likeness and potential inhibitory effects on PARP1.

Methods

Collection and processing of the data

A library of 6510 active inhibitors that target the PARP1 enzyme was retrieved from BindingDB [27]. The Directory of Useful Decoys-Enhanced (DUD-E) resource (https://dude.docking.org/, accessed on Sep 16 2024) was used to create a set of 2871 decoy compounds [28]. To maintain data integrity, duplicate compounds in the active inhibitor dataset were identified and removed. To facilitate downstream analysis, the decoys and active inhibitors were then combined into a single CSV file. To build a machine learning-based predictive model, the active inhibitors downloaded from BindingDB were labeled 1, denoting activity, and the decoy compounds were labeled 0, denoting inactivity. This labeling scheme allows the model to differentiate between active and inactive molecules during training.
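The deduplication and labeling step can be illustrated with a minimal sketch. It assumes compounds are represented as SMILES strings and that duplicates are exact string matches; in practice, canonical SMILES or InChIKeys would be a more reliable key. The function name and the toy molecules are hypothetical and are not taken from the study.

```python
def build_labeled_dataset(active_smiles, decoy_smiles):
    """Deduplicate actives (order-preserving) and attach 1/0 activity labels."""
    # dict.fromkeys keeps the first occurrence of each exact string.
    unique_actives = list(dict.fromkeys(active_smiles))
    rows = [(smi, 1) for smi in unique_actives]   # 1 = active inhibitor
    rows += [(smi, 0) for smi in decoy_smiles]    # 0 = decoy (inactive)
    return rows


# Hypothetical toy input; real entries would come from BindingDB and DUD-E exports.
actives = ["c1ccccc1C(N)=O", "O=C1NC=Cc2ccccc12", "c1ccccc1C(N)=O"]
decoys = ["CCO", "CCCC"]
dataset = build_labeled_dataset(actives, decoys)
```

The resulting list of `(structure, label)` pairs could then be written to a single CSV file with the standard `csv` module.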

Molecular descriptors generation

The 2D molecular descriptors for the dataset were calculated using RDKit (https://www.rdkit.org/), a popular open-source cheminformatics toolkit. A total of 34 descriptors were initially calculated, covering a broad variety of chemical characteristics and structural aspects. To preserve the dataset’s integrity and quality, descriptors with null values were removed. Following this preprocessing stage, feature selection was carried out to keep only the most relevant and informative descriptors for model training and testing. Figure 2 depicts the methodology of the study, illustrating each step from dataset preparation to ML-based screening and molecular docking.

Fig. 2. Workflow of the study. This figure was created using Microsoft PowerPoint.

Principal component analysis (PCA)

To handle the high dimensionality of the data, Principal Component Analysis (PCA) was used. PCA is a standard and frequently used approach for pattern detection and data representation. It converts the original high-dimensional dataset into a lower-dimensional space while retaining the majority of the important information. This reduction in dimensionality is accomplished by identifying the principal components [29]. To perform PCA, the dataset’s covariance matrix was computed to determine the directions of highest variance. Eigenvalues and their associated eigenvectors were calculated to quantify and characterize the importance of each principal component. Components with higher eigenvalues contributed more to the dataset’s variance and were prioritized for further study.
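The covariance and eigenvalue steps described above can be sketched for the simplest case of two features, where the eigenvalues of the 2×2 covariance matrix have a closed form. This is an illustrative sketch on hypothetical toy data, not the study's implementation, which involved 34 descriptors and would normally rely on a numerical library.

```python
import math


def pca_2d(points):
    """PCA for two-feature data via the covariance matrix and its eigenvalues."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix entries (n - 1 denominator).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eigenvalues = (half_trace + root, half_trace - root)
    total = eigenvalues[0] + eigenvalues[1]
    # Explained-variance ratio: each eigenvalue divided by the total variance.
    return eigenvalues, tuple(ev / total for ev in eigenvalues)


# Hypothetical, strongly correlated toy data.
toy = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]
eigenvalues, explained = pca_2d(toy)
```

Because the two toy features are almost collinear, nearly all of the variance falls on the first component, which mirrors the situation where one eigenvalue dominates.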

Machine learning models

Several machine learning models were developed on the processed dataset and tested to determine whether the compounds were active or inactive. The algorithms employed in this study were Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Naïve Bayes (NB), and Random Forest (RF). These algorithms are popular in drug discovery due to their effectiveness in high-dimensional spaces, classification of complex datasets, simplicity, and prediction of molecular activity. SVM is particularly useful in virtual screening studies for inhibitor identification, KNN is simple and effective in QSAR modeling, and RF is robust, accurate, and resistant to overfitting [30–32]. SVM is known for its efficiency in high-dimensional settings and its resistance to overfitting, particularly when there are more dimensions than samples. It determines the hyperplane that separates the data into two classes with the largest possible margin between them. KNN is a straightforward but efficient approach that uses the majority class of a compound’s k nearest neighbors in descriptor space to determine whether it is active or inactive. It rests on the idea that similar compounds are likely to have similar properties. NB is a probabilistic classifier based on Bayes’ theorem that assumes every feature is independent. In many real-world applications, NB works remarkably well despite its simplicity and the sometimes unrealistic assumption of feature independence, which makes it a useful tool in cheminformatics. RF is an ensemble learning approach that builds many decision trees during training and outputs the mode of the classes predicted by the individual trees (for classification) or their mean prediction (for regression).
RF is well known for its accuracy, its capacity to handle large, high-dimensional datasets, and its resilience to overfitting.
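To make the KNN majority-vote idea concrete, the following minimal sketch classifies a query point from its k nearest neighbors using Euclidean distance. The two-descriptor data and the function name are hypothetical; the study itself used scikit-learn, and this sketch is not a reproduction of that pipeline.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, query, k=3):
    """Label a query by majority vote among its k nearest training points."""
    # Rank training points by Euclidean distance to the query.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-descriptor toy data: 1 = active, 0 = inactive.
X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.8, 0.9), (0.85, 0.95)]
y = [1, 1, 1, 0, 0, 0]
prediction = knn_predict(X, y, (0.2, 0.2), k=3)
```

An odd k avoids ties in a two-class problem, which is one reason small odd values are a common starting point.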

Model evaluation

In this work, we used tenfold cross-validation to test the performance of our machine learning models. This validation approach divides the dataset into 10 non-overlapping subsets, or folds; each fold serves once as the validation set while the remaining nine are used for training. The models’ performance was evaluated using a number of statistical measures, including accuracy, sensitivity, specificity, and the Matthews correlation coefficient (MCC). The accuracy of a model is calculated as the proportion of correctly predicted instances (both active and inactive) out of the total instances:

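\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \]

The specificity is measured as:

\[ \mathrm{Specificity} = \frac{TN}{TN + FP} \tag{2} \]

The sensitivity is estimated as:

\[ \mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{3} \]

Here TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.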

These performance measures were calculated to assess each model’s predictive ability. Using several complementary metrics gives a fuller picture of the models’ strengths and limitations, which guides the selection of the most successful model for differentiating active and inactive compounds.
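As a worked illustration of the measures above, the sketch below computes accuracy, sensitivity, specificity, and MCC from the four confusion-matrix counts, using the standard definitions. The counts are hypothetical and chosen only for easy arithmetic; they are not results from this study.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


# Hypothetical counts: 100 actives (90 found) and 100 inactives (80 rejected).
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
```

With these counts the accuracy is 0.85, the sensitivity 0.90, the specificity 0.80, and the MCC approximately 0.70, showing how MCC summarizes both error types in a single value.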

Machine learning model based screening

A library of 9510 phytochemicals was sourced from the ZINC, PubChem, and ChEMBL databases for virtual screening. The processing and feature computation approach established for the initial dataset was applied to this new dataset. The processed library was then screened using the pre-trained machine learning model, which classified each compound as active or inactive. The compounds predicted to be active were further assessed using Lipinski’s Rule of Five.

Molecular docking

After machine learning-based screening, the selected compounds were docked into the active site of PARP1. Molecular docking experiments were performed with AutoDock Vina, a widely used docking program that predicts the binding of compounds to a receptor. It searches for optimal binding poses by sampling configurations and scoring them according to binding affinity [33]. The crystallographic structure of the ART subdomain of the CAT domain was retrieved from the PDB under PDB ID 5DS3 (https://www.rcsb.org/structure/5ds3). The protein and the compounds were prepared using AutoDockTools 1.5.6. Polar hydrogens were added and charges were computed for the protein and the compounds, which were then saved in PDBQT format for docking. The grid size was set to 45 Å in x, y, and z, and the box was centered on the active site at center_x = −16.759, center_y = 30.743, and center_z = 132.285. This size was chosen to cover the full active site while giving ligands the freedom to explore neighboring binding areas. The exhaustiveness parameter was set to 16, and 20 poses were generated for each compound.
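The docking settings reported above map directly onto AutoDock Vina's configuration-file options (`center_*`, `size_*`, `exhaustiveness`, `num_modes`). The following sketch assembles such a configuration as text. The receptor and ligand file names are hypothetical placeholders, and the helper function is illustrative rather than the authors' script.

```python
def vina_config(receptor, ligand):
    """Build an AutoDock Vina configuration string from the reported settings."""
    settings = {
        "receptor": receptor,
        "ligand": ligand,
        # Box center on the PARP1 active site, as reported in the text.
        "center_x": -16.759,
        "center_y": 30.743,
        "center_z": 132.285,
        # Box edge lengths in angstroms.
        "size_x": 45,
        "size_y": 45,
        "size_z": 45,
        "exhaustiveness": 16,
        "num_modes": 20,  # maximum number of poses to report
    }
    return "\n".join(f"{key} = {value}" for key, value in settings.items())


# Hypothetical file names for illustration only.
config_text = vina_config("parp1_5ds3.pdbqt", "ligand.pdbqt")
```

Saving `config_text` to a file and passing it to Vina through its `--config` option would be one way to reuse the same box for every ligand in the library.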

Molecular dynamics (MD) simulations

In this study, we used GROMACS 2021.3 [34] to simulate both the apo form and the complexes derived from AutoDock Vina. The CHARMM36 force field (charmm36-jul2022) [34] was applied for the protein topology, while ligand topologies were generated using the CGenFF server [35]. The systems were solvated in a cubic box with the TIP3P water model, keeping the protein 15 Å from each side of the box. Sodium ions (Na+) were introduced to neutralize the system. Following neutralization, energy minimization was performed using the steepest descent method for 1000 steps. The systems were then equilibrated in the NVT and NPT ensembles for 300 ps to stabilize temperature and pressure, respectively. Each system underwent a 100 ns production simulation, with coordinates saved every 10 ps for subsequent analysis. The LINCS algorithm was used to constrain the lengths of bonds involving hydrogen atoms. The root mean square deviation (RMSD), root mean square fluctuation (RMSF), and radius of gyration (Rg) were then analyzed.
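The run length and output interval translate into integer step counts in a GROMACS parameter file through the `nsteps` and `nstxout-compressed` options. The sketch below performs this conversion. The 2 fs timestep is an assumption (it is not stated in the text, although it is common when bonds to hydrogen are constrained), and the function name is hypothetical.

```python
def md_run_plan(length_ns=100.0, save_every_ps=10.0, timestep_fs=2.0):
    """Convert run length and save interval into integrator step counts."""
    nsteps = round(length_ns * 1_000_000 / timestep_fs)        # ns -> fs -> steps
    steps_per_frame = round(save_every_ps * 1000 / timestep_fs)  # ps -> fs -> steps
    return {
        "nsteps": nsteps,
        "nstxout-compressed": steps_per_frame,
        "frames": nsteps // steps_per_frame,  # excludes the initial frame at t = 0
    }


# Defaults follow the text (100 ns, 10 ps); the 2 fs timestep is assumed.
plan = md_run_plan()
```

Under these assumptions a 100 ns run corresponds to 50,000,000 steps, with a frame written every 5,000 steps, giving 10,000 saved frames after the starting structure.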

Free energy analysis (MM-PBSA)

After the simulations were completed, the binding energies of the PARP1–compound complexes were computed to assess the stability of the complexes. This analysis is important for understanding biomolecular interactions and their energetic components. The gmx_MMPBSA module was used for these computations. For the MM-PBSA calculations, 5000 frames from the last 25 ns of the 100 ns simulation were extracted. The tool calculates the binding energy of a protein–ligand complex by analyzing multiple energy components, such as polar, non-polar, bonded, and non-bonded interactions.

The binding free energy of the complexes was estimated using the following equation:

\[ \Delta G_{\mathrm{bind}} = G_{\mathrm{complex}} - \left( G_{\mathrm{protein}} + G_{\mathrm{ligand}} \right) \tag{4} \]

In this equation, \(\Delta G_{\mathrm{bind}}\) represents the net binding free energy, \(G_{\mathrm{complex}}\) is the free energy of the ligand–protein complex, \(G_{\mathrm{protein}}\) is the free energy of the protein, and \(G_{\mathrm{ligand}}\) is the free energy of the ligand in the solvent. The free energy can also be expressed as:

\[ \Delta G_{\mathrm{bind}} = \Delta E_{\mathrm{MM}} + \Delta G_{\mathrm{sol}} - T\Delta S \tag{5} \]

In Eq. (5), \(\Delta G_{\mathrm{sol}}\) is the solvation free energy, consisting of the polar solvation energy (\(\Delta E_{\mathrm{GB}}\)) and the non-polar solvation energy (\(\Delta E_{\mathrm{SURF}}\)). \(\Delta E_{\mathrm{MM}}\) includes the non-bonded electrostatic (\(\Delta E_{\mathrm{elec}}\)) and van der Waals (\(\Delta E_{\mathrm{vdw}}\)) interactions, as well as the bonded energy, also known as internal energy, which covers angle bending, bond stretching, and torsional interactions. \(-T\Delta S\) denotes the entropic contribution. The MM-PBSA approach calculates effective binding energies by omitting the entropic term, \(-T\Delta S\). When the entropic term is excluded, the estimated value is the effective free energy, which is generally adequate for comparing the binding free energies of related ligands [36,37]. The gmx_MMPBSA module evaluates these components, providing a comprehensive description of the binding interactions.
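The decomposition described above can be illustrated with a small sketch that sums the energy components into an effective binding energy with the entropic term omitted. It assumes the common single-trajectory setup, in which the bonded (internal) terms cancel between complex, protein, and ligand, so only non-bonded and solvation terms remain. The function name and the numerical values are hypothetical and are not results from this study.

```python
def effective_binding_energy(e_vdw, e_elec, e_gb, e_surf):
    """Sum gas-phase and solvation terms; the entropic term -T*dS is omitted."""
    delta_e_mm = e_vdw + e_elec    # non-bonded molecular mechanics terms
    delta_g_sol = e_gb + e_surf    # polar + non-polar solvation terms
    return delta_e_mm + delta_g_sol


# Hypothetical trajectory-averaged components in kcal/mol.
dg = effective_binding_energy(e_vdw=-40.0, e_elec=-15.0, e_gb=30.0, e_surf=-5.0)
```

In this toy case the favorable van der Waals and electrostatic terms (-55 kcal/mol) are partly offset by the polar solvation penalty, giving an effective binding energy of -30 kcal/mol.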

Toxicity analysis

Chemical toxicity evaluation is an essential step in the drug development process. In addition to being quicker, computational techniques for toxicity estimation can reduce the need for animal testing. By identifying compounds with fewer toxicity concerns early in development, researchers can reduce the high attrition rates frequently observed in pharmaceutical research and development. The ProTox-II [38] web server was used to evaluate the toxicity of the chosen compounds. ProTox-II provides a platform for predicting a variety of toxicity endpoints, including mutagenicity, cytotoxicity, hepatotoxicity, and carcinogenicity.

Results

Data collection and preprocessing for machine learning model generation

A dataset of 6510 active compounds and 2871 decoy compounds against PARP1 was retrieved from BindingDB and DUD-E, respectively. The active compounds were then preprocessed to remove duplicate entries, yielding 4046 compounds in total. The active compounds in the dataset were labeled 1 and the inactive compounds were labeled 0. The integrated dataset of 6917 compounds (4046 actives and 2871 inactives) was split into training and test sets of 4841 (70%) and 2076 (30%) compounds, respectively (Table 1).

Table 1. Statistics of the train and test set.

| Data set | Decoys | Actives | Total |
|---|---|---|---|
| Train set | 2019 | 2822 | 4841 |
| Test set | 852 | 1224 | 2076 |

Descriptors generation and feature selection

Initially, 2D molecular and structural descriptors of the compounds selected in the previous step were computed using RDKit. In total, 34 descriptors were calculated. Descriptors with zero values were not included in model training. The list of descriptors calculated for the dataset is presented in Table S1. Descriptors were selected based on their relevance to drug-likeness, biological activity, and physicochemical properties. Key descriptors such as molecular weight, LogP, hydrogen bond donors, hydrogen bond acceptors, TPSA, and rotatable bonds were prioritized because of their impact on absorption, distribution, metabolism, excretion, and toxicity profiles. These properties align with guidelines such as Lipinski’s Rule of Five [39] and Veber’s Rule [40].

Principal component analysis

We used PCA to reduce the dimensionality of the dataset, which included 34 descriptors per compound. PCA condensed the data to two principal components, each a combination of the original descriptors, that represent the most important underlying patterns and characteristics of the compounds. Figure 3 shows the scatter plot of the first two principal components. In this plot, overlapping regions for label 1 (active compounds) and label 0 (inactive compounds) indicate areas where the compounds share similar descriptor values. This suggests that the active and inactive compounds in these regions have comparable chemical or structural features, making them challenging to distinguish based solely on the selected descriptors. The eigenvalues were 1863.27 for the first component and 5.28 for the second component. These values represent the amount of variance that each component accounts for in the dataset: the higher the eigenvalue, the more variance the component captures. Here, PC1 has a substantially higher eigenvalue, accounting for around 99.59% of the variance and capturing most of the information in the initial high-dimensional data. PC2 contributed an additional 0.28% of the explained variance. These values indicate that the two principal components are sufficient to retain the dataset’s important information. By reducing the number of dimensions, PCA allows simpler analysis and visualization without losing the crucial information in the dataset.
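The relationship between the reported eigenvalues and the explained-variance percentages can be checked with a short calculation. The explained-variance ratio of component i is its eigenvalue divided by the sum of all eigenvalues. The total variance below is not reported in the text; it is inferred here from the PC1 figures and should be read as an approximation.

```latex
\[
\mathrm{EVR}_i = \frac{\lambda_i}{\sum_j \lambda_j},
\qquad
\sum_j \lambda_j \approx \frac{1863.27}{0.9959} \approx 1870.9,
\qquad
\mathrm{EVR}_2 \approx \frac{5.28}{1870.9} \approx 0.0028 \;(0.28\%)
\]
```

The inferred total reproduces the reported 0.28% for PC2, so the two eigenvalues and the two percentages are mutually consistent.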

Fig. 3. Plot of the first two principal components. Red and blue indicate the active and inactive compounds, respectively.

Chemical space and diversity analysis

The performance of machine learning models is influenced by the chemical heterogeneity present in the training and test datasets. Models developed on diverse sample populations are likely to perform better on novel data. In the current study, the physicochemical distribution of the samples was represented in terms of molecular weight and LogP, plotted on the two axes of the graph (Fig. 4). Molecular weight and LogP were chosen for the chemical space study because they are essential descriptors of drug-like properties and key components of Lipinski’s Rule of Five. These parameters provide a straightforward assessment of the dataset’s diversity while focusing on molecular size and lipophilicity, which are both important for absorption, distribution, and target binding. Molecular weight ranged from 70 to 1562 Da in the test set and from 70 to 1313 Da in the training set. LogP ranged from −3.5 to 9.0 in the test set and from −5.20 to 10 in the training set.

Fig. 4. Distribution of diversity within the training set and test set, defined by parameters such as LogP and molecular weight.

Model generation and validation

To construct the model, four supervised learning algorithms (KNN, NB, RF, and SVM) were applied. The models were developed using the Python scikit-learn library, and various statistical measures (accuracy, sensitivity, specificity, MCC, and area under the curve) were computed to evaluate the performance of each. The values of these assessment metrics are given in Table 2. RF outperformed all other models, with the highest values of accuracy (0.948940), specificity (0.917056), MCC (0.894556), and AUC (0.984607). KNN and SVM performed almost identically, with KNN marginally higher in accuracy (0.906551 vs 0.905106), specificity (0.809579 vs 0.806075), and MCC (0.809970 vs 0.807131). However, the AUC for SVM (0.963942) was higher than that for KNN (0.943203). Naïve Bayes performed worst, with the lowest values for all computed parameters (accuracy 0.827553, sensitivity 0.877049, MCC 0.641622, and AUC 0.874420). In agreement with these findings, RF produced the highest ROC curve, followed by KNN on the training set and SVM on the test set (Fig. 5). Sensitivity, which is the ability of a model to correctly identify the positives, showed a different trend, slightly favoring both KNN and SVM (0.974590) over RF (0.971311). Overall, the RF model was considered the best and was employed to screen the library of 9510 phytochemicals for prediction of active molecules. The choice of machine learning algorithms was based on their proven effectiveness in handling cheminformatics data. RF was selected for its ability to handle high-dimensional datasets, robustness against overfitting, and capacity to capture non-linear relationships. The RF algorithm has been widely applied in drug discovery for predicting molecular activities and identifying potential inhibitors [41,42].
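The AUC values compared above have a simple interpretation: the AUC equals the probability that a randomly chosen active compound receives a higher score than a randomly chosen inactive one, with ties counted as one half. The sketch below computes it directly from that pairwise definition. The labels and scores are hypothetical toy values, and this pairwise approach is an illustration rather than the scikit-learn routine used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random active outscores a random inactive."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    # Count winning pairs; a tie contributes half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities for three actives and three inactives.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

Here eight of the nine active-inactive pairs are ranked correctly, so the AUC is 8/9. The pairwise loop is quadratic in the number of compounds, so rank-based implementations are preferable for large test sets.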

Table 2. Assessment of the trained models.

| Metric | KNN | SVM | RF | NB |
|---|---|---|---|---|
| Accuracy | 0.9065 | 0.9051 | 0.9489 | 0.8275 |
| Specificity | 0.8095 | 0.8060 | 0.9170 | 0.7570 |
| Sensitivity | 0.9745 | 0.9745 | 0.9713 | 0.8770 |
| MCC | 0.8099 | 0.8071 | 0.8945 | 0.6416 |
| AUC | 0.9432 | 0.9639 | 0.9846 | 0.8744 |
| F1 score | 0.95 | 0.92 | 0.96 | 0.79 |

Fig. 5. ROC curves of all the models on the test and train sets, showing the performance of each model.

Drug likeness properties

Of the 9510 screened phytochemicals, 181 compounds were predicted to be active against PARP1. Subsequently, Lipinski’s Rule of Five was employed to shortlist only the compounds with favorable permeation and absorption potential. Compounds with molecular weight < 500 Da, < 5 hydrogen bond donors, < 10 hydrogen bond acceptors, and LogP < 5 were chosen for further docking study and therapeutic evaluation as potential drug candidates. Forty of the 181 compounds fulfilled all four drug-likeness criteria. A heatmap was plotted to investigate the correlation between molecular properties of the active compounds (Fig. S2). Two positive correlations, indicated by light blue color, were observed: between hydrogen bond acceptors (HBA) and hydrogen bond donors (HBD) [0.229], and between LogP and HBD [0.291].
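The drug-likeness filter can be sketched as a simple predicate over four properties. The sketch uses the conventional inclusive Lipinski thresholds (at most 500 Da, 5 donors, 10 acceptors, and LogP of 5), which is an assumption about how boundary cases are handled. The compound names and property values are hypothetical and do not correspond to specific compounds in this study.

```python
def passes_lipinski(mol_wt, h_donors, h_acceptors, logp):
    """Return True when all four Rule of Five criteria are satisfied."""
    return (
        mol_wt <= 500          # molecular weight in Da
        and h_donors <= 5      # hydrogen bond donors
        and h_acceptors <= 10  # hydrogen bond acceptors
        and logp <= 5          # octanol-water partition coefficient
    )


# Hypothetical candidates with illustrative property values.
candidates = {
    "cmpd_A": dict(mol_wt=476.0, h_donors=3, h_acceptors=7, logp=3.2),
    "cmpd_B": dict(mol_wt=612.4, h_donors=6, h_acceptors=12, logp=5.8),
}
drug_like = [name for name, props in candidates.items() if passes_lipinski(**props)]
```

Requiring all four criteria at once is stricter than the original rule of thumb, which tolerates a single violation; the stricter form matches the statement that the shortlisted compounds fulfilled all four criteria.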

Molecular docking

The binding interactions of the screened drug-like compounds with the PARP1 active site were evaluated using Olaparib as a reference. All 40 molecules and the reference were docked against the PARP1 active site using AutoDock Vina. Ten docking poses were acquired for each docking experiment. Compounds were ranked on the basis of their energy scores. Binding energies of the compounds ranged between −10.8 kcal/mol and −6.3 kcal/mol. The lowest (most favorable) binding energy scores were exhibited by ZINC43120769 (−10.8 kcal/mol), ZINC40934164 (−10.2 kcal/mol), and ZINC14584870 (−9.9 kcal/mol) (Table 3). These compounds were accommodated inside the binding pocket of PARP1 at the same location as the reference. The reference compound Olaparib exhibited hydrogen bond interactions with Ser904, Gly863, Arg878, and Tyr896 (Fig. S1). Complex ZINC43120769 established hydrogen bonds with Tyr889, Trp861, and Ser904. These interactions indicate that ZINC43120769 occupies a comparable area to Olaparib, interacting with key residues known to stabilize the ligand within the binding pocket. Compound ZINC40934164 formed hydrogen bonds with Arg878 and Ala880, whereas ZINC14584870 formed a strong interaction network comprising Arg878, Ala880, His862, Ser864, and Ser904. Compounds with lower binding energies, ZINC14584870 and ZINC43120769, generated stronger hydrogen bonds and hydrophobic interactions with important residues such as Arg878 and Ser904 and displayed greater shape complementarity in the binding pocket. The substantial interaction with several active site residues underlines the compounds’ ability to effectively inhibit PARP1 activity. The hydrogen bond interactions of the three top compounds are illustrated in Fig. 6. The physicochemical properties and chemical structures of the candidate compounds are presented in Table 4.

Table 3. Docking score, number of H-bonds, and residues making hydrogen bonds for the reference compound Olaparib and the candidate compounds.

| Compound | Binding score (kcal/mol) | No. of H-bonds | Residues making H-bond | Other interactions |
|---|---|---|---|---|
| Ref (Olaparib) | −11.7 | 5 | Gly863, Arg878, Tyr896, Ser904 | Lys903, Tyr907, Ala898, Gly894 |
| ZINC43120769 | −10.8 | 4 | Ala880, Arg878 | Gly863, Ala898, Lys903, Tyr907, Gly894 |
| ZINC40934164 | −10.2 | 3 | Trp861, Tyr889, Ser904 | Tyr907, His862, Ile895, Leu877 |
| ZINC14584870 | −9.9 | 7 | Ala880, Arg878, Ser864, His862, Ser904 | Tyr896, Tyr889, Tyr907, His862, Ile872, Leu877 |

Fig. 6. Binding position of the selected hits bound inside the active site. The pocket residues are shown as lines, while residues making hydrogen bonds with the ligands are displayed as sticks and labeled. (A) Complex ZINC40934164, (B) complex ZINC43120769, and (C) complex ZINC14584870. The interaction figures were created using PyMOL 3.0.4 (https://www.pymol.org/) and the whole figure was assembled using Microsoft PowerPoint.

Table 4. Physicochemical properties, chemical structures, and phytochemical class of the selected compounds.

| Compound | Chemical structure | H-bond donors | H-bond acceptors | Molecular weight (Da) | Phytochemical class |
|---|---|---|---|---|---|
| ZINC43120769 | [structure image] | 3 | 7 | 476 | Alkaloid |
| ZINC40934164 | [structure image] | 3 | 5 | 471.6 | Alkaloid |
| ZINC14584870 | [structure image] | 2 | 7 | 436.46 | Flavonoid |

Molecular dynamics simulation

MD simulation analysis was conducted to validate the stability of the docked compound–PARP1 complexes. Such a study yields insights into the dynamic behavior of both the ligand and the PARP1 protein, as well as the stability of the compounds within the catalytic site of the protein. Consequently, the three complexes and apo-PARP1 were subjected to 100 ns all-atom MD simulations. RMSD is an important parameter for determining the stability of a protein–ligand complex during MD simulations. It measures the average deviation of the protein’s atomic positions from a reference structure over time. The RMSD of apo PARP1 steadily increased over time, eventually reaching a plateau at 0.3 nm. This implies that the apo form of the protein undergoes conformational variations as the simulation proceeds, indicating flexibility in the absence of a ligand. The RMSD of PARP1 in complex with ZINC43120769 was slightly lower than that of the apo form, indicating that this ligand helps to stabilize the protein structure. The RMSD values remained steady, ranging from 0.2 to 0.3 nm, indicating that the protein maintained overall structural stability during the simulation. ZINC40934164 showed a similar RMSD trend to ZINC43120769, with somewhat higher values that were still within a stable range (between 0.25 and 0.3 nm), as is also evident from Fig. 7A. The consistent RMSD shows a stable interaction between ZINC40934164 and PARP1, albeit with slightly greater flexibility than ZINC43120769. ZINC14584870 had the most stable RMSD profile, with values consistently below 0.2 nm throughout the simulation. This shows that ZINC14584870 formed the most stable complex with PARP1 among all the compounds, thereby stabilizing the protein. All the complexes exhibited RMSD values lower than those of the apo form and the reference compound Olaparib.
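The RMSD measure discussed above is the square root of the mean squared displacement of atoms relative to a reference structure. The sketch below assumes the frame has already been superimposed on the reference (the least-squares fitting step performed by trajectory analysis tools is not shown) and uses hypothetical coordinates in nm.

```python
import math


def rmsd(coords, reference):
    """RMSD between two equally sized, already superimposed coordinate sets."""
    if len(coords) != len(reference):
        raise ValueError("coordinate sets must have the same number of atoms")
    # Sum of squared per-atom displacements from the reference positions.
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords, reference))
    return math.sqrt(squared / len(coords))


# Hypothetical three-atom example: every atom is displaced by 0.1 nm.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
frame = [(0.1, 0.0, 0.0), (1.0, 0.1, 0.0), (0.0, 1.0, 0.1)]
value = rmsd(frame, reference)
```

Because each atom moves by the same distance, the RMSD equals that distance (0.1 nm). Evaluating the function for every saved frame yields the time series plotted in an RMSD curve.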

Fig. 7. RMSD and RMSF analysis of the apo form, the reference drug (Olaparib), and complexes ZINC43120769, ZINC40934164, and ZINC14584870. RMSD and RMSF are plotted over the 100 ns simulation time for all the simulated systems.

A second simulation run was performed to assess the reproducibility of the results for the complexes. The RMSD plots in Fig. 8 compare the structural stability of the complexes across both runs.

Fig. 8.

RMSD analysis over 100 ns for complexes (A) ZINC43120769, (B) ZINC40934164 and (C) ZINC14584870. The plots compare the RMSD of the first and second MD simulation runs. The first run is shown in black and the second run in red.

The results of the second simulation run were consistent with those of the first run. Throughout the 100 ns simulation, complexes ZINC43120769 and ZINC14584870 exhibited excellent stability, maintaining RMSD values consistently around 0.2 nm. For complex ZINC43120769, an initial fluctuation was observed during the first 50 ns, after which the RMSD stabilized and closely followed the pattern of the first run, suggesting convergence towards a stable conformation. However, complex ZINC40934164 exhibited greater fluctuations in the second run, with RMSD values slightly higher than in the first run. Despite this variation, the deviations remained within an acceptable range of less than 0.25 nm, showing that the complex did not experience any significant conformational changes and maintained its overall structural integrity. Furthermore, the RMSD values of all the ligand-bound complexes remained lower than those of the apo form, highlighting the stabilizing effect of ligand binding. This further supports the structural integrity of the complexes over time.

Figure 7B depicts the RMSF values for apo PARP1 and its complexes with the three compounds. RMSF measures the average fluctuation of each residue around its mean position during the simulation, providing information about the flexibility of specific residues of the protein. In this study, fluctuations were observed mainly in several loop regions of the protein, specifically around the amino acid residues Glu883–Thr887, Ser781–Asp788, Thr825–Ala828, and Thr910–Gln912. The maximum RMSF values recorded for apo PARP1, ZINC43120769, ZINC40934164, and ZINC14584870 were 0.67 nm, 0.52 nm, 0.54 nm, and 0.45 nm, respectively. The active site residues remained stable in all the complexes (Table 5). The maximum RMSF of the apo form of PARP1 was the greatest, indicating a system with more flexibility than the others. ZINC14584870 showed the lowest maximum RMSF among the three compounds, suggesting that it is the most effective in stabilizing the protein structure.

Table 5.

RMSF (nm) of the active site residues of all the simulated complexes.

Simulated system Trp861 His862 Ser864 Arg878 Ala880 Tyr889 Ser904
Olaparib 0.05 0.05 0.07 0.11 0.14 0.22 0.09
ZINC43120769 0.05 0.05 0.06 0.10 0.14 0.18 0.11
ZINC40934164 0.05 0.04 0.06 0.09 0.12 0.21 0.09
ZINC14584870 0.05 0.05 0.05 0.12 0.16 0.23 0.09
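
As a hedged illustration of how per-residue RMSF values such as those in Table 5 are defined, the sketch below computes, for each atom, the root-mean-square fluctuation around its time-averaged position. The toy trajectory and the function name `rmsf` are assumptions made for this example; in practice the frames would first be fitted to a reference and the values would come from the MD analysis tools.

```python
import math

def rmsf(trajectory):
    """Per-atom RMSF from a list of frames; each frame is a list of (x, y, z)."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        positions = [frame[i] for frame in trajectory]
        # Time-averaged position of atom i.
        mean = [sum(p[k] for p in positions) / n_frames for k in range(3)]
        # Mean squared displacement from that average position.
        msd = sum(
            sum((p[k] - mean[k]) ** 2 for k in range(3)) for p in positions
        ) / n_frames
        result.append(math.sqrt(msd))
    return result

# Toy trajectory (nm): atom 0 is fixed, atom 1 oscillates by +/-0.1 nm along x.
traj = [
    [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.9, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.9, 0.0, 0.0)],
]
print([round(v, 3) for v in rmsf(traj)])  # [0.0, 0.1]
```

A rigid atom gives an RMSF of zero, whereas a mobile loop atom gives a larger value, which is the basis for comparing flexible loop regions with the stable active site residues.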

Compactness of simulated systems

The radius of gyration (Rg) is an important metric for determining the structural compactness of PARP1 and its compound complexes over the simulation period. To investigate the compactness of the simulated systems, radius of gyration analysis was carried out using the "gmx gyrate" command of GROMACS. This parameter describes the global stability and folding pattern of the protein and its ligand-bound forms. A higher Rg value denotes less compactness, less stability and an expanded protein state. In contrast, a lower Rg value suggests a compact, stable and condensed protein configuration. The Rg profiles of the protein complexes and the apo form were examined over the 100 ns MD simulations. The Rg values of the complexes ZINC40934164 and ZINC14584870 were found to be notably stable, oscillating within a small range of 1.82–1.84 nm, as can be seen in Fig. 9. A modest drop in Rg was found between 35 and 45 ns, indicating a small structural shift in the protein. However, after this change, the Rg values returned to the initial range, indicating that the overall compactness of the protein–ligand complexes remained constant for the rest of the simulation. In comparison, the compound ZINC43120769 showed slightly different behavior. Initially, the Rg values were consistent with those of the other compounds, ranging from 1.82 to 1.84 nm. However, after 40 ns, Rg gradually decreased, eventually reaching 1.79–1.80 nm. This drop indicates a modest compaction of the overall structure of the protein–ligand complex, which nevertheless remained stable throughout the simulation. The apo form of the protein showed a markedly different Rg profile. Unlike the complexes with ligands, the apo form exhibited continual Rg variations throughout the simulation, indicating a lack of structural stability and higher flexibility in the absence of a bound ligand. The mean Rg values of apo PARP1, ZINC43120769, ZINC40934164 and ZINC14584870 were 1.81 nm, 1.80 nm, 1.82 nm, and 1.82 nm, respectively.

Fig. 9.

Radius of gyration of the apo form and the complexes ZINC43120769, ZINC40934164 and ZINC14584870, plotted over the 100 ns simulation time.
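
The quantity plotted in Fig. 9 can be sketched in a few lines of Python. The example below uses the usual mass-weighted definition, Rg = sqrt(Σ m_i |r_i − r_com|² / Σ m_i); the coordinates, masses and the function name `radius_of_gyration` are invented for illustration and do not reproduce the GROMACS output of this study.

```python
import math

def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of a set of (x, y, z) coordinates."""
    total_mass = sum(masses)
    # Centre of mass of the structure.
    com = [
        sum(m * r[k] for m, r in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    # Mass-weighted mean squared distance from the centre of mass.
    weighted = sum(
        m * sum((r[k] - com[k]) ** 2 for k in range(3))
        for m, r in zip(masses, coords)
    )
    return math.sqrt(weighted / total_mass)

# Toy structure (nm): four equal masses on the corners of a 2 nm square.
square = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 2.0, 0.0), (0.0, 2.0, 0.0)]
masses = [12.0, 12.0, 12.0, 12.0]
print(round(radius_of_gyration(square, masses), 3))  # 1.414
```

A more expanded arrangement of the same atoms gives a larger Rg, which is why a decrease in Rg over time is read as compaction of the protein–ligand complex.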

Principal component analysis (PCA)

PCA was used to analyze the conformational dynamics of the three ligand–protein complexes, ZINC43120769, ZINC40934164, and ZINC14584870. The PCA plots in Fig. 10 show the distribution of conformations sampled during the simulation, indicating significant variations in each complex’s dynamic behavior. The distribution of points in each plot represents the conformational space explored by the protein during the simulation. Among the three complexes, ZINC43120769 (Fig. 10A) and ZINC14584870 (Fig. 10C) had more compact and well-defined clusters, indicating greater structural stability. The PCA plot of ZINC43120769 shows two dominant conformational states with little dispersion.

Fig. 10.

Principal component analyses of the three complexes in a two-dimensional space defined by Principal Component 1 (PC1) and Principal Component 2 (PC2). (A) ZINC43120769, (B) ZINC40934164 and (C) ZINC14584870.

Similarly, ZINC14584870 exhibited a dense and continuous distribution, indicating balanced structural dynamics and a lower degree of structural fluctuation. In contrast, ZINC40934164 exhibited a ring-like shape with a central void, indicating a more dynamic and adaptable conformational landscape. This pattern suggests frequent transitions between distinct states, as well as possible energy barriers that prevent the system from settling into a stable conformation. Overall, the PCA suggests that ZINC43120769 and ZINC14584870 exhibit greater stability, while ZINC40934164 is more flexible and less likely to maintain a stable binding conformation.
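
A PC1/PC2 projection of the kind discussed above can be sketched with NumPy as follows. This is a generic covariance-based PCA applied to synthetic data, not the analysis pipeline used in this study; the function name `pca_projection`, the random "trajectory" and its dimensions are assumptions made for illustration, and real input would be fitted atomic coordinates flattened to one row per frame.

```python
import numpy as np

def pca_projection(frames, n_components=2):
    """Project frames (n_frames, n_coordinates) onto the top principal components.

    Returns the projections and the fraction of variance explained by each
    retained component.
    """
    x = np.asarray(frames, dtype=float)
    centered = x - x.mean(axis=0)            # remove the average structure
    cov = np.cov(centered, rowvar=False)     # coordinate covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    projections = centered @ eigvecs[:, order]
    ratio = eigvals[order] / eigvals.sum()
    return projections, ratio

# Synthetic data: 200 frames, 6 coordinates, one dominant collective motion.
rng = np.random.default_rng(0)
direction = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
amplitude = rng.normal(0.0, 1.0, size=(200, 1))
frames = amplitude * direction + rng.normal(0.0, 0.05, size=(200, 6))

proj, ratio = pca_projection(frames)
print(proj.shape)  # (200, 2)
```

Plotting the two columns of `proj` against each other gives a scatter plot in which a tight cluster corresponds to a restricted conformational space and a broad or ring-like distribution to a more mobile system.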

MM-PBSA of the complexes

The MM-PBSA approach is a widely used tool for understanding the energetics of molecular interactions after MD simulations. It provides an estimate of the binding free energy and thereby reveals information about the stability and affinity of molecular complexes. Decomposing the total binding energy into distinct components such as van der Waals, electrostatic, and solvation energies allows the individual contributions to binding to be assessed. The MM-PBSA method was used to compute the binding free energies of the PARP1 complexes with ZINC43120769, ZINC40934164, and ZINC14584870. The analysis included different energy terms, namely electrostatic (ΔEelec), van der Waals (ΔEvdw), solvation (ΔGSOLV), gas-phase (ΔGGAS), and total binding free energies. The values of each energy term for all complexes are detailed in Table 6.

Table 6.

MM-PBSA profile of PARP1 in complex with compounds ZINC43120769, ZINC40934164, and ZINC14584870.

Energy terms (kcal/mol) Reference (Olaparib) ZINC43120769 ZINC40934164 ZINC14584870
ΔEelec −51.53 −21.08 48.63 −8.12
ΔEvdw −39.76 −30.42 −19.34 −34.32
ΔGSOLV 70.11 29.67 −38.33 22.07
ΔGGAS −91.29 −51.49 29.29 −42.44
ΔTOTAL −21.18 −21.83 −9.04 −20.38

ΔEelec is the electrostatic energy, ΔEvdw is the van der Waals interaction energy, ΔGSOLV is the solvation energy including polar and non-polar solvation, and ΔGGAS is the gas-phase energy, which encompasses ΔEelec and ΔEvdw.
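
The relationship between the terms in Table 6 can be checked with a short Python sketch. It uses the values copied from the table and assumes the standard end-point decomposition without an entropy term, that is ΔGGAS = ΔEelec + ΔEvdw and ΔTOTAL = ΔGGAS + ΔGSOLV; the dictionary layout, the function name `is_consistent` and the rounding tolerance are choices made for this example.

```python
# Energy terms copied from Table 6 (kcal/mol).
TABLE6 = {
    "Olaparib":     {"elec": -51.53, "vdw": -39.76, "solv": 70.11, "gas": -91.29, "total": -21.18},
    "ZINC43120769": {"elec": -21.08, "vdw": -30.42, "solv": 29.67, "gas": -51.49, "total": -21.83},
    "ZINC40934164": {"elec": 48.63, "vdw": -19.34, "solv": -38.33, "gas": 29.29, "total": -9.04},
    "ZINC14584870": {"elec": -8.12, "vdw": -34.32, "solv": 22.07, "gas": -42.44, "total": -20.38},
}

def is_consistent(terms, tol=0.02):
    """Check gas = elec + vdw and total = gas + solv within rounding error."""
    gas_ok = abs(terms["elec"] + terms["vdw"] - terms["gas"]) <= tol
    total_ok = abs(terms["gas"] + terms["solv"] - terms["total"]) <= tol
    return gas_ok and total_ok

for name, terms in TABLE6.items():
    print(name, is_consistent(terms))

# Rank systems from most to least favorable total binding free energy.
ranking = sorted(TABLE6, key=lambda name: TABLE6[name]["total"])
print(ranking)
```

All four columns satisfy both relations to within 0.01 kcal/mol, and the ranking places ZINC43120769 first, followed by Olaparib, ZINC14584870 and ZINC40934164.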

Electrostatic interactions were favorable for ZINC43120769 (−21.08 kcal/mol) and ZINC14584870 (−8.12 kcal/mol) but strongly unfavorable for ZINC40934164 (48.63 kcal/mol), indicating probable repulsion. The negative ΔEelec values observed for ZINC14584870 and ZINC43120769 suggest favorable electrostatic interactions, particularly with positively charged residues such as Arg878 and Lys903 in the active site. Among the three compounds, ZINC14584870 had the most favorable van der Waals interactions (−34.32 kcal/mol), followed by ZINC43120769 (−30.42 kcal/mol) and ZINC40934164 (−19.34 kcal/mol). The favorable van der Waals term indicates strong hydrophobic interactions with residues such as Trp861, Tyr896, and Ala880. The positive solvation energies of ZINC43120769 and ZINC14584870 indicate that these compounds are more hydrophobic and prefer to remain in a less polar environment, such as the protein binding site. ZINC43120769 and ZINC14584870 had favorable gas-phase interactions, while ZINC40934164 had unfavorable gas-phase interactions (ΔGGAS = 29.29 kcal/mol). Overall, ZINC43120769 and ZINC14584870 had the most favorable binding free energies (−21.83 kcal/mol and −20.38 kcal/mol, respectively), which are comparable to the total binding energy of the reference drug Olaparib (−21.18 kcal/mol), while ZINC40934164 had the least favorable (−9.04 kcal/mol). These data indicate that ZINC43120769 and ZINC14584870 are the most potent binders to PARP1, owing to strong van der Waals and electrostatic contacts, whereas the poor binding of ZINC40934164 is related to unfavorable gas-phase interactions, despite its favorable solvation energy. The binding free energy results are consistent with the RMSD, RMSF and PCA results above.

Toxicity measurements

The toxicity of the compounds was predicted using ProTox-II. The ProTox-II server assesses the toxic properties of compounds through the predicted median lethal dose (LD50) in mg/kg body weight. ZINC40934164 and ZINC43120769 were found to be in class IV, while ZINC14584870 was in class V. Class IV indicates very low toxicity, while class V indicates non-toxic and non-irritating compounds. This finding highlights that these hits are favorable in terms of toxicity. The predicted LD50 was 760 mg/kg for ZINC40934164, 2500 mg/kg for ZINC43120769, and 1500 mg/kg for ZINC14584870. According to the LD50 values, ZINC40934164 exhibits moderate toxicity, while the other compounds exhibit no toxicity. The toxicity profile of the selected candidates is presented in Table 7.

Table 7.

Toxicity measurements of the predicted compounds.

Compound ID Hepatotoxicity Neurotoxicity Cytotoxicity Carcinogenicity Mutagenicity
ZINC43120769 Inactive Inactive Inactive Inactive Inactive
ZINC40934164 Inactive Inactive Inactive Inactive Inactive
ZINC14584870 Inactive Inactive Inactive Inactive Inactive
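
The link between a predicted LD50 and a toxicity class can be sketched as follows. The thresholds are the GHS-style oral toxicity class boundaries documented for ProTox-II (class I up to 5 mg/kg, II up to 50, III up to 300, IV up to 2000, V up to 5000, VI above 5000); the function name `toxicity_class` is hypothetical and the sketch is not part of the server itself.

```python
def toxicity_class(ld50_mg_per_kg):
    """Map a predicted oral LD50 (mg/kg) to a GHS-style toxicity class (1-6)."""
    if ld50_mg_per_kg <= 0:
        raise ValueError("LD50 must be positive")
    # (upper bound in mg/kg, class); anything above 5000 mg/kg is class 6.
    upper_bounds = [(5, 1), (50, 2), (300, 3), (2000, 4), (5000, 5)]
    for bound, cls in upper_bounds:
        if ld50_mg_per_kg <= bound:
            return cls
    return 6

# LD50 values quoted in the text (mg/kg).
for ld50 in (760, 1500, 2500):
    print(ld50, toxicity_class(ld50))
```

With these thresholds, LD50 values of 760 and 1500 mg/kg fall in class 4 and 2500 mg/kg falls in class 5; the classes reported by the server remain the reference values for each compound.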

Chemical diversity analysis

The chemical diversity and structural similarity of the candidate compounds were further assessed by estimating the Tanimoto coefficient (TC) between the selected compounds and the known PARP1 drugs. The 2D chemical structures of the known drugs (Fig. 11) and the selected compounds were compared in terms of their similarity. The TC, which ranges from 0 to 1, is a quantitative measure of molecular similarity based on structural fingerprints, with values close to 1 indicating greater similarity between two molecules. Figure 12 shows the TC values of the candidate compounds compared to the known PARP1 drugs, allowing a direct comparison between each candidate molecule and each known inhibitor. This analysis found varying degrees of structural similarity to the known inhibitors, showing distinct similarities and contrasts among the candidates. This variation in similarity profiles underlines each candidate’s distinct structural features, supporting the concept that these compounds may have different binding affinities and selectivities for PARP1. The physicochemical properties of the candidate compounds and the reported PARP1 drugs were also compared (Fig. 13). The properties compared include molecular weight (MolWT), LogP (lipophilicity), H-bond donor/acceptor counts, and topological polar surface area (TPSA). The selected compounds have physicochemical features similar to those of the four previously known PARP1 inhibitors. All compounds have molecular weights of more than 400 g/mol, indicating that they are in a comparable size range, which may favor binding affinity. The LogP values of the candidate compounds are approximately 4, which is comparable to those of Olaparib and Rucaparib.

Fig. 11.

Chemical structures (2D) of the known PARP drugs.

Fig. 12.

Tanimoto coefficient similarity between the candidate compounds and the reported PARP1 drugs (Olaparib, Niraparib, Talazoparib, and Rucaparib). TC values range between 0 and 1. Values close to 1 indicate high similarity between two compounds.
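
The Tanimoto coefficient itself is simple to compute once each molecule is represented as a fingerprint. The sketch below treats a fingerprint as the set of its "on" bit positions and returns the ratio of shared bits to the total number of distinct bits; the bit sets are invented for illustration, and the fingerprint type used for Fig. 12 is not assumed here.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    # Shared bits divided by all bits set in either fingerprint.
    return len(a & b) / len(a | b)

# Hypothetical fingerprints (bit positions only, not real molecules).
candidate = {1, 2, 3, 4}
known_drug = {3, 4, 5, 6}
print(round(tanimoto(candidate, known_drug), 3))  # 0.333
print(tanimoto(candidate, candidate))             # 1.0
```

Identical fingerprints give a value of 1, fingerprints with no bits in common give 0, and intermediate values reflect partial structural overlap.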

Fig. 13.

Physicochemical properties comparison between the selected compounds (ZINC14584870, ZINC40934164, and ZINC43120769) and the known inhibitors Olaparib, Niraparib, Talazoparib, and Rucaparib. (A) Molecular weight (MolWT), (B) LogP, (C) Number of H-bond donors and acceptors, (D) Topological Polar Surface Area (TPSA).

Discussion

In this study, we employed machine learning-based virtual screening to identify potential compounds as therapeutics for prostate cancer by targeting the PARP1 enzyme. PARP1 (poly(ADP-ribose) polymerase 1) is a crucial enzyme involved in DNA repair and has been implicated in cancer progression and resistance to chemotherapy. Inhibiting PARP1 has emerged as a promising strategy for cancer treatment, particularly in tumors with defective DNA repair mechanisms4,5. The extensive dataset, encompassing 6510 active inhibitors and 287 decoy compounds, provided a solid foundation for machine learning-based predictive modeling. A library of 9000 phytochemicals was screened to find promising inhibitors of PARP1. Machine learning prediction and drug-likeness filtering finally yielded 40 candidate compounds predicted to be active against PARP1. ZINC43120769, ZINC40934164, and ZINC14584870 were the three compounds that stood out after docking because of their substantial binding affinities toward the PARP1 active site. We compared these molecules to the well-known PARP1 inhibitor Olaparib, which has been co-crystallized with the PARP1 ART domain. According to our analysis, all three compounds formed hydrogen bonds with Arg878, His862, and Ser904, three important catalytic residues of the PARP1 enzyme. These interactions are consistent with the findings of previous studies16,43,44, which supports the validity of our results. The interaction profile of the selected candidate ligands was comparable to that of Olaparib. All candidate compounds formed H-bonds with Arg878, which is critical for Olaparib activity. Among the candidates, ZINC14584870 showed the most diverse interaction profile, which may contribute to its binding stability. To further confirm the potential of these compounds, we employed MD simulations and MM-PBSA (Molecular Mechanics/Poisson-Boltzmann Surface Area) analyses.
The MD simulations revealed that the compounds ZINC43120769 and ZINC14584870 stabilized the PARP1 structure over time. MD simulations of the protein–ligand complexes provided insights into the stability and dynamics of these interactions. The RMSD and RMSF analyses indicated that compounds ZINC14584870, ZINC40934164, and ZINC43120769 effectively stabilize the PARP1 protein structure, with ZINC14584870 showing the most stable interaction profile45. According to the RMSD values, which assess the stability of the protein–ligand complex, ZINC14584870 maintained the most constant conformation of all the compounds. Furthermore, the MM-PBSA analysis demonstrated that ZINC43120769 and ZINC14584870 exhibited a high affinity for PARP1, indicating their potential as effective inhibitors. Overall, our findings show that machine learning-based virtual screening, paired with large phytochemical libraries, can reveal promising novel PARP1 inhibitors. The compounds ZINC43120769 and ZINC14584870, in particular, show promise due to their binding free energies (−21.83 kcal/mol and −20.38 kcal/mol, respectively) and stabilizing effects on the PARP1 enzyme, making them strong candidates for further research in prostate cancer therapy. Niraparib, Olaparib, Talazoparib, and Rucaparib are some of the most extensively investigated PARP1 inhibitors for PC46–50. These inhibitors impede PARP1’s catalytic activity, resulting in the accumulation of DNA damage and cell death in cancer cells. Talazoparib has a high trapping potency on PARP–DNA complexes, which improves its effectiveness in treating BRCA-mutated malignancies. Olaparib, the first clinically approved PARP1 inhibitor, has demonstrated remarkable effectiveness in BRCA-mutant PC and is being investigated for additional malignancies46,47. Niraparib is notable for its wide range of applications in PC maintenance treatment48.
Comparison of the candidate compounds evaluated in this study with the known inhibitors revealed structural and interaction similarities, indicating potential efficacy in PARP1 inhibition. Notably, Olaparib and Talazoparib are known to exhibit strong binding interactions with catalytic residues of PARP1, particularly Arg878 and Ser904. Docking results revealed that candidate compounds ZINC43120769 and ZINC14584870 also interact with these key residues, replicating the Olaparib binding conformation. Furthermore, these candidate compounds have comparable molecular characteristics, such as molecular weights (MolWT) and topological polar surface areas (TPSA) roughly matching those of recognized inhibitors. The substantial structural similarity of ZINC14584870 to Talazoparib (Fig. 12) suggests that this candidate compound may stabilize PARP1 through comparable mechanisms, thereby increasing DNA damage and synthetic lethality in prostate cancer cells lacking DNA repair pathways.

Conclusion

In this study, we used machine learning-based screening to identify and evaluate potential PARP1 inhibitors for prostate cancer treatment. Different models, namely SVM, KNN, NB and RF, were used to predict active inhibitors from a dataset of 6510 known compounds and 2871 decoys. The Random Forest model showed the highest accuracy, specificity, and AUC scores. It was used to screen 9510 phytochemicals, identifying 181 potentially active compounds. Further analysis using Lipinski’s Rule of Five narrowed these down to 40 drug-like compounds. Molecular docking studies then showed that compounds ZINC14584870, ZINC43120769, and ZINC40934164 had strong binding interactions with the PARP1 active site residues, including the crucial residues Arg878 and Ser904, similar to the reference inhibitor Olaparib, demonstrating high affinity and effective interaction. Overall, the MD simulations and MM-PBSA results demonstrated that two compounds, ZINC14584870 and ZINC43120769, form stable and robust contacts with the target, making them strong candidates for future research and therapeutic applications. Further experimental validation and clinical trials will be required to confirm their efficacy and therapeutic potential.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1 (89.5KB, xlsx)
Supplementary Material 2 (1.5MB, docx)

Acknowledgements

The authors extend their appreciation to the Researchers Supporting Project number (RSP2025R506), King Saud University, Riyadh, Saudi Arabia for funding this work.

Author contributions

F.M.A. collected data, performed analysis, wrote the first draft of the manuscript, and arranged funding. S.A.A. and K.H.D. assisted in data collection, validated the findings, and revised the manuscript. All authors reviewed and approved the manuscript.

Data availability

Data is provided within the manuscript or supplementary information files.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Wong, M. C. et al. Global incidence and mortality for prostate cancer: Analysis of temporal patterns and trends in 36 countries. Eur. Urol. 70, 862–874 (2016).
  • 2. Howrey, B. T., Kuo, Y.-F., Lin, Y.-L. & Goodwin, J. S. The impact of PSA screening on prostate cancer mortality and overdiagnosis of prostate cancer in the United States. J. Gerontol. Series A Biomed. Sci. Med. Sci. 68, 56–61 (2013).
  • 3. Sekhoacha, M. et al. Prostate cancer review: Genetics, diagnosis, treatment options, and alternative approaches. Molecules 27, 5730 (2022).
  • 4. D’Amours, D., Desnoyers, S., D’Silva, I. & Poirier, G. G. Poly (ADP-ribosyl) ation reactions in the regulation of nuclear functions. Biochem. J. 342, 249–268 (1999).
  • 5. Swindall, A. F., Stanley, J. A. & Yang, E. S. PARP1: Friend or foe of DNA damage and repair in tumorigenesis?. Cancers 5, 943–958 (2013).
  • 6. Pascal, J. M. The comings and goings of PARP1 in response to DNA damage. DNA Repair 71, 177–182 (2018).
  • 7. Farmer, H. et al. Targeting the DNA repair defect in BRCA mutant cells as a therapeutic strategy. Nature 434, 917–921 (2005).
  • 8. Mendes-Pereira, A. M. et al. Synthetic lethal targeting of PTEN mutant cells with PARP inhibitors. EMBO Mol. Med. 1, 315–322 (2009).
  • 9. McCabe, N. et al. Deficiency in the repair of DNA damage by homologous recombination and sensitivity to poly (ADP-ribose) polymerase inhibition. Can. Res. 66, 8109–8115 (2006).
  • 10. Thomas, C. & Tulin, A. V. Poly-ADP-ribose polymerase: Machinery for nuclear processes. Mol. Aspects Med. 34, 1124–1137 (2013).
  • 11. Weaver, A. N. & Yang, E. S. Beyond DNA repair: Additional functions of PARP1 in cancer. Front. Oncol. 3, 290 (2013).
  • 12. Langelier, M.-F., Planck, J. L., Roy, S. & Pascal, J. M. Structural basis for DNA damage–dependent poly (ADP-ribosyl) ation by human PARP1. Science 336, 728–732 (2012).
  • 13. Hottiger, M. O., Hassa, P. O., Lüscher, B., Schüler, H. & Koch-Nolte, F. Toward a unified nomenclature for mammalian ADP-ribosyltransferases. Trends Biochem. Sci. 35, 208–219 (2010).
  • 14. Zong, C. et al. PARP mediated DNA damage response, genomic stability and immune responses. Int. J. Cancer 150, 1745–1759 (2022).
  • 15. Thapa, K., Khan, H., Sharma, U., Grewal, A. K. & Singh, T. G. Poly (ADP-ribose) polymerase-1 as a promising drug target for neurodegenerative diseases. Life Sci. 267, 118975 (2021).
  • 16. Mateo, J. et al. A decade of clinical development of PARP inhibitors in perspective. Ann. Oncol. 30, 1437–1447 (2019).
  • 17. Zhang, S.-H. et al. Design, synthesis and biological evaluation of dual inhibitors targeting AR/AR-Vs and PARP1 in castration resistant prostate cancer therapy. Biomed. Pharmacother. 180, 117485 (2024).
  • 18. Messina, C. et al. Combining PARP inhibitors and androgen receptor signalling inhibitors in metastatic prostate cancer: A quantitative synthesis and meta-analysis. Eur. Urol. Oncol. 7, 179–188 (2024).
  • 19. Herencia-Ropero, A. et al. The PARP1 selective inhibitor saruparib (AZD5305) elicits potent and durable antitumor activity in patient-derived BRCA1/2-associated cancer models. Genome Medicine 16, 107 (2024).
  • 20. Boussios, S. et al. Poly (ADP-Ribose) polymerase inhibitors: Talazoparib in ovarian cancer and beyond. Drugs in R&D 20, 55–73 (2020).
  • 21. Deshmukh, D. & Qiu, Y. Role of PARP1 in prostate cancer. Am. J. Clin. Exp. Urol. 3, 1 (2015).
  • 22. Teyssonneau, D. et al. Prostate cancer and PARP inhibitors: progress and challenges. J. Hematol. Oncol. 14, 1–19 (2021).
  • 23. Dara, S., Dhamercherla, S., Jadav, S. S., Babu, C. M. & Ahsan, M. J. Machine learning in drug discovery: A review. Artif. Intell. Rev. 55, 1947–1999 (2022).
  • 24. Kourou, K., Exarchos, T. P., Exarchos, K. P., Karamouzis, M. V. & Fotiadis, D. I. Machine learning applications in cancer prognosis and prediction. Comput. Struct. Biotechnol. J. 13, 8–17 (2015).
  • 25. Zhang, H. et al. An integrated deep learning and molecular dynamics simulation-based screening pipeline identifies inhibitors of a new cancer drug target TIPE2. Front. Pharmacol. 12, 772296 (2021).
  • 26. Zhou, R. et al. Machine learning-aided discovery of T790M-mutant EGFR inhibitor CDDO-Me effectively suppresses non-small cell lung cancer growth. Cell Commun. Signal. 22, 1–25 (2024).
  • 27. Liu, T., Lin, Y., Wen, X., Jorissen, R. N. & Gilson, M. K. BindingDB: A web-accessible database of experimentally determined protein–ligand binding affinities. Nucleic Acids Res. 35, D198–D201 (2007).
  • 28. Mysinger, M. M., Carchia, M., Irwin, J. J. & Shoichet, B. K. Directory of useful decoys, enhanced (DUD-E): Better ligands and decoys for better benchmarking. J. Med. Chem. 55, 6582–6594 (2012).
  • 29. Karamizadeh, S. et al. An overview of principal component analysis. J. Signal Inf. Process. 4, 173–175 (2020).
  • 30. Khamis, M. A., Gomaa, W. & Ahmed, W. F. Machine learning in computational docking. Artif. Intell. Med. 63, 135–152 (2015).
  • 31. Ballester, P. J. & Mitchell, J. B. A machine learning approach to predicting protein–ligand binding affinity with applications to molecular docking. Bioinformatics 26, 1169–1175 (2010).
  • 32. Karthikeyan, M., Vyas, R., Karthikeyan, M. & Vyas, R. Machine learning methods in chemoinformatics for drug discovery. Pract. Chemoinf. 133–194 (2014).
  • 33. Trott, O. & Olson, A. AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 31, 455–461 (2010).
  • 34. Bekker, H. et al. in 4th International Conference on Computational Physics (PC 92). 252–256 (World Scientific Publishing).
  • 35. Best, R. B. et al. Optimization of the additive CHARMM all-atom protein force field targeting improved sampling of the backbone ϕ, ψ and side-chain χ1 and χ2 dihedral angles. J. Chem. Theory Comput. 8, 3257–3273 (2012).
  • 36. Wang, E. et al. End-point binding free energy calculation with MM/PBSA and MM/GBSA: Strategies and applications in drug design. Chem. Rev. 119, 9478–9508 (2019).
  • 37. Case, D. A. et al. Amber 2021 (University of California, 2021).
  • 38. Banerjee, P., Eckert, A. O., Schrey, A. K. & Preissner, R. J. ProTox-II: A webserver for the prediction of toxicity of chemicals. Nucleic Acids Res. 46, W257–W263 (2018).
  • 39. Pollastri, M. P. Overview on the rule of five. Curr. Protoc. Pharmacol. 49, 9–12 (2010).
  • 40. Plinski, E. F. & Plinska, S. Veber’s rules in terahertz light. (2020).
  • 41. Priya, N. & Shobana, G. Application of machine learning models in drug discovery: A review. Int. J. Emerg. Technol. 10, 268–275 (2019).
  • 42. Cano, G. et al. Automatic selection of molecular descriptors using random forest: Application to drug discovery. Expert Syst. Appl. 72, 151–159 (2017).
  • 43. Dawicki-McKenna, J. M. et al. PARP1 activation requires local unfolding of an autoinhibitory domain. Mol. Cell 60, 755–768 (2015).
  • 44. Langelier, M.-F., Zandarashvili, L., Aguiar, P. M., Black, B. E. & Pascal, J. M. NAD+ analog reveals PARP1 substrate-blocking mechanism and allosteric communication from catalytic center to DNA-binding domains. Nat. Commun. 9, 844 (2018).
  • 45. Adelakun, N. et al. Discovery of new promising USP14 inhibitors: Computational evaluation of the thumb-palm pocket. J. Biomol. Struct. Dyn. 40, 3060–3070 (2022).
  • 46. Agarwal, N. et al. Talazoparib plus enzalutamide in metastatic castration-resistant prostate cancer: TALAPRO-2 phase III study design. Future Oncol. 18, 425–436 (2022).
  • 47. Smith, M. R. et al. Niraparib in patients with metastatic castration-resistant prostate cancer and DNA repair gene defects (GALAHAD): A multicentre, open-label, phase 2 trial. Lancet Oncol. 23, 362–373 (2022).
  • 48. de Bono, J. et al. Olaparib for metastatic castration-resistant prostate cancer. N. Engl. J. Med. 382, 2091–2102 (2020).
  • 49. Abida, W. et al. Rucaparib in men with metastatic castration-resistant prostate cancer harboring a BRCA1 or BRCA2 gene alteration. J. Clin. Oncol. 38, 3763–3772 (2020).
  • 50. Anscher, M. S. et al. FDA approval summary: Rucaparib for the treatment of patients with deleterious BRCA-mutated metastatic castrate-resistant prostate cancer. Oncologist 26, 139–146 (2021).


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
