Heliyon. 2025 Feb 10;11(4):e42593. doi: 10.1016/j.heliyon.2025.e42593

Generative AI, molecular docking and molecular dynamics simulations assisted identification of novel transcriptional repressor EthR inhibitors to target Mycobacterium tuberculosis

Rupesh V Chikhale a,, Rinku Choudhary b, Gaber E Eldesoky c, Mahima Sudhir Kolpe b, Omkar Shinde b, Dilnawaz Hossain b
PMCID: PMC11874554  PMID: 40034280

Abstract

Tuberculosis (TB) remains a persistent global health threat, with Mycobacterium tuberculosis (Mtb) continuing to be a leading cause of mortality worldwide. Despite efforts to control the disease, the emergence of multi-drug-resistant (MDR) and extensively drug-resistant (XDR) TB strains presents a significant challenge to conventional treatment approaches. Addressing this challenge requires the development of novel anti-TB drug molecules. This study employed de novo drug design approaches to explore new EthR ligands and ethionamide boosters targeting the crucial enzyme InhA involved in mycolic acid synthesis in Mtb. Leveraging REINVENT4, a modern open-source generative AI framework, the study utilized various optimization algorithms such as transfer learning, reinforcement learning, and curriculum learning to design small molecules with desired properties. Specifically, focus was placed on molecule optimization using the Mol2Mol option, which offers multinomial sampling with beam search. The study's findings highlight the identification of six promising compounds exhibiting enhanced activity and improved physicochemical properties through structure-based drug design and optimization efforts. These compounds offer potential candidates for further preclinical and clinical development as novel therapeutics for TB treatment, providing new avenues for combating drug-resistant TB strains and improving patient outcomes.

Keywords: EthR inhibitors, Machine learning, Artificial intelligence, Mycobacteria, Anti-TB drugs


1. Introduction

Tuberculosis (TB), caused by Mycobacterium tuberculosis (Mtb), was designated a global health crisis nearly two decades ago. It nevertheless remains the leading cause of death from a single bacterial pathogen. According to the World Health Organization (WHO) [1], approximately 1.6 million people died of TB in 2021, including 187,000 who were coinfected with the human immunodeficiency virus (HIV) [1,2]. The WHO also reports an increasing prevalence of drug-resistant strains [3]. Managing multidrug-resistant TB (MDR-TB) presents a substantial challenge, demanding extended and customised treatment protocols. Recent WHO data indicate an estimated 440,000 new cases of MDR-TB each year, characterised by resistance to the frontline drugs isoniazid and rifampicin [4]. The persistence of infection despite prolonged chemotherapy is a second significant challenge in TB control. While drug-susceptible TB requires 6–9 months of combination therapy for a cure, treatments for MDR-TB and XDR-TB extend over years [5,6]. These difficulties impede cure and limit conventional therapeutic approaches, creating an urgent need for novel anti-TB drug molecules. The scientific community is exploring several innovative approaches: developing shorter and more efficient treatment regimens through new drug combinations and repurposed medications, and developing novel antibiotics with unique mechanisms of action against new drug targets in Mtb.

We have been developing novel molecules to target mycobacteria in active and dormant forms by investigating various drug targets in Mycobacterium tuberculosis [7]. Our recent report on the Mycobacterium tuberculosis transcriptional repressor EthR inhibitors was based on the application of drug repurposing and machine learning (ML) tools for the quality assessment of the known ligands [8]. Ethionamide (ETH) and isoniazid (INH) are structurally analogous and must be activated before performing their desired biological activity [9]. In this case, both drugs target enoyl-acyl carrier protein reductase (InhA), an enzyme necessary for mycolic acid synthesis. Ethionamide (ETH) is a vital second-line drug in TB treatment, with a well-understood mechanism of action [10]. In our recent report we discussed in detail how the SAR based synthesis and development by Besra, Baulard and other researchers led to the evolution of novel molecules like the BDM31343, Thiophen-2-yl-1,2,4-oxadiazoles, BDM41906, 4-(2-Methylthiazol-4-yl)-N-(3,3,3-trifluoropropyl) benzamide, 4-(2-(Propylsulfonamidomethyl) thiazol-4-yl)- N-(3,3,3-trifluoropropyl) benzamide, and 4-(3-(Phenylsulfonamido)prop-1-ynyl)-N-(3,3,3-trifluoropropyl) benzamide. Willand et al. reported the crystal structure of EthR (PDB: 4M3D) [9], which revealed a homodimer structure with two functional domains, and each monomer has nine α-helices. The first three α-helices from each monomer form a DNA-binding domain; these are also connected to the core domain responsible for the dimerization of the repressor. Willand also reported a fragment-grown ligand 4-{3-[(phenylsulfonyl)amino]prop-1-yn-1-yl}-N-(3,3,3-trifluoropropyl)benzamide, which is a linear, highly hydrophobic ligand embedded in the core domain of each monomer [11]. These inputs from the literature and experimental information on the structural aspects of the Mycobacterium tuberculosis transcriptional repressor EthR have assisted successfully in identifying novel ligands. 
In this study, we applied generative artificial intelligence (GenAI) tools to design novel, potent EthR ligands as ethionamide boosters [12].

REINVENT4 is a GenAI tool that allows designing novel molecules using recurrent neural networks and transformer architectures [13]. It is a deep learning-based platform primarily designed for drug discovery and molecular design [14]. Fig. 1 explains the process flow more fully. REINVENT4 utilizes transfer learning (TL), reinforcement learning (RL), and curriculum learning (CL) to facilitate various methods in molecule design and optimization, including de novo, functional group replacement, library and linker design, molecular optimization, and scaffold hopping methods [15,16]. This tool supports various run modes, with a sampling run mode generating molecules based on a model produced by transfer learning (TL) or reinforcement learning (RL). In this mode, a molecule acts as a restraint, and the generator produces a second molecule within a defined similarity, allowing for scaffold variation while maintaining similarity. In our present study, we have used the Mol2Mol option, which offers multinomial sampling with beam search.

Fig. 1. The diagrammatic representation of the workflow for identifying novel inhibitors that target the EthR protein binding site.

2. Materials and methods

The supplementary information file provides a detailed description of the methods and the inputs. Here, we provide a short description of each method.

2.1. Compound library selection

All 36 drug-like molecules represented in SMILES format were used for compound library generation.

2.1.1. Mol2Mol molecule optimization compound library generation

REINVENT4's Mol2Mol molecule optimization method was used for the compound library generation. Mol2Mol with beam search balances novelty against structural similarity in drug design, offering more controlled exploration than traditional frameworks. Unlike unrestricted Generative Adversarial Networks (GANs) or pure similarity searches, it uses a configurable similarity radius and beam search to explore chemical space systematically while maintaining key molecular properties. This approach, combined with its transfer learning capabilities, makes it particularly effective for lead optimization, though it may be more computationally intensive than simpler methods. The three learning techniques work together in REINVENT4's molecular design process: transfer learning provides foundational knowledge of chemical structures from pre-trained data, reinforcement learning optimizes molecules through reward-based learning for desired properties, and curriculum learning gradually increases task complexity to ensure effective exploration of chemical space. The 36 ligands, obtained from the earlier referenced literature [8,11,17] and ChEMBL (https://www.ebi.ac.uk/chembl/) [18], were supplied through a sampling configuration file in TOML format. The output was a CSV file containing more than 21 thousand sampled SMILES with their Tanimoto scores; all SMILES were canonicalised and unique.

2.1.2. Curation of Mol2Mol generated compounds libraries

Mol2Mol-generated compounds were curated to remove duplicate compounds, if any, and other structural aspects like bonds, charges, and stereochemistry were also checked for correctness. Following this, the library was further sorted based on a similarity search. The Tanimoto score was used to measure the similarity of the compounds. This study used more than 12 thousand compounds with a Tanimoto score of 0.6 and above for further research.
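The similarity cut-off described above can be sketched in a few lines of Python. This is only an illustration of the Tanimoto coefficient and the 0.6 threshold, assuming each molecule is already encoded as a set of fingerprint on-bit indices; the fingerprint type, the function names and the toy bit sets are assumptions, and in the study the scores came directly from the REINVENT4 output.

```python
from typing import Dict, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def filter_by_similarity(parent: Set[int],
                         candidates: Dict[str, Set[int]],
                         threshold: float = 0.6) -> Dict[str, float]:
    """Keep candidates whose similarity to the parent is at or above the threshold."""
    scores = {name: tanimoto(parent, fp) for name, fp in candidates.items()}
    return {name: s for name, s in scores.items() if s >= threshold}


# Toy fingerprints (hypothetical bit indices, not real molecules).
parent_fp = {1, 2, 3, 4, 5}
library = {
    "close_analogue": {1, 2, 3, 4, 6},    # 4 shared bits of 6 total -> 0.667
    "distant_analogue": {1, 7, 8, 9, 10},  # 1 shared bit of 9 total -> 0.111
}
kept = filter_by_similarity(parent_fp, library)
```

With these toy inputs only `close_analogue` passes the 0.6 threshold.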

2.2. ADMET analysis using pkCSM

The pkCSM tool [19] was used to perform in-silico ADMET prediction of 12,198 compounds. Various parameters were investigated, such as intestinal absorption, skin permeability, blood-brain barrier (BBB) permeability, and central nervous system (CNS) permeability. Additionally, physicochemical characteristics such as lipophilicity, molecular weight, hydrogen bond donors/acceptors, and polar surface area were evaluated, in line with Lipinski's rule of five for drug-likeness. A toxicity profile and ADME characteristics were built and analyzed for each molecule. Of these ligands, only 12 molecules were found to have acceptable pharmacokinetic and toxicity profiles; these were considered for further studies.

2.3. Multi-step molecular docking using AutoDock Vina and PLANTS

2.3.1. Molecular docking using AutoDock Vina

Molecular docking studies were performed using AutoDock Vina (ADV) and AutoDock Vina Tools [20]. The ligand set comprised the 12 molecules obtained from the previous step, the co-crystallized ligand, and the standard molecule (DB05154). The transcriptional regulator EthR protein (PDB: 4M3D), with a resolution of 1.90 Å, was used as the protein receptor, with the bound ligand serving as the reference for grid definition in docking. The docking grid was centred at x: 26.216, y: −10.021, z: −4.773 with a grid box size of 60 × 60 × 60 Å to ensure adequate coverage of the binding pocket.
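As an illustration of how the grid definition above maps onto an AutoDock Vina configuration file, the sketch below assembles the text of such a file in Python. The centre and box size are the values reported here; the receptor and ligand file names are hypothetical placeholders, and the exhaustiveness value is an assumption (the Vina default), since the setting used in this study is not stated.

```python
def vina_config(receptor, ligand, center, size, exhaustiveness=8):
    """Return the text of an AutoDock Vina configuration file.

    center and size are (x, y, z) tuples in angstroms.
    """
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
        f"exhaustiveness = {exhaustiveness}",  # assumed value, not from the study
    ]
    return "\n".join(lines) + "\n"


# File names below are placeholders, not the files used in the study.
config_text = vina_config(
    receptor="ethr_4m3d.pdbqt",
    ligand="ligand.pdbqt",
    center=(26.216, -10.021, -4.773),
    size=(60, 60, 60),
)
```

Writing `config_text` to a file such as config.txt and passing it to Vina with `--config` would then run one docking job per ligand.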

2.4. Molecular docking using PLANTS

PLANTS (Protein-Ligand ANTSystem) is an optimization technique based on ant colonies [21,22]. An artificial ant colony was utilized to determine the minimal energy configuration of the ligand in the protein binding site. This tool was applied to the same set of molecules with the same protein receptor as used in ADV. The docking grid was defined with coordinates x: 26.216, y: −10.021, z: −4.773 and a binding site radius of 20 Å.

2.5. Absolute binding free energy estimation using KDeep

The KDeep tool [23] uses a deep convolutional neural network (DCNN) model that was previously trained, tested, and validated on the PDBbind v2016 database. More precisely, KDeep first describes the binding site with eight pharmacophoric-like features/descriptors (e.g., aromatic, hydrophobic, metallic, positive or negative ionizable, and total excluded volume) and then uses those descriptors to predict binding affinities. All the protein-ligand complexes from the ADV docking studies were used as input for the KDeep tool, and all features were kept at their default settings.

2.6. Synthetic accessibility studies

The synthetic accessibility properties of the selected six compounds and co-crystal and standard were evaluated using the SwissADME web server, which is publicly available at www.swissadme.ch. Developed by the Swiss Institute of Bioinformatics, SwissADME is a freely accessible web tool for in silico ADME screening and drug-likeness assessment. The assessment encompassed examining molecular and pharmacokinetic behaviour for eight molecules, focusing on parameters such as synthetic accessibility. By subjecting these molecules to a drug-like filter, those that successfully passed the criteria were considered as potential candidates against the target protein. The analysis was conducted by submitting the SMILES format of the eight molecules to retrieve outcomes from the SwissADME server.

2.7. Molecular dynamics (MD) simulation

MD simulations were performed on eight selected protein-ligand complexes: EthR bound to each of the six final compounds, the co-crystal ligand, and the standard inhibitor. The MD production run was carried out with the GROMACS MD tool for 100 ns using an explicit solvent model. The MD trajectory was further processed, and various characteristics, including root-mean-square fluctuation (RMSF), protein and ligand root-mean-square deviation (RMSD), and radius of gyration (RoG), were studied. The details of the simulation setup are provided as supplementary information.

2.8. Molecular mechanics generalized Born surface area (MM-GBSA) and free energy landscape (FEL)

2.8.1. Binding free energy calculation through MM-GBSA approach

The binding free energy of a small molecule can be used to describe its interaction affinity for creating a stable macromolecular complex [24]. An ensemble of MD simulation trajectories of EthR bound with finally proposed compounds was employed. The final 10,000 trajectory frames determined the ΔGbind value for every complex.

2.9. Binding free energy calculation through the free energy landscape analysis approach

A conformational sampling technique that enables the exploration of conformations close to the native state structure can be used to obtain a protein's free-energy landscape. In this case, the sampling strategy used was the MD simulation technique [25].
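A free-energy landscape is commonly obtained from the sampled conformations by Boltzmann inversion, G = −kT ln(P/Pmax), where P is the population of a bin in the space of two collective variables. The sketch below shows that calculation under stated assumptions: the choice of collective variables, the bin widths and the 300 K temperature are illustrative and are not taken from this study.

```python
import math
from collections import Counter

KB_KJ_PER_MOL_K = 0.0083144626  # Boltzmann constant times Avogadro's number


def free_energy_landscape(xs, ys, bin_width_x, bin_width_y, temperature=300.0):
    """Map each populated 2D bin to its free energy (kJ/mol) relative to the
    most populated bin, using G = -kT * ln(P / P_max)."""
    counts = Counter(
        (math.floor(x / bin_width_x), math.floor(y / bin_width_y))
        for x, y in zip(xs, ys)
    )
    max_count = max(counts.values())
    kt = KB_KJ_PER_MOL_K * temperature
    return {bin_id: -kt * math.log(n / max_count) for bin_id, n in counts.items()}


# Toy per-frame values of two collective variables (made-up numbers).
cv1 = [0.05, 0.06, 0.07, 0.08, 0.25, 0.26]
cv2 = [0.05, 0.06, 0.07, 0.08, 0.25, 0.26]
landscape = free_energy_landscape(cv1, cv2, bin_width_x=0.1, bin_width_y=0.1)
```

The most populated bin sits at zero free energy, and less populated bins lie above it; here the second bin holds half as many frames and so lies kT ln 2 higher.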

3. Results and discussion

3.1. Compound library generation using REINVENT4 and curation

In the initial phase of our study, we identified, curated, and analyzed 36 active EthR inhibitor molecules using data from the literature and the ChEMBL database. These molecules were then converted to Simplified Molecular Input Line Entry System (SMILES) format using OpenBabel for further processing. Subsequently, we used REINVENT4, a versatile molecular design tool, for molecular optimization and compound library generation. Implemented in Python 3, REINVENT4 uses reinforcement learning with neural networks and transfer learning for molecule generation. The Mol2Mol model chosen for our study used a beam search decoding strategy and a sampling run mode to generate more than 21,000 molecules with a specified degree of resemblance to the parent compounds. The generated compounds were examined carefully, duplicates were removed, and structural similarity was assessed using Tanimoto scores. Over 12,000 molecules with a Tanimoto score of 0.6 or higher were selected for further investigation.

3.2. ADMET analysis using pkCSM

ADME and toxicity profiles are crucial for evaluating the pharmacokinetic behaviour, medicinal chemistry qualities, and toxicity of any molecule. After filtering the Mol2Mol molecule optimization compound library by Tanimoto score, more than twelve thousand molecules were retained and subjected to ADMET estimation using pkCSM. In-house code divided the 12,198 compounds into SMILES files in the input format required by pkCSM, each holding 100 SMILES. A pkCSM automation script then processed the 129 SMILES files, and their outputs were merged into a single CSV file. Based on the criteria intestinal absorption ≥ 30, skin permeability ≤ −2.5, BBB permeability ≤ −1, CNS permeability ≤ −3.0, no AMES toxicity, hepatotoxicity, or skin sensitisation, minnow toxicity ≥ −0.3, and maximum tolerated dose ≤ 0.477, we selected 12 compounds from the ADMET study for further investigation.
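The selection criteria listed above amount to a conjunction of threshold tests on each predicted profile. A minimal sketch follows, assuming the merged pkCSM output has been loaded into one dictionary per compound; the key names and the two example records are hypothetical, and only the thresholds come from the text.

```python
def passes_admet(profile):
    """Return True if a predicted ADMET profile meets every criterion in the text."""
    return (
        profile["intestinal_absorption"] >= 30
        and profile["skin_permeability"] <= -2.5
        and profile["bbb_permeability"] <= -1
        and profile["cns_permeability"] <= -3.0
        and not profile["ames_toxicity"]
        and not profile["hepatotoxicity"]
        and not profile["skin_sensitisation"]
        and profile["minnow_toxicity"] >= -0.3
        and profile["max_tolerated_dose"] <= 0.477
    )


# Made-up records for illustration only; they are not predictions from the study.
compounds = [
    {"name": "cand_1", "intestinal_absorption": 85.0, "skin_permeability": -2.7,
     "bbb_permeability": -1.2, "cns_permeability": -3.4, "ames_toxicity": False,
     "hepatotoxicity": False, "skin_sensitisation": False,
     "minnow_toxicity": 1.1, "max_tolerated_dose": 0.30},
    {"name": "cand_2", "intestinal_absorption": 90.0, "skin_permeability": -2.7,
     "bbb_permeability": -0.4, "cns_permeability": -3.4, "ames_toxicity": False,
     "hepatotoxicity": True, "skin_sensitisation": False,
     "minnow_toxicity": 1.1, "max_tolerated_dose": 0.30},
]
selected = [c["name"] for c in compounds if passes_admet(c)]
```

Here `cand_2` is rejected because it fails both the BBB permeability and the hepatotoxicity criteria.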

3.3. Prediction of binding energy of protein-ligand by molecular docking

3.3.1. Molecular docking – AutoDock Vina-based analysis

Each molecule was docked using AutoDock Vina, and its binding energy was recorded. The PLANTS docking algorithm was also used to dock the complete collection of molecules, and the optimal docked pose of each molecule with EthR was passed to KDeep to predict the absolute binding energy. After the active domain of EthR was identified, the protein was prepared for molecular docking. A grid box was then defined on the active site of EthR, and a configuration file (config.txt) was generated for docking. In total, 14 compounds were docked with the prepared EthR protein: the 12 active compounds, the co-crystal ligand, and the standard compound. Amino acid interactions were analysed for all 14 compounds; the output .pdbqt files were inspected in the PyMOL visualisation software for comparison with the co-crystallized ligand 2H2 and the standard compound (DB05154). The co-crystal ligand interacts with Gly106, Tyr148, Asn176, Asn179, Met102, Trp103, Ile107, Phe110, Trp145, Val152 and Trp207. Analysis of the docking outputs showed that compounds m4 and m6 have the highest binding affinity towards EthR, at −11.5 kcal/mol.

3.4. Re-evaluation of screened compounds using PLANTS docking and absolute binding free energy estimation using KDEEP

PLANTS docking was performed by preparing the protein and ligand files in .mol2 format, together with a configuration file describing the active site of EthR. The 12 selected compounds, the co-crystal ligand, and DB05154 were docked, and the binding energy was taken as the total PLANTS score of each protein-ligand pair. The absolute binding energy of the same 14 compounds was estimated on the KDEEP platform, a freely available server that calculates protein-ligand absolute binding energy as ΔG (kcal/mol). KDEEP requires ligands in SDF format and proteins in PDB format, along with an index sheet containing the paths of the protein and ligand files.

The binding interactions between EthR and the six final hit molecules, the co-crystal ligand, and DB05154 were explored (Fig. 2). A1 showed hydrophobic interactions with Arg99, Met102, Trp103, Tyr148, and Val152, and a halogen bond with Tyr148. A2 formed several hydrophobic interactions with Arg99, Trp103, Ile107, Phe110, Tyr148, Val152, and Trp207, and a halogen bond with Thr149 (Fig. 3). A3 interacted with Pro94, Met102, Trp103, Phe110, Trp145, Tyr148, Val152, and Asn179 through hydrophobic interactions and also showed pi-stacking with Phe110. A4 showed hydrophobic interactions with Leu90, Pro94, Ala95, Met102, Ile107, Phe110, Trp145, and Trp207. A5 showed hydrophobic interactions with Ile107, Phe110, Trp145, and Trp207. A6 interacted with Trp103, Ile107, Phe110, Tyr148, Val152, and Trp207 through hydrophobic interactions. The co-crystal ligand (2H2) formed hydrogen bonds with Gly106, Tyr148, Asn176, and Asn179 and hydrophobic interactions with Met102, Trp103, Ile107, Phe110, Trp145, Tyr148, Val152, and Trp207 of EthR (Fig. 4). DB05154 (standard) formed hydrogen bonds with Tyr148 and Asn179 and hydrophobic interactions with Leu87, Trp103, Tyr148, and Val152, and also showed pi-stacking with Phe110 and a halogen bond with Met102. Beyond the above, a number of other EthR amino acids were found to form crucial binding interactions with the ligands. The amino acids and their corresponding binding interactions are listed in Table 1.

Fig. 2. Two-dimensional representation of the compounds A1 - A6.

Fig. 3. Protein-ligand interactions of ligands A1 - A6, Co-crystal ligand and DB05154 after molecular docking studies with EthR.

Fig. 4. Surface interactions of ligands A1 - A6, Co-crystal ligand and DB05154 after molecular docking studies with EthR.

Table 1. Interactions profile of AutoDock Vina and PLANTS docking of selected compounds.

Compound | Binding affinity, KDEEP (kcal/mol) | Binding affinity, AutoDock Vina (kcal/mol) | Binding affinity, PLANTS | Hydrogen bonds | Other types of interactions | Synthetic accessibility
N-(((3S)-7-(1-(cyclopentanecarbonyl)-1,2,3,6-tetrahydropyridin-4-yl)-8-fluoro-1-oxo-3a,4-dihydro-1H,3H-benzo[b]oxazolo[3,4-d][1,4]oxazin-3-yl)methyl)-2,2,2-trifluoroacetamide (A1) | −8.050 | −7.9 | −111.437 | Asn176, Asn179 | Arg99, Met102, Trp103, Tyr148, Val152 (hydrophobic interactions); Tyr148 (halogen bond) | 4.87
(R)-5-((1H-1,2,3-triazol-1-yl)methyl)-3-(2,2′-difluoro-4'-(2-(4-methyl-1H-imidazole-1-yl)acetyl)-[1,1′-biphenyl]-4-yl)oxazolidin-2-one (A2) | −11.680 | −11.5 | −142.137 | Asn179 | Arg99, Trp103, Ile107, Phe110, Tyr148, Val152, Trp207 (hydrophobic interactions); Thr149 (halogen bond) | 3.96
(R)-N-(4-(aminomethyl)phenyl)-3-(4-(1,1-dioxidotetrahydro-2H-thiopyran-4-yl)-3-fluorophenyl)-2-oxooxazolidine-5-carboxamide (A3) | −7.040 | −7.5 | −108.27 | Thr149 | Pro94, Met102, Trp103, Phe110, Trp145, Tyr148, Val152, Asn179 (hydrophobic interactions); Phe110 (pi-stacking) | 3.82
N-(2-(1H-imidazole-4-yl)ethyl)-4-(2-(propylsulfonamidomethyl)thiazol-4-yl)benzamide (A4) | −6.607 | −6.6 | −137.822 | Met102, Gly106, Tyr148, Asn176, Asn179 | Leu90, Pro94, Ala95, Met102, Ile107, Phe110, Trp145, Trp207 (hydrophobic interactions) | 3.42
4-(1H-1,2,3-triazol-4-yl)-N-(3,3,3-trifluoropropyl)benzamide (A5) | −7.532 | −10.3 | −111.318 | Tyr148, Asn176, Asn179 | Ile107, Phe110, Trp145, Trp207 (hydrophobic interactions) | 2.11
(S)-2-((4'-(5-(aminomethyl)-2-oxooxazolidin-3-yl)-2′,4-difluoro-[1,1′-biphenyl]-3-yl)oxy)acetic acid (A6) | −7.325 | −6.8 | −118.161 | Met102, Gly106, Asn179 | Trp103, Ile107, Phe110, Tyr148, Val152, Trp207 (hydrophobic interactions) | 3.31
DB05154 (standard) | −8.745 | −9.31 | −102.044 | Tyr148, Asn179 | Leu87, Trp103, Tyr148, Val152 (hydrophobic interactions); Phe110 (pi-stacking); Met102 (halogen bond) | 3.66
Co-crystal | −10.267 | −9.46 | −145.908 | Gly106, Tyr148, Asn176, Asn179 | Met102, Trp103, Ile107, Phe110, Trp145, Tyr148, Val152, Trp207 (hydrophobic interactions) | 2.61

3.5. Synthetic accessibility studies

Following the execution of KDeep, the SMILES of the eight compounds were submitted to the SwissADME server for synthetic accessibility analysis. These compounds were filtered and subjected to a more detailed examination based on the available experimental data. The advantages of each selected compound, including efficacy, safety, and potential for pharmaceutical formulation, were also identified. The molecular and pharmacokinetic behaviour of these eight molecules was assessed for drug-like properties, with the requirement of a synthetic accessibility score of less than 5 (Table 1). The goal was to prioritize compounds with favourable pharmacokinetic profiles and practical synthesis feasibility.

3.6. Molecular dynamics simulation

Molecular Dynamics (MD) simulation is indispensable for comprehensively exploring various biological characteristics and the dynamic behaviour inherent in protein-ligand complexes. These complexes, formed by binding small molecules to proteins, hold exceptional significance in biochemistry. Consequently, examining the complex's stability and the binding attributes of the small molecules necessitates thorough exploration utilizing biochemical and biophysical methodologies. For this purpose, all complexes of proposed molecules with EthR were subjected to 100 ns all-atoms MD simulation within an explicit hydration environment. Upon successful culmination of the MD simulation, a comprehensive analysis of the entire trajectory for each complex ensued, encompassing various parameters such as the Root Mean Square Deviation (RMSD) of the protein backbone and ligand, Root Mean Square Fluctuation (RMSF) of individual amino acids, hydrogen bond analysis between the protein and ligand, Radius of Gyration (RoG) of the system, and ultimately, the binding affinity of the molecules. The binding affinity was assessed regarding the binding energy, calculated through the Molecular Mechanics Generalized Born Surface Area (MM-GBSA) approach.

3.7. Root mean-square deviation

The Root Mean Square Deviation (RMSD) of the protein backbone, derived from the simulation trajectory of the protein-ligand complex, is a key metric of the stability of the complex in a dynamic environment (Fig. 5). A higher protein-backbone RMSD indicates a larger departure from the starting structure, whereas a lower value indicates that the complex stays close to it. Low fluctuation in the backbone RMSD further indicates that the protein-ligand complex has equilibrated. The RMSD was calculated for each frame and plotted against the simulation time, as shown in Fig. 5. Initially, noticeable deviations in the RMSD were observed, but all complexes subsequently attained stability. All complexes showed a gradual increase in RMSD until approximately 20 ns of simulation time. Beyond this point, the complexes equilibrated with minor deviations, signifying that the systems had settled into stable conformations. Complex A2 showed the highest deviation of all the complexes until 40 ns, but it also equilibrated by the end of the simulation. Average, maximum, and minimum RMSD were calculated and are given in Table 2. The average protein backbone RMSD was 1.70, 2.29, 1.96, 1.50, 1.70, 2.03, 2.04, and 1.80 Å when bound to A1, A2, A3, A4, A5, A6, the co-crystal ligand and DB05154, respectively. These data suggest that the protein backbone deviates similarly and consistently when bound with the proposed molecules.
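For reference, the per-frame RMSD is the square root of the mean squared displacement of the selected atoms from a reference structure. The sketch below is a minimal pure-Python version that assumes each frame has already been superposed on the reference (the least-squares fitting step performed by MD analysis tools is omitted); the function names and toy coordinates are illustrative.

```python
import math


def rmsd(coords, reference):
    """RMSD between two equally sized lists of (x, y, z) coordinates.

    Assumes the frame has already been superposed on the reference.
    """
    if len(coords) != len(reference) or not coords:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
        for (x, y, z), (rx, ry, rz) in zip(coords, reference)
    )
    return math.sqrt(squared / len(coords))


def summarise(values):
    """Average, maximum and minimum of a per-frame series."""
    return {
        "average": sum(values) / len(values),
        "maximum": max(values),
        "minimum": min(values),
    }


# Toy two-atom trajectory; the first frame doubles as the reference.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
trajectory = [reference, [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]]
series = [rmsd(frame, reference) for frame in trajectory]
stats = summarise(series)
```

When the first frame is used as the reference, its RMSD is exactly zero, which is consistent with a minimum of 0.00 in a summary of this kind.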

Fig. 5. Backbone RMSD bound with A1, A2, A3, A4, A5, A6, Co-crystal and DB05154.

Table 2. Statistical parameters obtained from MD simulation trajectories.

Parameter | Statistic | A1 | A2 | A3 | A4 | A5 | A6 | Co-crystal | DB05154
Backbone RMSD (Å) | Average | 1.70 | 2.29 | 1.96 | 1.50 | 1.70 | 2.03 | 2.04 | 1.80
Backbone RMSD (Å) | Maximum | 2.95 | 3.94 | 3.25 | 2.26 | 3.24 | 2.91 | 3.21 | 3.07
Backbone RMSD (Å) | Minimum | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
Ligand RMSD (Å) | Average | 1.21 | 1.32 | 0.92 | 1.48 | 0.88 | 0.98 | 1.51 | 1.23
Ligand RMSD (Å) | Maximum | 2.23 | 2.15 | 1.91 | 2.55 | 1.38 | 1.78 | 2.55 | 1.96
Ligand RMSD (Å) | Minimum | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
RMSF (Å) | Average | 1.21 | 1.58 | 1.37 | 1.24 | 1.42 | 1.35 | 1.32 | 1.46
RMSF (Å) | Maximum | 4.25 | 5.86 | 4.13 | 3.71 | 3.83 | 4.89 | 2.86 | 4.56
RMSF (Å) | Minimum | 0.53 | 0.74 | 0.67 | 0.51 | 0.54 | 0.55 | 0.65 | 0.54
RoG (Å) | Average | 19.21 | 19.43 | 19.03 | 19.34 | 19.33 | 19.28 | 19.07 | 19.24
RoG (Å) | Maximum | 19.71 | 20.04 | 19.63 | 19.78 | 19.87 | 19.99 | 19.58 | 19.77
RoG (Å) | Minimum | 18.84 | 18.96 | 18.61 | 18.92 | 18.96 | 18.77 | 18.69 | 18.80

It is essential to check how far the ligand deviates from its starting conformation during the MD simulation. The ligand RMSD was calculated and is shown in Fig. 6. The average, maximum, and minimum ligand RMSD values for each of the studied ligands are presented in Table 2. The average ligand RMSD was 1.21, 1.32, 0.92, 1.48, 0.88, 0.98, 1.51 and 1.23 Å for A1, A2, A3, A4, A5, A6, the co-crystal ligand and DB05154, respectively. No ligand deviated by more than 2.55 Å (about 0.26 nm). Apart from a few fluctuations, almost all molecules remained consistent towards the end of the simulation, indicating that the small molecules stayed inside the binding pocket.

Fig. 6. Ligand RMSD of A1, A2, A3, A4, A5, A6, Co-crystal and DB05154.

3.8. Root-mean-square fluctuation

The Root Mean Square Fluctuation (RMSF) parameter elucidates the contribution of individual amino acids to the stability of a protein-ligand complex. It characterizes the fluctuation of each amino acid backbone throughout the simulation relative to its initial orientation in the native state, and so offers insight into the dynamic behaviour of individual amino acids within the complex and their roles in maintaining structural stability. In any trajectory, a large RMSF value indicates a flexible, less stable residue, whereas a small value indicates that the residue remains stable. The RMSF of each amino acid residue of all eight complexes was calculated from the MD simulation trajectory and is shown in Fig. 7. The average, maximum, and minimum for all eight complexes are given in Table 2. The amino acid residues of the protein complexes bound with A1, A2, A3, A4, A5, A6, the co-crystal ligand, and DB05154 exhibited a nearly uniform fluctuation pattern throughout the simulation. The exception was a region towards the end, residues 175 to 200, where the fluctuation was greatest. Elsewhere, minimal fluctuations were observed, underscoring the stability of these residues in dynamic states. The larger fluctuations may be attributed to conformational adjustments of the amino acids to accommodate the ligands in the receptor cavity. The difference between the maximum and minimum RMSF values was 3.72, 5.12, 3.46, 3.17, 3.29, 4.34, 2.21, and 4.02 Å for the residues of complexes A1, A2, A3, A4, A5, A6, Co-crystal and DB05154, respectively. These generally low RMSF values suggest that each complex remained stable during the MD simulation.
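The per-residue RMSF can be sketched as follows. This minimal version measures the fluctuation of each atom about its time-averaged position, which is the convention in common MD analysis tools and is an assumption here; a fixed reference structure could be substituted for the mean. Frames are assumed to be superposed already, and the toy coordinates are illustrative.

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF about the time-averaged position.

    trajectory is a list of frames; each frame is a list of (x, y, z) tuples,
    one per atom (for example one C-alpha atom per residue).
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Mean position of atom i over all frames.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared displacement from that mean position.
        msf = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msf))
    return result


# Toy trajectory: atom 0 is rigid, atom 1 moves between x = 1 and x = 3.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
fluctuations = rmsf(toy_trajectory)
```

The rigid atom has an RMSF of zero, while the mobile atom fluctuates by one length unit about its mean position.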

Fig. 7. Root-mean-square fluctuation of A1, A2, A3, A4, A5, A6, Co-crystal and DB05154.

3.9. Radius of gyration

The Radius of Gyration (RoG), extracted from the Molecular Dynamics (MD) simulation trajectory, is an indicator of the structural integrity of the protein-ligand system. Minimal deviation and consistent variation in the RoG signify that the protein remains stably folded throughout the MD simulation. The RoG analysis thus provides insight into the compactness and structural dynamics of the protein-ligand complex under dynamic conditions. All the complexes showed low deviation (Fig. 8). Complex A1 showed the highest deviation of all the complexes, but it also converged by the end of the simulation. Complex A3 and the co-crystal complex showed very low deviation, and, most importantly, no unusual deviation was observed. The difference between the highest and lowest RoG is 0.87, 1.08, 1.02, 1.17, 0.91, 1.22, 0.89, and 0.97 Å for A1, A2, A3, A4, A5, A6, Co-crystal and DB05154, respectively. Hence, the narrow RoG range of each system indicates that the protein-ligand complexes remained compact.
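The radius of gyration of a single frame is the mass-weighted root-mean-square distance of the atoms from their centre of mass. A minimal sketch under that standard definition is given below; the function names and the two-atom example are illustrative, and the spread helper simply takes the difference between the highest and lowest value in a series.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of one frame of (x, y, z) coordinates."""
    total_mass = sum(masses)
    # Centre of mass of the frame.
    com = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    weighted = sum(
        m * sum((c[k] - com[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted / total_mass)


def spread(values):
    """Difference between the highest and lowest value of a series."""
    return max(values) - min(values)


# Two equal masses 2 units apart: each lies 1 unit from the centre of mass.
rog_example = radius_of_gyration([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.0, 1.0])

# Spread for A1 using the maximum and minimum RoG listed in Table 2 (in angstroms).
a1_spread = round(spread([19.71, 18.84]), 2)
```

Applied to the A1 extremes from Table 2, the spread is 0.87 Å.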

Fig. 8. The radius of gyration of A1, A2, A3, A4, A5, A6, Co-crystal and DB05154.

3.10. Intermolecular H-bond interactions

The hydrogen bonds formed between the protein and ligand during the Molecular Dynamics (MD) simulation play a pivotal role in anchoring the molecule within the active site cavity. The number of intermolecular hydrogen bonds in each simulation system is depicted in Fig. 9. At least one hydrogen bond is present in most frames of all eight simulated systems. In the few frames with no or fewer hydrogen bonds, non-hydrogen-bond interactions stabilise the ligands. This underscores the dynamic nature of the protein-ligand interactions, in which hydrogen bonds contribute predominantly to the stability of the complex in the active site. DB05154 shows the maximum number of hydrogen bonds across all the frames, and the average number of hydrogen bonds for each complex is 3. The co-crystal ligand also shows a higher number of hydrogen bonds.

Fig. 9. Number of hydrogen bonds formed between the protein and A1, A2, A3, A4, A5, A6, Co-crystal and DB05154.
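The per-frame hydrogen bond count discussed above can be sketched with a simple geometric criterion. The cutoffs used here (donor-acceptor distance of at most 0.35 nm and hydrogen-donor-acceptor angle of at most 30 degrees) are commonly used values and are an assumption; the criterion applied in this study is not stated in this section. The function names and coordinates are hypothetical.

```python
import math


def is_hbond(donor, hydrogen, acceptor, max_dist=0.35, max_angle_deg=30.0):
    """Geometric hydrogen-bond test with assumed cutoffs (nm, degrees)."""
    if math.dist(donor, acceptor) > max_dist:
        return False
    # Vectors from the donor to the hydrogen and to the acceptor.
    dh = [h - d for h, d in zip(hydrogen, donor)]
    da = [a - d for a, d in zip(acceptor, donor)]
    norm = math.hypot(*dh) * math.hypot(*da)
    if norm == 0.0:
        return False
    cos_angle = sum(x * y for x, y in zip(dh, da)) / norm
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle)) <= max_angle_deg


def hbonds_per_frame(trajectory):
    """Count hydrogen bonds in each frame.

    trajectory: list of frames; each frame is a list of
    (donor, hydrogen, acceptor) coordinate triplets.
    """
    return [sum(is_hbond(*triplet) for triplet in frame) for frame in trajectory]


# Hypothetical trajectory: frame 1 has one valid and one bent candidate,
# frame 2 has one candidate that is too far away.
toy_trajectory = [
    [
        ((0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.3, 0.0, 0.0)),
        ((0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.3, 0.0)),
    ],
    [
        ((0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.5, 0.0, 0.0)),
    ],
]
toy_counts = hbonds_per_frame(toy_trajectory)
# Fraction of frames that contain at least one hydrogen bond.
toy_occupancy = sum(1 for c in toy_counts if c > 0) / len(toy_counts)
```

The occupancy value is one way to express the statement that at least one hydrogen bond is present in most frames.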

3.11. MM-GBSA

The MM-GBSA approach, a well-established method for estimating binding affinities, was used to calculate the binding free energies of the proposed molecules, DB05154 and the co-crystal ligand using 2000 frames of the MD simulation trajectories. Results are summarised in Table 3; notably, A2 and A5 exhibited superior binding affinity compared to DB05154. Although A1 and A4 showed slightly lower affinity, this observation aligns with the protein backbone RMSD and RoG values when bound with A1 and A4. Analysis of the binding free energy revealed that no frames displayed a positive binding free energy, suggesting that all studied frames contributed to some extent to the binding affinity towards the target EthR protein. Consequently, it can be inferred that all proposed molecules have the potential to act as strong inhibitors/modulators of the target EthR protein.

Table 3. Results from the molecular docking studies and MM-GBSA calculations.

| Ligand | Dock score | MM-GBSA ΔG (kcal/mol) | Standard deviation |
|---|---|---|---|
| A1 | −7.90 | −51.58 | ±2.28 |
| A2 | −11.50 | −30.38 | ±3.55 |
| A3 | −7.50 | −34.96 | ±1.65 |
| A4 | −6.60 | −44.57 | ±7.41 |
| A5 | −10.30 | −30.54 | ±1.54 |
| A6 | −6.80 | −36.27 | ±1.78 |
| Co-crystal | −9.46 | −21.78 | ±5.65 |
| DB05154 | −9.31 | −28.34 | ±3.59 |
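The mean and standard deviation reported for the MM-GBSA energies can be illustrated with the following sketch. It assumes the usual single-trajectory formulation, in which the binding free energy of each frame is G(complex) minus G(receptor) minus G(ligand), and it uses the sample standard deviation; whether the study used the sample or population form is not stated. The per-frame energies are hypothetical and are not taken from Table 3.

```python
from statistics import mean, stdev


def binding_free_energy(g_complex, g_receptor, g_ligand):
    """Per-frame binding free energy with its mean and sample standard deviation.

    Each argument is a list of per-frame free energies (kcal/mol) for the
    complex, the receptor alone and the ligand alone, in the same frame order.
    """
    per_frame = [c - r - l for c, r, l in zip(g_complex, g_receptor, g_ligand)]
    return mean(per_frame), stdev(per_frame), per_frame


# Hypothetical energies for three frames (kcal/mol).
toy_complex = [-110.0, -112.0, -114.0]
toy_receptor = [-60.0, -60.0, -60.0]
toy_ligand = [-20.0, -20.0, -20.0]

toy_mean, toy_sd, toy_frames = binding_free_energy(
    toy_complex, toy_receptor, toy_ligand
)
# Check corresponding to the observation that no frame has a positive value.
toy_all_negative = all(dg < 0.0 for dg in toy_frames)
```

With 2000 frames the same function would return one averaged value and one standard deviation per ligand, which is the form of the last two columns of Table 3.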

3.12. Free energy landscape (FEL) analysis

Free Energy Landscape (FEL) analysis was conducted to evaluate the impact of small molecule binding on the conformational dynamics of the EthR protein. As depicted in Fig. 10, the FEL provides insights into the stability and conformational changes of EthR protein during the MD simulation. The presence of small molecules influences the free energy landscape, leading to the emergence of multiple minima. Conformational changes generate additional minima, potentially hindering the system from reaching its native state consistently. Conversely, flat minima indicate convergence and stability of the system. Notably, all six proposed molecules, along with DB05154 and co-crystal, demonstrated distinct flat minima, indicating the stability of the EthR protein during the simulation. The presence of thermally stable intermediates within the valley of the FEL suggests stable interactions between the protein and the identified hits.

Fig. 10. Free energy landscape for the protein-ligand interactions during the MD simulations with compounds A1, A2, A3, A4, A5, A6, Co-crystal and DB05154.
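A free energy landscape of this kind is commonly obtained by binning two collective variables and converting the populations to free energies with G = -kB * T * ln(P / Pmax), so that the most populated bin sits at zero. The sketch below illustrates this under stated assumptions: the choice of collective variables, the temperature of 300 K and the bin count are not given in this section and are hypothetical, as are the function names and toy data.

```python
import math
from collections import Counter

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def free_energy_landscape(cv1, cv2, bins=10, temperature=300.0):
    """Map each populated (i, j) bin to a free energy in kJ/mol.

    cv1, cv2: per-frame values of two collective variables (for example
    two principal components); both lists must have the same length.
    """
    lo1, hi1 = min(cv1), max(cv1)
    lo2, hi2 = min(cv2), max(cv2)

    def bin_index(value, lo, hi):
        if hi == lo:
            return 0
        # Clamp the upper edge into the last bin.
        return min(int((value - lo) / (hi - lo) * bins), bins - 1)

    counts = Counter(
        (bin_index(a, lo1, hi1), bin_index(b, lo2, hi2))
        for a, b in zip(cv1, cv2)
    )
    max_count = max(counts.values())
    kt = KB * temperature
    # The most populated bin gets G = 0; rarer bins get higher free energy.
    return {cell: -kt * math.log(n / max_count) for cell, n in counts.items()}


# Hypothetical data: three frames in one basin, one frame elsewhere.
toy_cv1 = [0.0, 0.0, 0.0, 1.0]
toy_cv2 = [0.0, 0.0, 0.0, 1.0]
toy_fel = free_energy_landscape(toy_cv1, toy_cv2, bins=2)
```

Bins that were never visited are simply absent from the returned dictionary, so a plotting step would need to treat them as undefined rather than as zero.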

4. Conclusion

In this study, we employed a comprehensive approach to design and evaluate novel inhibitors/modulators for the HTH-type transcriptional regulator EthR. Through Mol2Mol molecule optimization and compound library generation using the AI/ML tool REINVENT4, followed by curation and similarity search, we identified over 12,000 potential compounds for further investigation. ADMET analysis using pkCSM allowed us to filter and select 12 compounds with acceptable pharmacokinetic and toxicity profiles for subsequent analyses. Molecular docking using AutoDock Vina and PLANTS revealed the binding interactions of the selected compounds with EthR. Six compounds exhibited the most significant binding interactions relative to the standard inhibitor DB05154. The subsequent absolute binding free energy estimation using KDeep supported the superior inhibitory potential of the selected six inhibitors/modulators. MD simulations provided a dynamic perspective on the behaviour of EthR in complex with the selected compounds, offering insights into conformational changes and critical residues involved in ligand binding. MM-GBSA and FEL analyses were employed to estimate binding free energies and conformational stability, supporting the findings from molecular docking and emphasising the importance of accounting for dynamic behaviour in binding energy predictions. In summary, our six generated inhibitors/modulators demonstrated superior binding affinities and pharmacokinetic profiles compared to existing inhibitors. The integrative use of computational tools, ranging from Mol2Mol-based compound generation to dynamic simulations, allowed for a robust evaluation of the inhibitory potential of the identified compounds. These findings lay the groundwork for further experimental validation and the potential development of these compounds as effective EthR inhibitors in anti-tuberculosis drug discovery.
The identified compounds show promise for tackling multidrug-resistant (MDR) and extensively drug-resistant (XDR) strains of Mycobacterium tuberculosis by targeting EthR, the transcriptional repressor that limits the bioactivation of ethionamide, a drug whose activated form inhibits InhA, an enzyme crucial for mycolic acid synthesis. Their optimized physicochemical properties and strong binding affinities suggest potential broad-spectrum efficacy against diverse TB strains, making them valuable candidates for further development in combating drug-resistant tuberculosis.

4.1. Future prospects

The proposed de novo drug design method is highly scalable and can be applied to other drug-resistant bacterial infections beyond tuberculosis, such as methicillin-resistant Staphylococcus aureus (MRSA) [26], and even viral infections like HIV. This approach leverages computational tools to identify novel compounds, making it adaptable for various pathogens.

The following steps for advancing the identified compounds toward preclinical or clinical trials include:

  1. In Vitro Testing: Conducting laboratory experiments to evaluate the efficacy and safety of the compounds against Mycobacterium tuberculosis, mainly focusing on drug-resistant strains.

  2. Pharmacokinetic and Toxicity Studies: Performing detailed pharmacokinetic studies to assess the absorption, distribution, metabolism, and excretion (ADME) properties, along with toxicity evaluations to ensure safety for human use.

  3. Optimization of Formulations: Developing suitable drug formulations that enhance the bioavailability and stability of the compounds in biological systems.

Challenges in their development as anti-TB therapies may include:

  • Resistance Development: Mycobacterium tuberculosis has the potential to develop resistance to new compounds, necessitating ongoing monitoring and combination therapies.

  • Regulatory Hurdles: Navigating the complex regulatory landscape for drug approval can be time-consuming and resource-intensive.

  • Funding and Resources: Securing adequate funding and resources for extensive preclinical and clinical trials, especially given the high costs of developing new antibiotics.

CRediT authorship contribution statement

Rupesh V. Chikhale: Writing – review & editing, Writing – original draft, Validation, Supervision, Investigation, Conceptualization. Rinku Choudhary: Writing – original draft, Methodology, Investigation, Formal analysis. Gaber E. Eldesoky: Supervision, Software, Resources, Project administration. Mahima Sudhir Kolpe: Visualization, Validation, Project administration, Methodology. Omkar Shinde: Software, Investigation, Formal analysis. Dilnawaz Hossain: Software, Data curation.

Data availability

The data related to this work could be provided on request.

Declaration of competing interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Gaber E. Eldesoky reports administrative support and statistical analysis were provided by Chemistry Department, College of Science, King Saud University, Riyadh 11451, Saudi Arabia. Dr Rupesh Chikhale is an associate editor to the Pharmaceutical Sciences Section of the Heliyon Journal. If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The authors are grateful to the Researchers Supporting Project No. (RSP2025R161), King Saud University, Riyadh, Saudi Arabia.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.heliyon.2025.e42593.

Multimedia component 1: mmc1.docx (81.6KB, docx)

References

  • 1. Global tuberculosis report. 2024. https://www.who.int/
  • 2. 1. TB disease burden, (n.d.). https://www.who.int/teams/global-tuberculosis-programme/tb-reports/global-tuberculosis-report-2024/tb-disease-burden (accessed February 8, 2025).
  • 3. 1.3 Drug-resistant TB, (n.d.). https://www.who.int/teams/global-tuberculosis-programme/tb-reports/global-tuberculosis-report-2024/tb-disease-burden/1-3-drug-resistant-tb (accessed February 7, 2025).
  • 4. Global Tuberculosis Report 2023, (n.d.). https://www.who.int/teams/global-tuberculosis-programme/tb-reports/global-tuberculosis-report-2023 (accessed July 8, 2024).
  • 5. Lee J.Y. Diagnosis and treatment of Extrapulmonary tuberculosis. Tuberc. Respir. Dis. 2015;78:47–55. doi: 10.4046/TRD.2015.78.2.47.
  • 6. Dartois V.A., Rubin E.J. Anti-tuberculosis treatment strategies and drug development: challenges and priorities. Nat. Rev. Microbiol. 2022;20(11):685–701. doi: 10.1038/s41579-022-00731-y.
  • 7. Sharma S., Chikhale R., Shinde N., Khan A.M., Gupta V.K. Targeting dormant phenotype acquired mycobacteria using natural products by exploring its important targets: in vitro and in silico studies. Front. Cell. Infect. Microbiol. 2023;13. doi: 10.3389/FCIMB.2023.1111997.
  • 8. Chikhale R.V., Eldesoky G.E., Kolpe M.S., Suryawanshi V.S., Patil P.C., Bhowmick S. Identification of Mycobacterium tuberculosis transcriptional repressor EthR inhibitors: shape-based search and machine learning studies. Heliyon. 2024;10. doi: 10.1016/J.HELIYON.2024.E26802.
  • 9. Chikhale R.V., Abdelghani H.T.M., Deka H., Pawar A.D., Patil P.C., Bhowmick S. Machine learning assisted methods for the identification of low toxicity inhibitors of Enoyl-Acyl Carrier Protein Reductase (InhA). Comput. Biol. Chem. 2024;110. doi: 10.1016/J.COMPBIOLCHEM.2024.108034.
  • 10. Prasad M.S., Bhole R.P., Khedekar P.B., Chikhale R.V. Mycobacterium enoyl acyl carrier protein reductase (InhA): a key target for antitubercular drug discovery. Bioorg. Chem. 2021;115. doi: 10.1016/J.BIOORG.2021.105242.
  • 11. Villemagne B., Flipo M., Blondiaux N., Crauste C., Malaquin S., Leroux F., Piveteau C., Villeret V., Brodin P., Villoutreix B.O., Sperandio O., Soror S.H., Wohlkönig A., Wintjens R., Deprez B., Baulard A.R., Willand N. Ligand efficiency driven design of new inhibitors of mycobacterium tuberculosis transcriptional repressor EthR using fragment growing, merging, and linking approaches. J. Med. Chem. 2014;57:4876–4888. doi: 10.1021/JM500422B.
  • 12. Chikhale R.V., Choudhary R., Malhotra J., Eldesoky G.E., Mangal P., Patil P.C. Identification of novel hit molecules targeting M. tuberculosis polyketide synthase 13 by combining generative AI and physics-based methods. Comput. Biol. Med. 2024;176. doi: 10.1016/J.COMPBIOMED.2024.108573.
  • 13. Loeffler H.H., He J., Tibo A., Janet J.P., Voronov A., Mervin L.H., Engkvist O. Reinvent 4: modern AI–driven generative molecule design. J. Cheminf. 2024;16:1–16. doi: 10.1186/S13321-024-00812-5.
  • 14. Azarafza M., Azarafza M., Tanha J. COVID-19 infection forecasting based on deep learning in Iran. medRxiv. 2020. doi: 10.1101/2020.05.16.20104182.
  • 15. Azarafza M., Azarafza M., Akgün H. Clustering method for spread pattern analysis of corona-virus (COVID-19) infection in Iran. J. Appl. Sci. Eng. Technol. Educat. 2021;3:1–6. doi: 10.35877/454RI.ASCI31109.
  • 16. Nanehkaran Y.A., Licai Z., Azarafza M., Talaei S., Jinxia X., Chen J., Derakhshani R. The predictive model for COVID-19 pandemic plastic pollution by using deep learning method. Sci. Rep. 2023;13(1):1–14. doi: 10.1038/s41598-023-31416-y.
  • 17. Nanehkaran Y.A., Licai Z., Azarafza M., Talaei S., Jinxia X., Chen J., Derakhshani R. The predictive model for COVID-19 pandemic plastic pollution by using deep learning method. Sci. Rep. 2023;13(1):1–14. doi: 10.1038/s41598-023-31416-y.
  • 18. Gaulton A., Bellis L.J., Bento A.P., Chambers J., Davies M., Hersey A., Light Y., McGlinchey S., Michalovich D., Al-Lazikani B., Overington J.P. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012;40:D1100–D1107. doi: 10.1093/NAR/GKR777.
  • 19. Pires D.E.V., Blundell T.L., Ascher D.B. pkCSM: predicting small-molecule pharmacokinetic and toxicity properties using graph-based signatures. J. Med. Chem. 2015;58:4066–4072. doi: 10.1021/ACS.JMEDCHEM.5B00104.
  • 20. Morris G.M., Ruth H., Lindstrom W., Sanner M.F., Belew R.K., Goodsell D.S., Olson A.J. AutoDock4 and AutoDockTools4: automated docking with selective receptor flexibility. J. Comput. Chem. 2009;30:2785. doi: 10.1002/JCC.21256.
  • 21. Korb O., Stützle T., Exner T.E. PLANTS: application of ant colony optimization to structure-based drug design. 2006. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 4150 LNCS.
  • 22. Dorigo M., Gambardella L.M., Birattari M., Martinoli A., Poli R., Stützle T., editors. Ant Colony Optimization and Swarm Intelligence. 2006. p. 4150.
  • 23. Jiménez J., Škalič M., Martínez-Rosell G., De Fabritiis G. KDEEP: protein-ligand absolute binding affinity prediction via 3D-convolutional neural networks. J. Chem. Inf. Model. 2018;58:287–296. doi: 10.1021/ACS.JCIM.7B00650.
  • 24. Singh V., Bhoir S., Chikhale R.V., Hussain J., Dwyer D., Bryce R.A., Kirubakaran S., De Benedetti A. Generation of phenothiazine with potent anti-TLK1 activity for prostate cancer therapy. iScience. 2020;23. doi: 10.1016/j.isci.2020.101474.
  • 25. Deka H., Pawar A., Battula M., Ghfar A.A., Assal M.E., Chikhale R.V. Identification and design of novel potential antimicrobial peptides targeting mycobacterial protein kinase PknB. Protein J. 2024:1–11. doi: 10.1007/S10930-024-10218-9.
  • 26. Singh G., Soni H., Tandon S., Kumar V., Babu G., Gupta V., Chaudhuri P. (Chattopadhyay), Identification of natural DHFR inhibitors in MRSA strains: structure-based drug design study. Results Chem. 2022;4. doi: 10.1016/J.RECHEM.2022.100292.


Articles from Heliyon are provided here courtesy of Elsevier
