Abstract
Computational algorithms and tools have shortened the drug discovery and development timeline. The applicability of computational approaches has gained immense relevance owing to the dramatic surge in the structural information of biomacromolecules and their heteromolecular complexes. Computational methods are now extensively used for identifying new protein targets, druggability assessment, pharmacophore mapping, molecular docking, the virtual screening of lead molecules, bioactivity prediction, molecular dynamics of protein–ligand complexes, affinity prediction, and designing better ligands. Herein, we provide an overview of salient components of recently reported computational drug-discovery workflows, including algorithms, tools, and databases for protein target identification and optimized ligand selection.
Introduction
Proteins are essential cellular components that play critical role(s) in myriad physiological functions and signal transduction processes during an organism's life cycle.1,2 Most diseases have been found to be associated with abnormalities in signal transduction networks mediated by various proteins, which in turn become primary targets for any medication or therapy.3 Thus, a significant number of modern-day pharmaceutical drugs target specific proteins inside the human body or pathogens with the aim of modulating their activity.4 Current FDA-approved drugs target 754 distinct human proteins, most of which belong to four protein families: ion channels, enzymes, transporters, and receptors.5 Additionally, the DisGeNET database incorporates about 17 000 genes and 117 000 genomic variants associated with more than 24 000 diseases and traits involving several cancers and various neurological, systemic, cardiovascular, and pathogenic diseases.6
Thus, the assessment of sequence–structure correlations, biochemical properties, and associated signal transduction networks could lead to the identification of newer druggable protein targets. Traditionally, lead molecules were discovered using an extensive high-throughput screening (HTS) process, often known as forward pharmacology, that even today remains tedious, time-consuming, and expensive.7,8 The entire drug-discovery process, including all phases of clinical trials, might take approximately 10–15 years and cost more than USD 2.5 billion.9 Hence, it remains crucial to streamline the traditional drug-discovery process with the aid of faster, lower-cost computational drug design and screening methods. The past decade has seen a significant surge in these methodologies owing to enormous advancements in cryo-electron microscopy, X-ray crystallography, and NMR spectroscopy, which have led to the enhanced availability of high-resolution structures.10,11
This has boosted reverse pharmacology-based rational drug-design approaches where the structural information of protein targets is used for screening novel lead molecules. In this approach, the design of new lead molecules depends on the availability of the residue-level and structural details of druggable protein targets. Heteromolecular interactions between proteins and lead molecules mainly involve non-covalent interactions with the amino acid side chains present in the binding cavity. High-resolution residue maps help in modifying the chemical moieties of ligands to obtain an optimized fit, with hydrogen bonding and other non-covalent interactions suited to the binding cavity of the target protein. Nonetheless, the screening of customized lead libraries designed for specific target proteins still requires considerable time and cost. Thus, computational approaches for identifying drug-binding sites, the virtual screening of ligand libraries, and studying the heteromolecular interactions of screened leads with the target proteins have been developed for scaling down the drug-discovery timeline.12
In addition, the increased availability of various types of omics data, including gene and protein expression data, has also immensely aided the whole process. At present, several public databases, including STITCH and DrugBank, contain comprehensive information about drug targets.13,14 Moreover, the analysis of druggable protein targets and lead molecules using computational approaches can even assist in the late stages of lead optimization and the development of better molecules. Ultimately, computationally aided reverse pharmacology has resulted in the development of several FDA-approved drugs, including saquinavir, ritonavir (HIV/AIDS), zanamivir (influenza virus), boceprevir (hepatitis C), captopril (hypertension), and dorzolamide (glaucoma), in recent times.15,16
Currently, several efficient molecular docking algorithms exist that can help in determining the structural features of the target–ligand complexes.17 These algorithms have even been applied for studying allosteric and multiple ligand binding sites in protein structures. Further, these algorithms are also structured to predict the biological activity of lead molecules against specific target(s), while configurational sampling in docking algorithms can help predict the binding affinities of lead molecules with high accuracy. The analysis of ligand–protein complexes is further rationalized with the inclusion of physicochemical factors, such as the hydration of residues and the flexibility of ligand molecules.18 These and other quantum-mechanical parameters implemented by molecular dynamics (MD) simulations further enhance the precision of ligand–protein interaction analysis. At present, several libraries of small drug-like molecules, chemicals, and natural products are available for performing virtual screening (VS) against identified, druggable protein targets.19 In fact, the proven applicability of computational VS and analysis has resulted in a dramatic reduction in high-throughput drug-screening costs.
Herein, we provide a comprehensive overview of the computational screening of lead molecules against druggable protein targets. The review is organized into three major sections, wherein the first section discusses the major computational approaches for identifying druggable protein targets (Fig. 1). The second section is focused on the process of the VS of lead molecules against identified protein targets. It summarizes information on VS methodologies utilizing small-molecule/drug libraries, various docking algorithms, and the physicochemical features of the binding cavity. The last section emphasizes the analysis of ligand–protein complexes, assessment of the stability, and affinity predictions based on MD simulations. Overall, we provide an in-depth discussion of computational approaches that can aid in accelerating the drug-discovery process.
Fig. 1. Systematic computational workflow for predicting novel protein targets, the virtual screening of lead molecules, and the analysis of protein–ligand complexes for optimized lead discovery. Left panel shows two major methods for predicting novel target proteins, i.e., based on gene expression profiling and sequence–structure comparisons. Middle panel shows the second step after target identification and involves structure extraction or model building, binding cavity prediction, and docking-based virtual screening (DBVS) using virtual drug/small-molecule libraries. Right panel shows a schematic of a solvated cubic box depicting the periodic boundary conditions for the MD simulations of the protein–ligand complex.
Computational approaches for identifying druggable protein targets
Identifying druggable targets requires significant evidence of the therapeutic utility of certain proteins through their interactions, activity, and mechanistic role(s) in biological processes. The structures of previously identified target proteins can provide crucial information about the mechanism of drug binding and their mode of action. Currently, several databases and tools hold comprehensive details of known drug targets, including their pathological associations, structure, binding with small molecules/drugs, and other signaling network interactions (Table 1).
Table 1. List of the major computational tools and databases for drug-target identification and their interactions with known small molecules/drugs. In each case, the working principle and/or major components are mentioned along with the cited references.
Database/tool | Salient features | Reference |
---|---|---|
ChEMBL | • Target search options based on keyword, BLAST, and classification hierarchy • 2D-structure, bioactivity, and biochemical properties of drug-like small molecules | PMID: 27899562 |
DrugBank 3.0 | • Both target and drug/molecule search options • Information on nomenclature, ontology, structure, function, and pharmacological properties | PMID: 21059682 |
STITCH 5 | • Search and visualize both ‘experimentally validated’ and ‘bioinformatically predicted’ interactions of proteins and small molecules | PMID: 26590256 |
DGIdb 4.0 | • Drug–gene interaction • Potentially druggable category | PMID: 33237278; PMID: 24122041 |
Therapeutic Target Database (TTD) | • Database of known protein/nucleic acid targets, associated disease, pathway information, and known drugs against these targets | PMID: 31691823 |
BindingDB | • Database of measured binding affinities of known drug targets with small, drug-like molecules | PMID: 17145705 |
DINIES | • Predict interactions between drug molecules and target proteins, based on drug data and omics-scale protein data | PMID: 24838565 |
CSNAP | • Compound target identification based on network similarity graphs | PMID: 25826798 |
BioGRID 4.4 | • Database that archives and disseminates genetic and protein interaction data from model organisms and humans | PMID: 16381927 |
DTO | • Database of standardized classifications and annotations of druggable protein targets | PMID: 29122012 |
CellMiner™ | • Database and query tool for molecular and pharmacological data for the NCI-60 cancerous cell lines | PMID: 19549304 |
LINCS | • Database of cell-based perturbation-response signatures, along with novel data analytics tools | PMID: 29140462 |
ECOdrug | • Tool for extracting drugs and conservation of their targets across species | PMID: 27910877 |
Mantra 2.0 | • Tool for the analysis of the mode of action (MoA) of novel drugs and identification of known and approved candidates for ‘drug repositioning’ | PMID: 20679242; PMID: 24558125 |
PharmMapper | • Tool for identifying potential target candidates for the given probe small molecules (drugs, chemical molecules, and natural products) | PMID: 20430828; PMID: 28472422 |
SwissTargetPrediction | • Estimates most probable macromolecular targets of a small molecule | PMID: 24048355; PMID: 31106366 |
Connectivity Map (CMap) | • Genome-scale library of cellular signatures in multiple cell types listing transcriptional responses to chemical, genetic, and disease perturbation | PMID: 17008526; PMID: 29195078 |
TargetNet | • QSAR-based predicting tool for the binding of multiple targets for a queried molecule | PMID: 27167132 |
Open Targets Platform | • Database for visualization of potential drug targets associated with a disease | PMID: 33196847 |
Drug2Gene | • Knowledge base containing 4 372 290 unified compound/drug–gene/protein information from 19 publicly available databases | PMID: 24618344 |
DSigDB | • Database for >18k computational drug signatures and >1200 approved drugs with gene sets | PMID: 25990557 |
These databases can be utilized for studying both specific drug targets and their known or predicted binding interactions with drugs and other reported small molecules. Owing to the enormous increase in the time and cost of developing new drugs, several computational approaches have been reported for predicting and identifying novel target proteins.20 These approaches mainly involve analyzing gene expression profiles under certain conditions (disease or perturbing environmental factors), evidence-based extrapolation of molecular interactions, comparison with analogous protein family members and organisms, and bioinformatically assigned putative roles.21 Further, sequence–structure analysis of the identified target protein generates more information about probable druggable sites and can help in defining the pharmacophore.
Gene expression profile-based target identification
Current advancements in gene expression analysis have paved the way for the development of new methods for studying the mechanism of action of drugs and small-molecule inhibitors. Transcriptomics data obtained under control and drug-induced conditions can help examine the expression profile of a multitude of genes longitudinally and concurrently.22 Several computational algorithms have been developed in the past few years that could predict potential drug targets based on the gene expression data. The major transcriptome-based drug-target-prediction tools include LINCS, CellMiner™, and the Connectivity Map (CMap) (Table 1).23–25 These tools are utilized for analyzing the transcriptome signature of different cells treated with a drug to obtain a drug-specific set of genes that can further lead to the prediction of potential drug targets.
The Connectivity Map (CMap) database initially comprised the genome-wide expression profiles of 164 drugs evaluated across four standard cell lines. The 564 gene expression profiles from the entire data set were combined to identify connections between the mechanism of action (MOA) of drugs, different genes, and the disease state. In its later version, CMap was extended to 1309 approved small molecules evaluated on five human cell lines, generating more than 7000 gene expression profiles.
Recently, the database was scaled up 1000-fold using a high-throughput reduced representation expression profiling method termed L1000.26 CMap now includes more than a million L1000 profiles from 42 080 perturbagens (19 811 small-molecule compounds, 18 493 shRNAs, 3462 cDNAs, and 314 biologics) that can be utilized for discovering the MOA of small molecules, functionally annotating genetic variants of disease genes, and informing clinical trials. CMap has been widely used for studying gene signatures for perturbations induced by a disease, pathology, or drug/small molecule, which can be retrieved with statistical significance and linked to the corresponding phenotype. Using CMap, several therapeutic molecules have been identified against cancer, Alzheimer's disease, inflammatory bowel disease, Gaucher disease, pain management, and muscle atrophy through a significant number of studies.27–34 Another tool, Mantra, was developed based on drugs with similar signatures (drug–drug network; DDN) extracted from CMap (Table 1). The tool helps in analyzing drug-induced gene expression profiles based on connected DDNs that target the same proteins or pathways.35
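The signature-matching idea behind such queries can be illustrated with a minimal sketch. It scores how strongly a drug-induced expression profile opposes a disease signature using mean rank positions. This is a deliberately simplified stand-in, not the statistic used by CMap itself; the function name `reversal_score`, the gene labels, and the expression values are all hypothetical.

```python
def reversal_score(drug_profile, up_genes, down_genes):
    """Return a score in [-1, 1]; positive means the drug opposes the disease signature.

    drug_profile: dict mapping gene -> differential expression after treatment.
    up_genes / down_genes: genes up- or down-regulated in the disease state.
    """
    if len(drug_profile) < 2:
        raise ValueError("need at least two genes to rank")
    # Rank genes from most down-regulated (first) to most up-regulated (last).
    ranked = sorted(drug_profile, key=drug_profile.get)
    n = len(ranked)
    # Scale each rank to a position between -1 (bottom) and +1 (top).
    position = {gene: 2 * i / (n - 1) - 1 for i, gene in enumerate(ranked)}

    def mean_position(genes):
        hits = [position[g] for g in genes if g in position]
        return sum(hits) / len(hits) if hits else 0.0

    # Reversal: disease-up genes sit low and disease-down genes sit high.
    return (mean_position(down_genes) - mean_position(up_genes)) / 2


# Hypothetical five-gene profile measured after drug treatment.
profile = {"A": -2.0, "B": -1.0, "C": 0.0, "D": 1.0, "E": 2.0}
score = reversal_score(profile, up_genes=["A", "B"], down_genes=["D", "E"])
```

A profile that pushes disease-up genes down and disease-down genes up scores close to +1, whereas a profile that mimics the disease scores close to -1.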
In addition, the NCI-60 cell line dataset has been a crucial data source for drug discovery with expression profiles of over 400 000 compounds across 60 cell lines.36 Recently, an improved CellMiner tool was developed that allows the data mining of transcripts for 22 379 genes and 360 microRNAs for more than 20 000 small molecules, including 102 FDA-approved drugs (Table 1). CellMiner extends the applications of the NCI-60 cell line dataset manifold by providing a retrieval platform for complex analyses, including multidrug resistance analysis, the identification of colon-specific genes, microRNAs, and microRNAs associated with a specific cluster, and drug identification patterns associated with specific therapeutics.24 Similarly, the library of integrated network-based cellular signatures (LINCS) data portal (LDP) comprehensively lists perturbation-response signatures for gene expression across diverse model systems and assays.23 The portal incorporates a signature search functionality for finding conditions that mimic or reverse gene sets based on queried perturbations (chemicals/drugs), model systems, and gene expression signatures. In most cases, multiple compounds are identified that bind to several protein targets, and so an additional enrichment analysis is carried out using a background dataset (>1000 genes). Gene ontology molecular function (GO MF) annotation is utilized to screen redundant protein–compound interactions. Currently, more comprehensive knowledge bases that integrate compound/drug–gene/protein information from multiple databases, such as Drug2Gene and the Drug Signatures Database (DSigDB), are also available.137,138
Drug2Gene incorporates large-scale information on 4 372 290 unified relations between compounds and their targets with reported bioactivity data. It provides a web interface that integrates 19 publicly available source databases and hosts over 28 million compounds, 11 million genes, and more than 4 million compound–target interactions.137 Similarly, DSigDB utilizes gene set enrichment analysis (GSEA) for analyzing drug/compound–target gene interactions of 17 389 unique compounds covering 19 531 genes.138
Overall, the availability of an enormous amount of gene expression data has opened up newer avenues for predicting drug targets, drugs, and other therapeutics as well as the repurposing of both approved and rejected drugs. Additionally, the gene expression profile-based identification of drug targets also provides crucial insights into the MOA of drugs and other therapeutic molecules.
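The enrichment analysis against a background gene set mentioned above for signature-based queries can be sketched with a one-sided hypergeometric test. The specific statistical test used by the LINCS portal is not stated here, so the choice of test is an assumption; the function name and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(background_size, annotated_in_background,
                       hits_size, annotated_in_hits):
    """One-sided hypergeometric p-value.

    Probability of seeing at least `annotated_in_hits` annotated genes when
    `hits_size` genes are drawn without replacement from a background of
    `background_size` genes, of which `annotated_in_background` are annotated.
    """
    total = comb(background_size, hits_size)
    upper = min(annotated_in_background, hits_size)
    tail = sum(
        comb(annotated_in_background, k)
        * comb(background_size - annotated_in_background, hits_size - k)
        for k in range(annotated_in_hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 1200 background genes, 40 carry a given annotation,
# and 8 of the 50 genes in the hit list carry it too.
p_value = enrichment_p_value(1200, 40, 50, 8)
```

A small p-value suggests that the annotation is over-represented in the hit list relative to the background; in practice, the p-values from many annotations would also need a multiple-testing correction.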
Sequence–structure correlation-based target identification
The identification of new druggable protein targets can be achieved by understanding clinical phenotypes as well as the pathological mechanisms of diseases. Indeed, exploring the mechanism and functional roles could yield information about specific sets of proteins involved in the pathogenesis that can function as potential drug targets.37,38 In fact, identifying the associations of potential protein targets with disease pathways can often discern their natural ligands, which can then be utilized for designing analogous therapeutic molecules.
A useful approach to extract functional connections among an identified set of proteins is to obtain information about their direct and indirect biological interactions.39 In this context, tools like ingenuity pathway analysis (IPA) can help in the rapid visualization of complex omics data and disease pathways, and can predict downstream effects that may lead to new target or biomarker discovery.40 Further analysis of the entire set of potential drug targets can be done by examining the functional connections between diseases, identified genes, and drugs using CMap (Table 1). Additionally, the STRING database can be utilized for integrating all known and predicted interactions between proteins based on data mining, annotated pathways, experimental evidence, and co-expression and neighboring gene conservation information.41,42
Several reports exist on identifying potential drug targets based on disease mechanism-based approaches followed by sequence–structure analysis. Most of these reports involve a rational approach where the reported experimental evidence was amalgamated with data from sequence conservation and the structural analysis of identified proteins for screening potential drugs/small molecules. Two simultaneous reports on natural coumarinolignoids isolated from the seeds of Cleome viscosa that demonstrated anti-inflammatory and immune-modulatory effects were previously published based on the above approach.43,44
The authors utilized quantitative structure–activity relationship (QSAR) modeling for discerning the correlations between the structural properties of the molecules and their experimentally validated in vivo activity in Swiss albino mice. Another report described drug screening against the L-asparaginase enzyme of Leishmania donovani (LdAI), chosen for its critical role in pathogenicity and drug resistance.45 The authors analyzed LdAI conservation across other pathogenic organisms, followed by probing its catalytic site in a homology-based structural model. Using this integrated computational approach involving molecular modeling, docking, and molecular dynamics simulations, the authors reported five potential lead molecules. In their follow-up study, the authors showed that in the presence of two of the inhibitors, L1 and L2, the survival of L. donovani was compromised, whereas the overexpression of LdAI in these cells restored viability.46
Another rigorous differential screening protocol based on sequence–structure analysis was also utilized for screening active-site-based inhibitors against the L-asparaginase enzyme (MtA) of Mycobacterium tuberculosis (Mtb).47 This enzyme plays a crucial role in the survival of Mtb within the acidic microenvironment of human alveolar macrophages. Several other reports have utilized similar approaches for identifying lead molecules against protein misfolding diseases, such as Alzheimer's disease (AD), Parkinson's disease (PD), and transmissible spongiform encephalopathies (TSEs) or prion diseases.48–52 Similarly, in a report on gelsolin amyloidosis, the authors screened small-molecule inhibitors against the critical amyloid-forming stretches (residues 182–192) of the gelsolin protein.53 Two of the identified molecules showed strong heteromolecular associations and a potential abrogation of amyloid-induced toxicity in neuronal cell models. The authors later reported an improved efficacy of the identified molecules based on a nanoformulation.54
Recently, a few reports have also demonstrated the utility of a similar approach in screening small-molecule, peptide, and peptidomimetic inhibitors against the human prion protein and Aβ-42 polypeptide associated with TSE and AD pathologies, respectively.52,55,56 Additionally, a comprehensive study exploring the inhibitory effects of a few DNA intercalators on various amyloid aggregates based on a sequence–structure approach was also reported.57 Most of these reports included a sequence–structure pre-analysis step before proceeding with the screening of potential drug-like molecules.
Molecular modeling and druggability assessment
After the identification of the target protein, the foremost step is to analyze its structure and substantiate the presence of a druggable pharmacophore. The 3D structures of proteins and other biomacromolecules are typically determined using high-resolution structural techniques, such as X-ray crystallography, NMR, or cryo-electron microscopy. When the structure has not been experimentally determined, structure-prediction algorithms are utilized, which can be categorized as homology modeling, threading, and ab initio modeling.58,59
Homology modeling is the preferred method when a homolog with an experimentally determined structure and sufficient sequence similarity and identity (>30% match) is available. If the sequence similarity is below 30%, then the threading or ab initio modeling approaches are utilized for obtaining structural models of the target proteins. The threading approach compares a protein sequence against a library of structural templates to identify folds with the best scores for adapting these into a predicted model. In contrast, ab initio modeling methods utilize information on the folding energetics to predict the 3D model with the lowest free energy. At present, several standalone software packages, including MOE (Molecular Operating Environment, CCG, Canada), Prime (Schrödinger, USA), and DSModeler (Accelrys Inc. USA), and automated homology modeling servers, such as SWISS-MODEL, PHYRE2, ROBETTA, and I-TASSER (Iterative Threading ASSEmbly Refinement), are available for obtaining structural models.60–63
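The 30% rule of thumb described above can be expressed as a small sketch. It computes percent identity from a pre-aligned target/template pair and suggests a modeling route. The function names are hypothetical, the sequences are made up, and identity is computed over gap-free aligned positions, which is only one of several conventions in use.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_target, aligned_template)
        if a != "-" and b != "-"
    ]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def suggest_modeling_route(identity, threshold=30.0):
    """Apply the >30% identity rule of thumb to choose a modeling approach."""
    if identity > threshold:
        return "homology modeling"
    return "threading or ab initio modeling"


# Hypothetical pre-aligned sequences ('-' marks a gap).
identity = percent_identity("ACDE-G", "ACDFHG")
route = suggest_modeling_route(identity)
```

In practice, the alignment would come from a dedicated alignment program, and alignment coverage and template quality would be weighed alongside the identity value.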
The advent of AI-driven computational methods has drastically altered the landscape of protein structure prediction. By drawing on the power of neural networks trained on expansive sequence and structure datasets, protein conformations can now be predicted with remarkable fidelity, expediting the identification of potential drug targets. AlphaFold is a powerful tool for predicting protein structures.136 It uses databases like UniRef90 and PDB to gather information about amino acid sequences and known protein structures. This data helps AlphaFold build a detailed picture of how a protein might fold, resulting in highly accurate 3D models. Besides, RoseTTAFold is another cutting-edge protein modeling tool that leverages deep learning and a unique three-pronged analysis of protein data to achieve superior results.139 Similarly, ESMFold predicts protein structures using a language model-inspired design. It learns by analyzing protein data with hidden amino acids, forcing it to guess missing pieces and understand the structure.140
The validation of 3D models is widely done using Ramachandran plots, which show energetically allowed and disallowed regions by plotting the backbone dihedral angle ψ against φ for the amino acid residues present in a protein structure.64 Other quality assessment methods are also available that utilize root-mean-squared deviation (RMSD) and other criteria based on energetics, bonding, and local scoring methods. A few of the widely used quality assessment tools include Verify3D, ProsaII, Errat, and Procheck.65–68
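The dihedral angles plotted in a Ramachandran plot are computed from four consecutive backbone atoms: φ from C(i−1), N, Cα, C and ψ from N, Cα, C, N(i+1). A minimal sketch of that geometric calculation follows; the helper names and the example coordinates are hypothetical, and real coordinates would be read from a structure file.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates with the central bond along the z axis.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Applying this function to every residue of a model yields the (φ, ψ) pairs that are then compared with the allowed regions of the plot.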
Next, the predicted structural models are analyzed for the presence of sites involved in intermolecular interactions based on their known or unknown functions. Sequence motif databases, like PROSITE, have been widely utilized for identifying residues involved in functional or interaction sites in protein structures or models.69 Since most functional regions result from mutation and selection in a protein sequence, methods like ‘evolutionary trace’ depicting highly conserved spots on the structural model remain equally useful.70 Since mutations or insertions play a critical role in determining pathological transformations (loss or gain of function) in proteins, several algorithms have been developed that can provide information on mutational or single nucleotide polymorphism (SNP) variants.71 These include the SIFT (Sorting Intolerant from Tolerant) algorithm, which predicts the effect of SNPs on protein function, PROVEAN (Protein Variation Effect Analyzer), which predicts the effect of mutations/insertions on biological functions, and MetaMapR, which provides information on the alterations in metabolic networks associated with proteins.72–74 Further structural insights on the 3D models or available structures can be obtained using visualization software tools, such as PyMOL (Schrödinger Inc. USA), VMD, or other similar software.75
Druggability assessment of the identified target protein and its 3D model is performed to predict whether its activity could be modulated by a drug or drug-like molecule. In most cases, the identified target proteins have binding sites or catalytic sites that confer specific functional properties on them. These targets must have binding sites with typical structural and physicochemical properties that favor binding interactions with high affinity and specificity. Several computational algorithms exist that can help in evaluating druggability based on the sequence or structure-based properties of the target protein.76 However, the precision of these methods depends on the identification of possible binding sites on the target protein.
Virtual screening of ligands against druggable protein targets
The next step after obtaining a druggable protein target is to identify bioactive molecules that could bind to it and modulate its functions, such as catalytic activity, interaction(s), or signaling network(s). Molecular modeling methods help in predicting cavities in the target protein that could function as binding sites for lead screening. These binding cavities are utilized for generating initial pharmacophores that could be utilized for selecting appropriate ligand libraries, followed by docking-based VS (DBVS) using a docking algorithm (Fig. 1). The efficacy of docking algorithms has improved enormously in the past few years, and several studies have shown the utility of consensus docking approaches for improving the scoring and pose prediction.77,78 This has further substantiated VS-based hit identification, which has consequently been adopted by both industry and academia.
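One simple way to combine results from several docking programs is to average the rank that each program assigns to a ligand. The sketch below illustrates this rank-averaging scheme under stated assumptions: it is only one of several possible consensus strategies, the program labels, ligand names, and scores are hypothetical, and lower scores are assumed to indicate better predicted binding.

```python
def consensus_ranking(score_tables):
    """Rank ligands by their mean rank across docking programs.

    score_tables: dict mapping program name -> dict of ligand -> score,
    where a lower score is assumed to be better.
    Returns a list of (ligand, mean_rank) tuples, best first.
    """
    if not score_tables:
        raise ValueError("at least one score table is required")
    # Only ligands scored by every program can be compared fairly.
    ligands = set.intersection(*(set(t) for t in score_tables.values()))
    mean_rank = {ligand: 0.0 for ligand in ligands}
    for scores in score_tables.values():
        ordered = sorted(ligands, key=lambda lig: (scores[lig], lig))
        for rank, ligand in enumerate(ordered, start=1):
            mean_rank[ligand] += rank / len(score_tables)
    # Ties in mean rank are broken alphabetically for reproducibility.
    return sorted(mean_rank.items(), key=lambda item: (item[1], item[0]))


# Hypothetical scores from two docking programs.
tables = {
    "program_a": {"lig1": -9.0, "lig2": -7.5, "lig3": -6.0},
    "program_b": {"lig1": -8.0, "lig2": -8.5, "lig3": -5.0},
}
ranking = consensus_ranking(tables)
```

Working with ranks rather than raw scores avoids having to put scoring functions with different units and ranges on a common scale.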
Binding pocket identification and druggability
The identification of binding sites remains the most crucial step for determining the druggability of target proteins. The availability of crystallographic or NMR structures of proteins, or of protein–ligand or protein–small-molecule complexes, can help in corroborating a druggable pharmacophore. However, if the structural information is not yet available, computational methods can be utilized to predict structural as well as binding pocket information. Several computational algorithms and tools are available for binding pocket identification based on the different properties of amino acid residues in proteins. Binding sites are generally understood as cavities that can accommodate drug or drug-like molecules, which in turn modulate the protein's functional properties.
Most available tools utilize geometry-based, template-based, or energy-based approaches to identify cavities in target proteins that could be potential binding sites.79 Table 2 lists the major binding site tools/algorithms broadly based on these three approaches. Geometry-based methods utilize the probe-based detection of cavities using certain physicochemical parameters, such as the solvent accessible surface area of residues and clusters of residues. These approaches provide faster predictions and are not greatly affected by missing sequence or structural information. Recent improvements in the geometry-based prediction methods have been reported based on the simulation of different geometric measures from the available sequences and structures. The SURFNET algorithm remains the pioneer for geometry-based predictions and utilizes a spherical probe moving at a tangent to the surface residues.80
Table 2. List of the major computational tools for identifying binding pockets in target proteins based on different algorithm types. In each case, general information on the prediction approach along with the cited references is provided.
Tool | Algorithm type | Approach | Reference |
---|---|---|---|
CurPocket | Geometry-based | Curvature-based cavity detection | PMID: 31263275 |
Patch-Surfer 2.0 | Geometry-based | Physicochemical properties of the local regions | PMID: 25359888 |
MSPocket | Geometry-based | Detecting pockets on the solvent-excluded surface of proteins | PMID: 21134896 |
Fpocket | Geometry-based | Voronoi tessellation and alpha spheres | PMID: 19486540 |
CASTp | Geometry-based | Solvent accessible surface concavities | PMID: 29860391 |
SURFNET | Geometry-based | Molecular cavities and indentations displayed as surfaces | PMID: 8603061 |
LIGSITE | Geometry-based | Cubic grid-based detection of pockets on protein surface | PMID: 9704298 |
ConSurf | Template-based | Evolutionary conservation of amino acids or nucleic acids | PMID: 27166375 |
FINDSITE | Template-based | Threading-based binding site similarity in template structures | PMID: 19324930 |
3DLigandSite | Template-based | Residue conservation and binding site homologous structures | PMID: 19626715 |
S-SITE | Template-based | Sequence profile alignment | PMID: 23975762 |
TM-SITE | Template-based | Binding-specific substructure comparison | PMID: 23975762 |
SiteHound | Energy-based | Regions having favorable interactions with a probe molecule | PMID: 19398430 |
QSiteFinder | Energy-based | Clustering of energetically favorable probe sites | PMID: 15701681 |
FTSite | Energy-based | Probe-based mapping of energetically favorable regions | PMID: 22113084 |
SFCscore | Machine learning-based | Regression analysis of structure-derived descriptors | PMID: 18442132 |
MetaPocket | Machine learning-based | Consensus prediction from LIGSITE, PASS, QSiteFinder, and SURFNET | PMID: 19645590 |
COACH | Machine learning-based | Consensus prediction from multiple tools, including TM-SITE, and S-SITE | PMID: 23975762 |
LigandDSES | Machine learning-based | Random forest classifier-based similarity in target and training dataset | PMID: 26661785 |
Taba | Machine learning-based | Mass-spring system and supervised machine learning | PMID: 31410856 |
DeepDTA | Deep learning-based | Convolutional neural networks (CNNs) and sequence information | PMID: 30423097 |
DEEPSite | Deep learning-based | CNN-based evaluation with distance and volumetric overlap approach | PMID: 28575181 |
DeepDrug3D | Deep learning-based | CNN modeling on voxels assigned interaction energy-based attributes | PMID: 30716081 |
The size of the sphere volume is varied based on higher (decreased) or lower (increased) residue atom clashes with the moving probe. Similarly, the LIGSITE tool involves enveloping the target protein with a 3D mesh that scans in different directions, and scoring is based on binding grid.81 The recently developed CurPocket algorithm generates a set of points representing the solvent accessible surface to determine a curvature factor for each point and identify possible binding sites.82 Similarly, Patch-Surfer 2.0 utilizes 3D-Zernike descriptors and approximate patch positions to identify different patches corresponding to a possible binding pocket.83
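The grid-scanning idea behind geometry-based tools such as LIGSITE can be illustrated with a minimal sketch. It is a simplified illustration and not the published implementation: it scans only along the three grid axes, treats every atom as a sphere of one assumed radius, and the function name `buried_grid_points` and all parameter values are hypothetical.

```python
from itertools import product

def buried_grid_points(atoms, spacing=1.0, atom_radius=1.8, min_events=2):
    """Return solvent grid points enclosed by protein along >= min_events axes.

    atoms: list of (x, y, z) coordinates in angstroms.
    spacing, atom_radius, min_events: illustrative values, not tool defaults.
    """
    # Bounding box of the atoms, padded by one grid cell on each side.
    lo = [min(a[i] for a in atoms) - spacing for i in range(3)]
    hi = [max(a[i] for a in atoms) + spacing for i in range(3)]
    n = [int(round((hi[i] - lo[i]) / spacing)) + 1 for i in range(3)]

    def is_protein(idx):
        # A grid point counts as protein if it lies inside any atom sphere.
        p = [lo[i] + idx[i] * spacing for i in range(3)]
        return any(
            sum((p[i] - a[i]) ** 2 for i in range(3)) <= atom_radius ** 2
            for a in atoms
        )

    all_points = list(product(*(range(k) for k in n)))
    occupied = {idx for idx in all_points if is_protein(idx)}

    def hits_protein(idx, axis, step):
        # Walk from idx along one axis direction until protein or the box edge.
        cur = list(idx)
        cur[axis] += step
        while 0 <= cur[axis] < n[axis]:
            if tuple(cur) in occupied:
                return True
            cur[axis] += step
        return False

    pockets = []
    for idx in all_points:
        if idx in occupied:
            continue
        # Count axes along which the point has protein on both sides.
        events = sum(
            1 for ax in range(3)
            if hits_protein(idx, ax, +1) and hits_protein(idx, ax, -1)
        )
        if events >= min_events:
            pockets.append(tuple(lo[i] + idx[i] * spacing for i in range(3)))
    return pockets
```

In practice, the retained points would then be clustered and ranked to propose candidate pockets; that step is omitted here.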
Template-based prediction methods utilize the available cavity or binding pocket information from similar or homologous proteins. This method is based on cataloging the spatial and geometrical features of available protein–ligand structures or templates present in the PDB. Additionally, template-based methods can be either structure-based or sequence-based. Overall, this method compares similar homologs with available sequence or structural information about the binding site and applies this to the target protein. Nevertheless, the reliability of this approach depends hugely on the structural alignment program and degree of structural similarity. One of the earliest template-based algorithms was ConSurf, which estimates the evolutionary conservation of amino acids/nucleic acids in protein/DNA/RNA molecules based on phylogenetic associations between homologous sequences.84 Another program, FINDSITE, utilizes a threading algorithm for identifying and overlaying template structures with ligands onto the target protein using TM-align.85 Similarly, the 3DLigandSite algorithm scores the similarity between the target protein and its template structures for identifying ligand binding sites.86
Most recently, the S-SITE and TM-SITE algorithms were reported that utilize the Needleman–Wunsch algorithm and binding-specific substructure comparison to predict the binding cavity in the target protein using consensus voting.87 Besides, energy-based methods are available that rely on estimating the interaction energy between the residue atoms of the target protein and the probe molecule. Thus, the accuracy depends on the preciseness of the sequence and the structure of the target protein. The most widely used algorithm is FTSite, which considers a grid-based scanning of the target protein using 16 different small-molecule probes that are later clustered and ranked based on free energy functions for predicting the binding site.88 SITEHOUND is another multiple probe-based algorithm that utilizes a carbon probe for identifying binding sites for drug-like molecules and a phosphate probe for discerning phosphorylated ligands, such as ATP or phosphopeptides.89
In the past few years, machine learning (ML) methods have been widely implemented in the biological and chemical sciences. ML techniques utilize models, such as a neural network (NN) or support vector machine (SVM), that often yield more accurate predictions than conventional models. COACH is a popular SVM-based binding site prediction algorithm that combines template-based and sequence information from S-SITE and TM-SITE and compares them to three other algorithms, namely COFACTOR, FINDSITE, and ConCavity (Table 2).87 More recently, more complex ML techniques, like deep learning, have been developed that simulate human brain learning for building NNs for interpreting and predicting data. Deep-learning methods involve convolutional neural networks (CNNs), deep belief networks (DBNs), and self-encoding neural networks. In this context, the DEEPSite algorithm considers protein structures as 3D images and divides them into fixed-size voxels for sampling certain atomic attributes, such as hydrophobicity and H-bond donor/acceptor information, through a CNN-based model.90 Similarly, the DeepDrug3D algorithm identifies binding pockets based on molecular interaction patterns and the physicochemical properties of ligands and the target protein using CNNs.91
Finally, the physicochemical and geometric attributes of the identified binding pocket are compared with the properties of drug-like molecules for ascertaining druggability. Also, the properties of the binding cavity should match those of the drug molecules for substantiating their ‘drug-likeness’. The drug-likeness of small molecules or leads has been mostly discerned to date using Lipinski's rule of five (H-bond donors ≤ 5, H-bond acceptors ≤ 10, molecular weight < 500 Da, and partition coefficient log P < 5). Often, a large set of physicochemical descriptors, such as volume, size, H-bonding, hydrophobicity, and polarity, and geometrical descriptors, including the shape and size of the binding cavity, are utilized to comprehensively establish druggability. Besides, the prediction of biological activity, including the absorption, distribution, metabolism, excretion, and toxicity (ADMET) profile, further filters potential drug-like molecules. Early screening of the ligand library for ADMET properties removes molecules with poor bioactivity from the screening protocol, which further reduces the cost of the drug-development process. Several tools and databases, including DSSTox, TSAR (Accelrys Inc., USA), ADMET-score, and ADMETlab 2.0, are available for discerning the ADMET and associated biological activities of compounds.92–94
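The rule-of-five thresholds quoted above translate directly into a simple filter. The following is a minimal sketch that assumes the four descriptors have already been computed by some cheminformatics tool; the `Descriptors` class, the function names, and the example values are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    """Precomputed molecular descriptors (illustrative container)."""
    name: str
    h_donors: int
    h_acceptors: int
    mol_weight: float  # in Da
    log_p: float       # octanol-water partition coefficient

def lipinski_violations(d: Descriptors) -> int:
    """Count how many rule-of-five criteria a molecule violates."""
    checks = [
        d.h_donors > 5,         # H-bond donors must be <= 5
        d.h_acceptors > 10,     # H-bond acceptors must be <= 10
        d.mol_weight >= 500.0,  # molecular weight must be < 500 Da
        d.log_p >= 5.0,         # log P must be < 5
    ]
    return sum(checks)

def passes_rule_of_five(d: Descriptors, max_violations: int = 0) -> bool:
    """True if the molecule has at most max_violations violations."""
    return lipinski_violations(d) <= max_violations

# Hypothetical library entries with made-up descriptor values.
library = [
    Descriptors("ligand_A", h_donors=2, h_acceptors=5, mol_weight=342.4, log_p=2.1),
    Descriptors("ligand_B", h_donors=7, h_acceptors=12, mol_weight=812.9, log_p=6.3),
]
drug_like = [d.name for d in library if passes_rule_of_five(d)]
```

The `max_violations` argument is a design choice of this sketch; some screening protocols tolerate a single violation, which would correspond to `max_violations=1`.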
Docking-based virtual screening
At present, VS remains a fairly accurate method for identifying potential lead molecules against target proteins. VS involves the computational screening of many small molecules or drug-like compounds against the binding site in the target protein structure (Fig. 1). Several small molecules or drug-like molecules databases are available now, which has significantly reduced the timeline of lead discovery. These include large databases, such as ZINC (35 million compounds), ChemSpider (65 million compounds), and GDB17 (∼166 billion molecules), and a few other natural product libraries, such as the Traditional Chinese Medicine Database (TCM; 20 000 compounds).95–98
Besides, VS is majorly classified as either a ligand-based (LBVS) or structure-based (SBVS) virtual screening methodology.99 While LBVS screens ligands based on a consensus pharmacophore based on several descriptors, the SBVS approach involves docking a large library of molecules in the identified binding cavity followed by evaluation, scoring, and ranking. Owing to the availability of an enormous number of protein and other biomacromolecule structures, the SBVS approach remains the most preferred method for virtual screening.
A battery of different filters is also included in the SBVS workflow, where preliminary screening is done based on chemical similarity, Lipinski's rule of five, and Veber's rule along with other screening measures, including pharmacophore modeling and quantitative structure–activity relationship (QSAR) models.100 This combination of filters and screening measures along with the docking algorithm is comprehensively termed docking-based virtual screening (DBVS).101 A significant example of the DBVS approach is the identification of a small-molecule (HTS-466284) inhibitor of type I TGF-beta receptor kinase, which was also validated by experimental and cell-based screening.102,103 In recent years, several DBVS-based studies have reported promising lead molecules against protein targets linked to a variety of pathological conditions, including leishmaniasis, TB, prion disease, Alzheimer's disease, gelsolin amyloidosis, and diabetes.45,47,53,55,56,104,105 Molecular docking is employed to predict the binding poses of one or many lead molecules against a binding cavity present in a target protein. The high-ranking binding poses are evaluated based on their comparative stability, and are often further evaluated by the presence of non-covalent interactions and predicted binding energies.
Prediction of the binding energy is done either by employing force fields or by empirical and knowledge-based scoring functions.106 Consensus scoring involving two or more scoring functions is considered more reliable, alongside the newly reported ML scoring functions that perform comparative scoring based on protein–ligand templates available in structural and chemical databases.107 Docking may involve fast approaches that treat only the ligand or only the protein as flexible, or a more precise but slower induced-fit methodology that considers the flexibility of both the protein and ligand. Subsequently, several docking algorithms based on distance geometry, evolutionary programming (genetic algorithms), Tabu searching, simulated annealing, fast shape matching, incremental construction, and Monte Carlo (MC) simulations have been reported.108
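One common way to combine scoring functions is to average the rank each function assigns to a ligand. The sketch below shows this rank-averaging scheme as one possible form of consensus scoring; it is not the method of any specific docking package, and the function name, scoring-function labels, and scores are hypothetical. Lower (more negative) scores are assumed to be better.

```python
def consensus_rank(score_tables):
    """Rank ligands by their mean rank across several scoring functions.

    score_tables: dict mapping a scoring-function label to a dict of
                  {ligand: score}, where a lower score means a better pose.
    Returns a list of (mean_rank, ligand) tuples, best first.
    """
    # Only ligands scored by every function can be compared.
    ligands = set.intersection(*(set(t) for t in score_tables.values()))
    rank_sums = {lig: 0 for lig in ligands}
    for table in score_tables.values():
        ordered = sorted(ligands, key=lambda lig: table[lig])
        for rank, lig in enumerate(ordered, start=1):
            rank_sums[lig] += rank
    n = len(score_tables)
    return sorted((rank_sums[lig] / n, lig) for lig in ligands)

# Made-up scores from three hypothetical scoring functions.
scores = {
    "force_field": {"A": -9.1, "B": -7.0, "C": -8.2},
    "empirical": {"A": -8.0, "B": -6.5, "C": -8.5},
    "knowledge_based": {"A": -50.0, "B": -30.0, "C": -40.0},
}
ranking = consensus_rank(scores)
```

Rank averaging sidesteps the fact that different scoring functions report values on different scales, which is why it is used here instead of averaging raw scores.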
At present, GOLD (Genetic Optimization for Ligand Docking) is widely used for the docking-based screening of lead molecules and studies on the heteromolecular associations of a number of protein–ligand complexes.109 The algorithm generates a grid over the binding cavity followed by rigorous hydrophobicity estimation using the Lennard-Jones potential between the atoms in the protein pocket and an sp3 carbon probe. Another program, Glide (Schrödinger, USA), utilizes a comprehensive search algorithm incorporating the conformational, orientational, and positional space of the docked molecule.110 This is followed by filtering several ligand poses based on torsionally flexible energy optimization and later refinement using Monte Carlo sampling. Besides, SwissDock remains a prominent web-based docking server that is based on the EADock DSS algorithm.111 It generates several binding modes for a ligand inside a virtual box or close to the binding cavities, following which binding energy estimations are done and the lowest energy poses are clustered.112
A few other web-based servers, including PatchDock, EDock, and ParDock, have been reported extensively in a variety of DBVS studies.113–115 Besides these, AutoDock is another widely reported tool that is available both as a standalone system and as implemented in server mode.116 AutoDock provides several search algorithms, including the Monte Carlo simulated annealing algorithm, GA, and a hybrid local search GA or the Lamarckian genetic algorithm (LGA). The outcome of search algorithms is utilized for predicting optimal docking poses, which are later clustered and ranked to obtain the ligand accessible conformational space. Subsequently, the comparatively faster AutoDock Vina algorithm was also developed with an easier docking methodology for non-experts together with an additional ability for virtual screening.117
Analysis of lead molecules and target protein complexes
Studying the binding interactions of potential lead molecules with the target protein remains the final step for most computational drug design and development pipelines. This step provides structural insights into critical factors responsible for the association, dissociation, and equilibration of the protein–ligand complexes. Structural biology techniques, including X-ray crystallography, NMR, or cryo-electron microscopy, provide residue level details but fail to provide much information on the conformational transitions in a protein–ligand complex. Consequently, molecular dynamics (MD) simulations have emerged as a crucial computational method for extracting the dynamics of a structure, both in its apo- and ligand-bound states (Fig. 1). MD simulations are widely utilized for analyzing the energy landscape, free energy profile, and several other equilibrium and kinetic parameters that can define the stability of any protein–ligand complex. Several sampling methods in MD are now available that accelerate thermodynamics calculations and decrease the energy barriers for a more effective sampling of conformational transitions. These enhanced sampling methods include umbrella sampling, J-walking, local elevation, conformational flooding, hyperdynamics, conformational space annealing, metadynamics (MetaD), and variationally enhanced sampling (VES).118
Molecular dynamics (MD) simulations of protein–ligand complexes
Top scoring docked complexes can be selected and subjected to MD simulations using any of the widely used platforms, such as GROMACS, NAMD, or AMBER. There are several force fields (FFs) that are utilized depending on the study design, including AMBER, CHARMM, GROMOS, OPLS-AA, and AMOEBA.119 However, for studying protein–ligand complexes, CHARMM36 remains the most utilized FF in all-atomistic MD runs. Initially, the topologies and parameters of the ligands are extracted using either an inbuilt program, or else from CGenFF and PRODRG2 programs.120,121
Following this, a protein structure file (PSF) is generated using VMD or another similar tool. Next, the complex is solvated in a cubic box containing transferable intermolecular potential with 3 points (TIP3P) water molecules.122 In most cases, the box size is determined by keeping a consistent separation of 10 Å between the protein surface and the edges of the periodic box. Further, steepest descent and/or conjugate gradient algorithm-based energy minimization of the system is done for removing bad contacts and clashes. After this step, the minimized system is utilized for performing simulation runs using an NVT ensemble at a constant temperature achieved by the Nose–Hoover method.123 In general, a time step of 2 fs is kept during the production run of the simulation. The simulation trajectory and other parameters are visualized using independent scripts or VMD and similar programs.
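The two numerical settings mentioned above, the 10 Å padding and the 2 fs time step, fix the box edge and the number of integration steps for a given run length. The following arithmetic sketch makes this explicit; the function names and the 100 ns run length are assumptions chosen for illustration, and actual simulation packages provide their own setup utilities.

```python
def cubic_box_edge(coords, padding=10.0):
    """Edge length (angstroms) of a cubic box that leaves `padding`
    angstroms between the solute and every box face.

    coords: list of (x, y, z) solute atom coordinates in angstroms.
    """
    extents = [
        max(c[i] for c in coords) - min(c[i] for c in coords)
        for i in range(3)
    ]
    # A cube must accommodate the longest solute dimension.
    return max(extents) + 2.0 * padding

def n_steps(sim_time_ns, timestep_fs=2.0):
    """Number of integration steps for a run of sim_time_ns nanoseconds."""
    return int(round(sim_time_ns * 1e6 / timestep_fs))  # 1 ns = 1e6 fs

# A hypothetical solute spanning 40 x 30 x 20 angstroms needs a 60 angstrom box.
edge = cubic_box_edge([(0.0, 0.0, 0.0), (40.0, 30.0, 20.0)])
# An assumed 100 ns production run at 2 fs per step takes 50 million steps.
steps = n_steps(100)
```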
Binding energy estimation of protein–ligand complexes
Prediction of the binding affinity remains central to analyzing and comparing protein–ligand complexes. Both free energy perturbation (FEP) and thermodynamic integration (TI) approaches utilize MD simulation runs for estimating the free energy by comparing the original and fully perturbed states of a complex.124 In the absolute binding free energy (ABFE) approach, the ligand in the bulk state is compared with the ligand bound to the target protein to obtain the free energy of binding.125 The potential of mean force (PMF) is another widely used methodology for predicting the binding affinity between the protein and ligand.126 This is based on relative translational and orientational degrees of freedom for the bound ligand and finding the minimum energy path for dissociation.
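For reference, the FEP and TI estimators mentioned above are usually written in the following standard textbook forms. The notation is an assumption introduced here and is not taken from the cited studies: U_0 and U_1 are the potential energies of the original and fully perturbed states, λ is the coupling parameter connecting them, k_B is the Boltzmann constant, and T is the temperature.

```latex
% Free energy perturbation (Zwanzig relation), averaged over state 0
\Delta G_{0 \to 1} = -k_{B} T \,
  \ln \left\langle \exp\!\left[ -\frac{U_{1} - U_{0}}{k_{B} T} \right] \right\rangle_{0}

% Thermodynamic integration along the coupling parameter \lambda
\Delta G_{0 \to 1} = \int_{0}^{1}
  \left\langle \frac{\partial U(\lambda)}{\partial \lambda} \right\rangle_{\lambda}
  \, d\lambda
```

In both cases, the angle brackets denote ensemble averages that are estimated from MD sampling.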
Alternatively, a faster method for estimating the binding free energy has been devised based on molecular mechanics (MM) integrated with Poisson–Boltzmann and surface area solvation (MM/PBSA) or with generalized Born and surface area solvation (MM/GBSA).127 In either case, a short sampling profile is created by taking snapshots from the MD trajectory and extracting the free energy differences between different snapshot ensembles. Following this, the polar solvation free energy is calculated by solving the Poisson–Boltzmann or generalized Born equations, while the non-polar contribution is estimated through a linear relation with the solvent accessible surface area (SASA). Finally, the free energy equation, consisting of energy contributions from MM (bonded, electrostatic, and van der Waals energy), the solvation energy (PBSA or GBSA), and an entropy term scaled by the temperature, is utilized to calculate the free energy of binding.
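The free energy equation described above is commonly written as follows. This is the generic MM/PBSA and MM/GBSA decomposition with symbols introduced here for illustration: each Δ denotes the complex value minus the receptor and ligand values averaged over the MD snapshots, and γ and b are empirical surface-tension parameters.

```latex
\Delta G_{\mathrm{bind}} = \Delta E_{\mathrm{MM}} + \Delta G_{\mathrm{solv}} - T \Delta S

\Delta E_{\mathrm{MM}} = \Delta E_{\mathrm{bonded}} + \Delta E_{\mathrm{elec}} + \Delta E_{\mathrm{vdW}}

\Delta G_{\mathrm{solv}} = \Delta G_{\mathrm{PB/GB}} + \Delta G_{\mathrm{SA}},
\qquad G_{\mathrm{SA}} = \gamma \cdot \mathrm{SASA} + b
```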
Structural stability analysis upon ligand binding
The structural stability of a protein upon ligand/lead molecule binding is crucial and dictates the anchoring of lead molecules inside the binding cavity. The root mean square deviation (RMSD) is the foremost parameter to analyze variations in the protein backbone after complex formation with the ligand molecule. The stabilities of the structural features are determined by comparing deviations of the RMSD for a ligand–protein complex and protein alone during the MD trajectory. Significant variations in the RMSD of a complex could mean an unstable conformation and destabilization.
Similarly, the RMSF (root mean square fluctuation) provides information on residue-wise fluctuations detected in a structure during the simulation. Structured regions in proteins, including α-helices, 3₁₀-helices, and β-sheets, show the lowest RMSF values, while coils and other unstructured regions show the highest fluctuations. Further, the overall compactness of the folded architecture of a protein is studied based on deviations in the radius of gyration (Rg). For globular, folded protein structures, Rg values remain low, while high variations are noted for destabilized protein complexes. Further, analyzing the solvent accessible surface area (SASA) of the protein also provides important insights into the local unfolding or burying of residues. SASA values increase with any perturbations in secondary structural elements caused by ligand binding, temperature, or denaturing milieu. Any correlation with secondary structural changes is obtained by cataloging fractional changes in different secondary structural elements (α, β, or coil) in the simulation trajectory.
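The RMSD, RMSF, and Rg measures discussed above reduce to short formulas over atomic coordinates. The sketch below is a minimal pure-Python illustration that assumes the trajectory frames have already been superimposed on the reference structure; the function names and the toy coordinates are hypothetical, and dedicated trajectory-analysis tools would normally be used in practice.

```python
import math

def rmsd(frame, reference):
    """RMSD between two coordinate sets that are already superimposed."""
    sq = sum(
        (a - b) ** 2
        for p, q in zip(frame, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(sq / len(reference))

def rmsf(trajectory):
    """Per-atom RMSF around each atom's mean position over all frames."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    values = []
    for i in range(n_atoms):
        mean = [sum(f[i][k] for f in trajectory) / n_frames for k in range(3)]
        msf = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3))
            for f in trajectory
        ) / n_frames
        values.append(math.sqrt(msf))
    return values

def radius_of_gyration(frame, masses=None):
    """Mass-weighted radius of gyration (unit masses if none are given)."""
    masses = masses or [1.0] * len(frame)
    total = sum(masses)
    com = [sum(m * p[k] for m, p in zip(masses, frame)) / total for k in range(3)]
    s = sum(
        m * sum((p[k] - com[k]) ** 2 for k in range(3))
        for m, p in zip(masses, frame)
    )
    return math.sqrt(s / total)

# Toy two-atom "trajectory" with made-up coordinates (angstroms).
frame0 = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
frame1 = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
```

Comparing these quantities for the apo protein and the ligand-bound complex over the same simulation length gives the stability comparison described in the text.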
The structural changes after ligand binding have been studied extensively using MD simulation in a variety of protein targets, as evident from several reports in the last few years.128–131 l-Asparaginases (l-Asn) represent an excellent example where binding of the substrate molecule in the active site reorganizes a rigid β-hairpin loop into a disordered and mobile loop that aids catalysis and product release. The loop reorganization of a hyperthermophile-derived l-Asn was initially studied using MD simulations, and the results were later substantiated by X-ray crystallography. Later, a similar loop remodeling phenomenon was also found in l-Asn enzymes present in pathogenic microbes, including Leishmania donovani and Mycobacterium tuberculosis.45–47,104 The structural changes were identified by comparing RMSD and RMSF variations brought about by a tyrosine residue in the loop region. These structural insights were utilized in the screening of novel lead molecules that could target l-Asn in both pathogens. Specific active site-based inhibitor molecules were identified using the substrate-based DBVS protocol through the TCM database, ZINC library, and an FDA-approved drug database.
Another report provided crucial insights into how hetero-aromatic stacking might modulate toxic amyloid formation in peptides and proteins associated with various neuropathies.57 This report evaluated islet amyloid polypeptide (IAP) amyloid assembly alone (control) and in the presence of ethidium bromide (EtBr), doxorubicin (Dxr), and mitoxantrone (Mtx). Free energy landscape (FEL) projections showed that the association of EtBr and Dxr induced a partially condensed conformational state, with the higher Rg resulting in destabilization and unpacking of the fibril assembly. In contrast, the presence of Mtx did not induce any significant alteration in the conformational space, and the fibril assembly remained compact, like the control.
Non-covalent bonding analysis and essential dynamics
The heteromolecular interactions of lead molecules within the cavity of the target protein involve non-covalent bonds. Non-covalent bonding determines the stability of the protein–lead molecule complex formation and majorly includes hydrogen bonds, hydrophobic interactions, π–π stacking, and salt bridges.132 In all cases, distance and bonding angle cut-offs are applied for identifying potential bonds among the protein or ligand residues. Additionally, van der Waals and electrostatic interactions are also reported for protein–ligand interactions. The stability of these non-covalent interactions is assessed by quantifying their presence during the entire MD trajectory. Longer occupancies of certain non-covalent interactions strongly suggest their importance in ligand stabilization inside the cavity.133
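The cut-off-based detection and occupancy analysis described above can be sketched for a single hydrogen bond as follows. The 3.5 Å donor–acceptor distance and 120° donor–hydrogen–acceptor angle used here are commonly chosen values adopted as assumptions, not thresholds taken from the cited studies, and the function names and coordinates are hypothetical.

```python
import math

def angle_deg(donor, hydrogen, acceptor):
    """Donor-hydrogen-acceptor angle in degrees, measured at the hydrogen."""
    v1 = [d - h for d, h in zip(donor, hydrogen)]
    v2 = [a - h for a, h in zip(acceptor, hydrogen)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_angle = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_angle))

def is_hbond(donor, hydrogen, acceptor, max_dist=3.5, min_angle=120.0):
    """Apply the distance and angle cut-offs (assumed values) to one frame."""
    return (
        math.dist(donor, acceptor) <= max_dist
        and angle_deg(donor, hydrogen, acceptor) >= min_angle
    )

def occupancy(frames, max_dist=3.5, min_angle=120.0):
    """Fraction of frames in which the hydrogen bond is present.

    frames: list of (donor, hydrogen, acceptor) coordinate triples,
            one triple per trajectory frame.
    """
    hits = sum(1 for d, h, a in frames if is_hbond(d, h, a, max_dist, min_angle))
    return hits / len(frames)

# Four made-up frames: two satisfy both cut-offs, one is too long, one is bent.
frames = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0)),
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)),
    ((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (3.0, 0.0, 0.0)),
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0)),
]
```

The same counting scheme applies to salt bridges or π–π stacking once the appropriate geometric criteria are substituted.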
Further, contact maps of the protein and lead molecules are also analyzed to find residues that interact with and stabilize the ligand. A recent MD simulation-based report investigated the binding of the drug candidate AG10, its derivatives, and the drug tafamidis to the mutant transthyretin protein associated with familial amyloid cardiomyopathy.134 The authors showed that the AG10 ligands formed stable hydrogen bonds during the MD simulation run. Interestingly, both the removal of a carboxylate group and the insertion of a methyl group in AG10 disrupted the hydrogen bonding and produced a conformational change.
Finally, analysis of the motions inside a target protein during MD simulation, termed essential dynamics (ED), can also provide crucial structural insights. ED studies incorporate principal component analysis (PCA) for describing the comprehensive motion of a protein resulting from atomic fluctuations. These atomic fluctuations in the structure are compiled into a covariance matrix, whose eigenvectors and eigenvalues describe the dominant motions of the protein in its apo- or liganded state. An excellent example of ED analysis was provided in a recent report where the dynamics of the CDK8–CycC (cyclin-dependent kinase 8–cyclin C) system implicated in several pathological conditions was studied.135 The authors reported that all α-carbon motions associated with the MD trajectory were reduced to their principal components (PCs), and the first PC mode was then analyzed as it accounted for the largest variation. They found two significant motions affecting ligand binding and unbinding: one involving bending and unbending at the hinge region connecting the N- and C-lobes and the other involving bending and rotational motions between CDK8 and CycC. Overall, ED analysis can provide well-sampled cluster conformations that facilitate a more precise analysis of the outcomes from DBVS pipelines.
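The covariance-matrix step of ED can be illustrated with a small sketch that extracts only the first principal component. It uses power iteration in pure Python as a stand-in for the full eigendecomposition that analysis packages perform; the function names and the toy trajectory are hypothetical, and each frame is assumed to be a flat list of already-aligned coordinates.

```python
import math

def covariance_matrix(trajectory):
    """Covariance matrix of coordinate fluctuations.

    trajectory: list of frames, each a flat list [x1, y1, z1, x2, ...]
                of aligned coordinates.
    """
    n = len(trajectory)
    dim = len(trajectory[0])
    mean = [sum(f[j] for f in trajectory) / n for j in range(dim)]
    centered = [[f[j] - mean[j] for j in range(dim)] for f in trajectory]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]

def first_principal_component(cov, n_iter=500, tol=1e-12):
    """Dominant eigenvalue and eigenvector of cov by power iteration."""
    dim = len(cov)
    # Asymmetric start vector; it must not be orthogonal to the first PC.
    vec = [1.0 / (i + 1) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(n_iter):
        new = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            break  # no variance along the current direction
        new = [x / norm for x in new]
        converged = sum((a - b) ** 2 for a, b in zip(new, vec)) < tol
        vec = new
        if converged:
            break
    # Rayleigh quotient gives the variance captured along vec.
    eigval = sum(
        vec[i] * sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)
    )
    return eigval, vec

# Toy trajectory: one particle in 2D that moves only along the first axis.
traj = [[-2.0, 0.0], [-1.0, 0.0], [0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
eigval, pc1 = first_principal_component(covariance_matrix(traj))
```

Projecting each frame onto `pc1` would then give the trajectory along the first mode, which is the quantity typically inspected in ED studies.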
Conclusions
Computational approaches have streamlined and economized the conventional drug-discovery pipeline. Combinatorial methodologies involving newly identified biomolecular structures and computational biology approaches have immensely aided the development of efficacious therapeutic molecules. Most of these integrated computational structural biology approaches, including gene expression and sequence/structure correlation-based target identification, molecular docking-based virtual screening (DBVS), and structure–activity relationship (SAR)-based ligand design and engineering, are now widely implemented by the pharmaceutical industry. Besides, the availability of faster experimental assays and of cell and animal models for validating the efficacy of identified lead molecules has further substantiated the utility of computational approaches. Currently, cryo-EM has obvious advantages for the structural analysis of macromolecular proteins, and the resolution level is the main consideration for whether cryo-EM can contribute to DBVS and lead discovery. Recently, small-molecule screening against the cancer-related kinase pyruvate kinase isozyme M2 (PKM2) was followed by the determination of cryo-EM structures of multiple compounds bound to PKM2, further substantiating the applicability of DBVS and cryo-EM in the drug-discovery process.141
The availability of newer machine-learning and deep-learning algorithms, such as AlphaFold, which integrates biophysical knowledge about the protein structure and multi-sequence alignments, might result in more precise drug-discovery workflows.136 Finally, the next major challenge is to provide easier working interfaces for modern-day drug-discovery pipelines for both academia and industry.
Conflicts of interest
The authors declare no competing financial interests.
Acknowledgments
The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding this research work through the project number ISP23-101. The authors acknowledge the Ministry of Education 2022R1F1A1074105 and Kyung Hee University in 2022 (KHU-20220787), Republic of Korea.
References
- Alberts B. The cell as a collection of protein machines: preparing the next generation of molecular biologists. Cell. 1998;92(3):291–294. doi: 10.1016/S0092-8674(00)80922-8.
- Srivastava A. B. S. and Shankar J., Developments and Diversity of Proteins and Enzymes, in Metabolic Engineering for Bioactive Compounds, ed. V. S. A. Kalia, Springer, Singapore, 2017
- Gonzalez M. W. Kann M. G. Chapter 4: Protein interactions and disease. PLoS Comput. Biol. 2012;8(12):e1002819. doi: 10.1371/journal.pcbi.1002819.
- Scott D. E. Bayly A. R. Abell C. Skidmore J. Small molecules, big targets: drug discovery faces the protein-protein interaction challenge. Nat. Rev. Drug Discovery. 2016;15(8):533–550. doi: 10.1038/nrd.2016.29.
- Uhlen M. Fagerberg L. Hallstrom B. M. Lindskog C. Oksvold P. Mardinoglu A. Sivertsson A. Kampf C. Sjostedt E. Asplund A. et al. Proteomics. Tissue-based map of the human proteome. Science. 2015;347(6220):1260419. doi: 10.1126/science.1260419.
- Pinero J. Ramirez-Anguita J. M. Sauch-Pitarch J. Ronzano F. Centeno E. Sanz F. Furlong L. I. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res. 2020;48(D1):D845–D855. doi: 10.1093/nar/gkz1021.
- Szymanski P. Markowicz M. Mikiciuk-Olasik E. Adaptation of high-throughput screening in drug discovery-toxicological screening tests. Int. J. Mol. Sci. 2012;13(1):427–452. doi: 10.3390/ijms13010427.
- Paul A., Translational and Reverse Pharmacology, in Introduction to Basics of Pharmacology and Toxicology, ed. G. R. R. Raj, Springer, Singapore, 2019
- Wouters O. J. McKee M. Luyten J. Estimated Research and Development Investment Needed to Bring a New Medicine to Market, 2009-2018. JAMA, J. Am. Med. Assoc. 2020;323(9):844–853. doi: 10.1001/jama.2020.1166.
- Venien-Bryan C. Li Z. Vuillard L. Boutin J. A. Cryo-electron microscopy and X-ray crystallography: complementary approaches to structural biology and drug discovery. Acta Crystallogr., Sect. F: Struct. Biol. Commun. 2017;73(Pt 4):174–183. doi: 10.1107/S2053230X17003740.
- Lengyel J. Hnath E. Storms M. Wohlfarth T. Towards an integrative structural biology approach: combining Cryo-TEM, X-ray crystallography, and NMR. J. Struct. Funct. Genomics. 2014;15(3):117–124. doi: 10.1007/s10969-014-9179-9.
- Lionta E. Spyrou G. Vassilatis D. K. Cournia Z. Structure-based virtual screening for drug discovery: principles, applications and recent advances. Curr. Top. Med. Chem. 2014;14(16):1923–1938. doi: 10.2174/1568026614666140929124445.
- Szklarczyk D. Santos A. von Mering C. Jensen L. J. Bork P. Kuhn M. STITCH 5: augmenting protein-chemical interaction networks with tissue and affinity data. Nucleic Acids Res. 2016;44(D1):D380–D384. doi: 10.1093/nar/gkv1277.
- Wishart D. S. Knox C. Guo A. C. Shrivastava S. Hassanali M. Stothard P. Chang Z. Woolsey J. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006;34(Database issue):D668–D672. doi: 10.1093/nar/gkj067.
- Newman D. J. Cragg G. M. Natural Products as Sources of New Drugs from 1981 to 2014. J. Nat. Prod. 2016;79(3):629–661. doi: 10.1021/acs.jnatprod.5b01055.
- Talele T. T. Khedkar S. A. Rigby A. C. Successful applications of computer aided drug discovery: moving drugs from concept to the clinic. Curr. Top. Med. Chem. 2010;10(1):127–141. doi: 10.2174/156802610790232251.
- Torres P. H. M. Sodero A. C. R. Jofily P. Silva Jr. F. P. Key Topics in Molecular Docking for Drug Design. Int. J. Mol. Sci. 2019;20(18):4574. doi: 10.3390/ijms20184574.
- Kitchen D. B. Decornez H. Furr J. R. Bajorath J. Docking and scoring in virtual screening for drug discovery: methods and applications. Nat. Rev. Drug Discovery. 2004;3(11):935–949. doi: 10.1038/nrd1549.
- Salmaso V. Moro S. Bridging Molecular Docking to Molecular Dynamics in Exploring Ligand-Protein Recognition Process: An Overview. Front. Pharmacol. 2018;9:923. doi: 10.3389/fphar.2018.00923.
- Chandra N. Computational approaches for drug target identification in pathogenic diseases. Expert Opin. Drug Discovery. 2011;6(10):975–979. doi: 10.1517/17460441.2011.611128.
- Dai Y. F. Zhao X. M. A survey on the computational approaches to identify drug targets in the postgenomic era. BioMed Res. Int. 2015;2015:239654. doi: 10.1155/2015/239654.
- Pabon N. A. Xia Y. Estabrooks S. K. Ye Z. Herbrand A. K. Suss E. Biondi R. M. Assimon V. A. Gestwicki J. E. Brodsky J. L. et al. Predicting protein targets for drug-like compounds using transcriptomics. PLoS Comput. Biol. 2018;14(12):e1006651. doi: 10.1371/journal.pcbi.1006651.
- Koleti A. Terryn R. Stathias V. Chung C. Cooper D. J. Turner J. P. Vidovic D. Forlin M. Kelley T. T. D'Urso A. et al. Data Portal for the Library of Integrated Network-based Cellular Signatures (LINCS) program: integrated access to diverse large-scale cellular perturbation response data. Nucleic Acids Res. 2018;46(D1):D558–D566. doi: 10.1093/nar/gkx1063.
- Shankavaram U. T. Varma S. Kane D. Sunshine M. Chary K. K. Reinhold W. C. Pommier Y. Weinstein J. N. CellMiner: a relational database and query tool for the NCI-60 cancer cell lines. BMC Genomics. 2009;10:277. doi: 10.1186/1471-2164-10-277.
- Lamb J. Crawford E. D. Peck D. Modell J. W. Blat I. C. Wrobel M. J. Lerner J. Brunet J. P. Subramanian A. Ross K. N. et al. The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science. 2006;313(5795):1929–1935. doi: 10.1126/science.1132939.
- Subramanian A. Narayan R. Corsello S. M. Peck D. D. Natoli T. E. Lu X. Gould J. Davis J. F. Tubelli A. A. Asiedu J. K. et al. A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles. Cell. 2017;171(6):1437–1452. doi: 10.1016/j.cell.2017.10.049.
- Sirota M. Dudley J. T. Kim J. Chiang A. P. Morgan A. A. Sweet-Cordero A. Sage J. Butte A. J. Discovery and preclinical validation of drug indications using compendia of public gene expression data. Sci. Transl. Med. 2011;3(96):96ra77. doi: 10.1126/scitranslmed.3001318.
- Siu F. M. Ma D. L. Cheung Y. W. Lok C. N. Yan K. Yang Z. Yang M. Xu S. Ko B. C. He Q. Y. et al. Proteomic and transcriptomic study on the action of a cytotoxic saponin (Polyphyllin D): induction of endoplasmic reticulum stress and mitochondria-mediated apoptotic pathways. Proteomics. 2008;8(15):3105–3117. doi: 10.1002/pmic.200700829.
- Hassane D. C. Guzman M. L. Corbett C. Li X. Abboud R. Young F. Liesveld J. L. Carroll M. Jordan C. T. Discovery of agents that eradicate leukemia stem cells using an in silico screen of public gene expression data. Blood. 2008;111(12):5654–5662. doi: 10.1182/blood-2007-11-126003.
- Reka A. K. Kuick R. Kurapati H. Standiford T. J. Omenn G. S. Keshamouni V. G. Identifying inhibitors of epithelial-mesenchymal transition by connectivity map-based systems approach. J. Thorac. Oncol. 2011;6(11):1784–1792. doi: 10.1097/JTO.0b013e31822adfb0.
- Chang M. Smith S. Thorpe A. Barratt M. J. Karim F. Evaluation of phenoxybenzamine in the CFA model of pain following gene expression studies and connectivity mapping. Mol. Pain. 2010;6:56. doi: 10.1186/1744-8069-6-56.
- Kunkel S. D. Suneja M. Ebert S. M. Bongers K. S. Fox D. K. Malmberg S. E. Alipour F. Shields R. K. Adams C. M. mRNA expression signatures of human skeletal muscle atrophy identify a natural compound that increases muscle mass. Cell Metab. 2011;13(6):627–638. doi: 10.1016/j.cmet.2011.03.020.
- Dudley J. T. Sirota M. Shenoy M. Pai R. K. Roedder S. Chiang A. P. Morgan A. A. Sarwal M. M. Pasricha P. J. Butte A. J. Computational repositioning of the anticonvulsant topiramate for inflammatory bowel disease. Sci. Transl. Med. 2011;3(96):96ra76. doi: 10.1126/scitranslmed.3002648.
- Chen F. Guan Q. Nie Z. Y. Jin L. J. Gene expression profile and functional analysis of Alzheimer's disease. Am. J. Alzheimers Dis. Other Demen. 2013;28(7):693–701. doi: 10.1177/1533317513500838.
- Iorio F. Bosotti R. Scacheri E. Belcastro V. Mithbaokar P. Ferriero R. Murino L. Tagliaferri R. Brunetti-Pierri N. Isacchi A. et al. Discovery of drug mode of action and drug repositioning from transcriptional responses. Proc. Natl. Acad. Sci. U. S. A. 2010;107(33):14621–14626. doi: 10.1073/pnas.1000138107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shoemaker R. H. The NCI60 human tumour cell line anticancer drug screen. Nat. Rev. Cancer. 2006;6(10):813–823. doi: 10.1038/nrc1951. [DOI] [PubMed] [Google Scholar]
- Chen X. Wong Y. K. Wang J. Zhang J. Lee Y. M. Shen H. M. Lin Q. Hua Z. C. Target identification with quantitative activity based protein profiling (ABPP) Proteomics. 2017;17:3–4. doi: 10.1002/pmic.201600212. [DOI] [PubMed] [Google Scholar]
- Gregori-Puigjane E. Setola V. Hert J. Crews B. A. Irwin J. J. Lounkine E. Marnett L. Roth B. L. Shoichet B. K. Identifying mechanism-of-action targets for drugs and probes. Proc. Natl. Acad. Sci. U. S. A. 2012;109(28):11178–11183. doi: 10.1073/pnas.1204524109. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Modell A. E. Blosser S. L. Arora P. S. Systematic Targeting of Protein-Protein Interactions. Trends Pharmacol. Sci. 2016;37(8):702–713. doi: 10.1016/j.tips.2016.05.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kramer A. Green J. Pollard Jr. J. Tugendreich S. Causal analysis approaches in Ingenuity Pathway Analysis. Bioinformatics. 2014;30(4):523–530. doi: 10.1093/bioinformatics/btt703. [DOI] [PMC free article] [PubMed] [Google Scholar]
- von Mering C. Jensen L. J. Snel B. Hooper S. D. Krupp M. Foglierini M. Jouffre N. Huynen M. A. Bork P. STRING: known and predicted protein-protein associations, integrated and transferred across organisms. Nucleic Acids Res. 2005;33(Database issue):D433–D437. doi: 10.1093/nar/gki005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Szklarczyk D. Gable A. L. Nastou K. C. Lyon D. Kirsch R. Pyysalo S. Doncheva N. T. Legeay M. Fang T. Bork P. et al. The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 2021;49(D1):D605–D612. doi: 10.1093/nar/gkaa1074. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Meena A. Yadav D. K. Srivastava A. Khan F. Chanda D. Chattopadhyay S. K. In silico exploration of anti-inflammatory activity of natural coumarinolignoids. Chem. Biol. Drug Des. 2011;78(4):567–579. doi: 10.1111/j.1747-0285.2011.01173.x. [DOI] [PubMed] [Google Scholar]
- Yadav D. K. Meena A. Srivastava A. Chanda D. Khan F. Chattopadhyay S. K. Development of QSAR model for immunomodulatory activity of natural coumarinolignoids. Drug Des., Dev. Ther. 2010;4:173–186. doi: 10.2147/dddt.s10875. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Singh J. Srivastava A. Jha P. Sinha K. K. Kundu B. L-Asparaginase as a new molecular target against leishmaniasis: insights into the mechanism of action and structure-based inhibitor design. Mol. BioSyst. 2015;11(7):1887–1896. doi: 10.1039/C5MB00251F. [DOI] [PubMed] [Google Scholar]
- Singh J. Khan M. I. Singh Yadav S. P. Srivastava A. Sinha K. K. Ashish Das P. Kundu B. L-Asparaginase of Leishmania donovani: Metabolic target and its role in Amphotericin B resistance. Int. J. Parasitol.: Drugs Drug Resist. 2017;7(3):337–349. doi: 10.1016/j.ijpddr.2017.09.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kataria A. Singh J. Kundu B. Identification and validation of l-asparaginase as a potential metabolic target against Mycobacterium tuberculosis. J. Cell. Biochem. 2019;120(1):143–154. doi: 10.1002/jcb.27169. [DOI] [PubMed] [Google Scholar]
- Khan R. H. Siddiqi M. K. Uversky V. N. Salahuddin P. Molecular docking of Abeta1-40 peptide and its Iowa D23N mutant using small molecule inhibitors: Possible mechanisms of Abeta-peptide inhibition. Int. J. Biol. Macromol. 2019;127:250–270. doi: 10.1016/j.ijbiomac.2018.12.271. [DOI] [PubMed] [Google Scholar]
- Jiang L. Liu C. Leibly D. Landau M. Zhao M. Hughes M. P. Eisenberg D. S. Structure-based discovery of fiber-binding compounds that reduce the cytotoxicity of amyloid beta. eLife. 2013;2:e00857. doi: 10.7554/eLife.00857. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pujols J. Pena-Diaz S. Lazaro D. F. Peccati F. Pinheiro F. Gonzalez D. Carija A. Navarro S. Conde-Gimenez M. Garcia J. et al. Small molecule inhibits alpha-synuclein aggregation, disrupts amyloid fibrils, and prevents degeneration of dopaminergic neurons. Proc. Natl. Acad. Sci. U. S. A. 2018;115(41):10481–10486. doi: 10.1073/pnas.1804198115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ishibashi D. Nakagaki T. Ishikawa T. Atarashi R. Watanabe K. Cruz F. A. Hamada T. Nishida N. Structure-Based Drug Discovery for Prion Disease Using a Novel Binding Simulation. EBioMedicine. 2016;9:238–249. doi: 10.1016/j.ebiom.2016.06.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Srivastava A. Sharma S. Sadanandan S. Gupta S. Singh J. Gupta S. Haridas V. Kundu B. Modulation of prion polymerization and toxicity by rationally designed peptidomimetics. Biochem. J. 2017;474(1):123–147. doi: 10.1042/BCJ20160737. [DOI] [PubMed] [Google Scholar]
- Arya P. Srivastava A. Vasaikar S. V. Mukherjee G. Mishra P. Kundu B. Selective interception of gelsolin amyloidogenic stretch results in conformationally distinct aggregates with reduced toxicity. ACS Chem. Neurosci. 2014;5(10):982–992. doi: 10.1021/cn500002v. [DOI] [PubMed] [Google Scholar]
- Srivastava A. Arya P. Goel S. Kundu B. Mishra P. Fnu A. Gelsolin Amyloidogenesis Is Effectively Modulated by Curcumin and Emetine Conjugated PLGA Nanoparticles. PLoS One. 2015;10(5):e0127011. doi: 10.1371/journal.pone.0127011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Admane N. Srivastava A. Jamal S. Kundu B. Grover A. Protective Effects of a Neurohypophyseal Hormone Analogue on Prion Aggregation, Cellular Internalization, and Toxicity. ACS Chem. Neurosci. 2020;11(16):2422–2430. doi: 10.1021/acschemneuro.9b00299. [DOI] [PubMed] [Google Scholar]
- Rajput R. Balasubramani G. L. Srivastava A. Wahi D. Shrivastava N. Kundu B. Grover A. Specific keratinase derived designer peptides potently inhibit Aβ aggregation resulting in reduced neuronal toxicity and apoptosis. Biochem. J. 2019;476(12):1817–1841. doi: 10.1042/BCJ20190183. [DOI] [PubMed] [Google Scholar]
- Singh J. S. A. Sharma P. Pradhan P. Kundu B. DNA intercalators as amyloid assembly modulators: mechanistic insights. RSC Adv. 2017;7:493–506. doi: 10.1039/C6RA26313E. [DOI] [Google Scholar]
- Ginalski K. Comparative modeling for protein structure prediction. Curr. Opin. Struct. Biol. 2006;16(2):172–177. doi: 10.1016/j.sbi.2006.02.003. [DOI] [PubMed] [Google Scholar]
- Kashani-Amin E. Tabatabaei-Malazy O. Sakhteman A. Larijani B. Ebrahim-Habibi A. A Systematic Review on Popularity, Application and Characteristics of Protein Secondary Structure Prediction Tools. Curr. Drug Discovery Technol. 2019;16(2):159–172. doi: 10.2174/1570163815666180227162157. [DOI] [PubMed] [Google Scholar]
- Waterhouse A. Bertoni M. Bienert S. Studer G. Tauriello G. Gumienny R. Heer F. T. de Beer T. A. P. Rempfer C. Bordoli L. et al. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 2018;46(W1):W296–W303. doi: 10.1093/nar/gky427. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kelley L. A. Mezulis S. Yates C. M. Wass M. N. Sternberg M. J. The Phyre2 web portal for protein modeling, prediction and analysis. Nat. Protoc. 2015;10(6):845–858. doi: 10.1038/nprot.2015.053. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kim D. E. Chivian D. Baker D. Protein structure prediction and analysis using the Robetta server. Nucleic Acids Res. 2004;32(Web Server issue):W526–W531. doi: 10.1093/nar/gkh468. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Roy A. Kucukural A. Zhang Y. I-TASSER: a unified platform for automated protein structure and function prediction. Nat. Protoc. 2010;5(4):725–738. doi: 10.1038/nprot.2010.5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ramachandran G. N. Ramakrishnan C. Sasisekharan V. Stereochemistry of polypeptide chain configurations. J. Mol. Biol. 1963;7:95–99. doi: 10.1016/S0022-2836(63)80023-6. [DOI] [PubMed] [Google Scholar]
- Eisenberg D. Luthy R. Bowie J. U. VERIFY3D: assessment of protein models with three-dimensional profiles. Methods Enzymol. 1997;277:396–404. doi: 10.1016/s0076-6879(97)77022-8. [DOI] [PubMed] [Google Scholar]
- Sippl M. J. Recognition of errors in three-dimensional structures of proteins. Proteins. 1993;17(4):355–362. doi: 10.1002/prot.340170404. [DOI] [PubMed] [Google Scholar]
- Colovos C. Yeates T. O. Verification of protein structures: patterns of nonbonded atomic interactions. Protein Sci. 1993;2(9):1511–1519. doi: 10.1002/pro.5560020916. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bowie J. U. Luthy R. Eisenberg D. A method to identify protein sequences that fold into a known three-dimensional structure. Science. 1991;253(5016):164–170. doi: 10.1126/science.1853201. [DOI] [PubMed] [Google Scholar]
- Sigrist C. J. Cerutti L. Hulo N. Gattiker A. Falquet L. Pagni M. Bairoch A. Bucher P. PROSITE: a documented database using patterns and profiles as motif descriptors. Briefings Bioinf. 2002;3(3):265–274. doi: 10.1093/bib/3.3.265. [DOI] [PubMed] [Google Scholar]
- Lichtarge O. Bourne H. R. Cohen F. E. An evolutionary trace method defines binding surfaces common to protein families. J. Mol. Biol. 1996;257(2):342–358. doi: 10.1006/jmbi.1996.0167. [DOI] [PubMed] [Google Scholar]
- Cavallo A. Martin A. C. Mapping SNPs to protein sequence and structure data. Bioinformatics. 2005;21(8):1443–1450. doi: 10.1093/bioinformatics/bti220. [DOI] [PubMed] [Google Scholar]
- Ng P. C. Henikoff S. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003;31(13):3812–3814. doi: 10.1093/nar/gkg509. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Choi Y. Chan A. P. PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics. 2015;31(16):2745–2747. doi: 10.1093/bioinformatics/btv195. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Grapov D. Wanichthanarak K. Fiehn O. MetaMapR: pathway independent metabolomic network analysis incorporating unknowns. Bioinformatics. 2015;31(16):2757–2760. doi: 10.1093/bioinformatics/btv194. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Humphrey W. Dalke A. Schulten K. VMD: visual molecular dynamics. J. Mol. Graphics. 1996;14(1):33–38. doi: 10.1016/0263-7855(96)00018-5. [DOI] [PubMed] [Google Scholar]; , 27–38
- Metz A. Ciglia E. Gohlke H. Modulating protein-protein interactions: from structural determinants of binding to druggability prediction to application. Curr. Pharm. Des. 2012;18(30):4630–4647. doi: 10.2174/138161212802651553. [DOI] [PubMed] [Google Scholar]
- Ren X. Shi Y. S. Zhang Y. Liu B. Zhang L. H. Peng Y. B. Zeng R. Novel Consensus Docking Strategy to Improve Ligand Pose Prediction. J. Chem. Inf. Model. 2018;58(8):1662–1668. doi: 10.1021/acs.jcim.8b00329. [DOI] [PubMed] [Google Scholar]
- Houston D. R. Walkinshaw M. D. Consensus docking: improving the reliability of docking in a virtual screening context. J. Chem. Inf. Model. 2013;53(2):384–390. doi: 10.1021/ci300399w. [DOI] [PubMed] [Google Scholar]
- Ghersi D. Sanchez R. Beyond structural genomics: computational approaches for the identification of ligand binding sites in protein structures. J. Struct. Funct. Genomics. 2011;12(2):109–117. doi: 10.1007/s10969-011-9110-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Laskowski R. A. SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions. J. Mol. Graphics. 1995;13(5):323–330. doi: 10.1016/0263-7855(95)00073-9. [DOI] [PubMed] [Google Scholar]; , 307–328
- Hendlich M. Rippmann F. Barnickel G. LIGSITE: automatic and efficient detection of potential small molecule-binding sites in proteins. J. Mol. Graphics Modell. 1997;15(6):359–363. doi: 10.1016/S1093-3263(98)00002-3. [DOI] [PubMed] [Google Scholar]; , 389
- Liu Y. Grimm M. Dai W. T. Hou M. C. Xiao Z. X. Cao Y. CB-Dock: a web server for cavity detection-guided protein-ligand blind docking. Acta Pharmacol. Sin. 2020;41(1):138–144. doi: 10.1038/s41401-019-0228-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhu X. Xiong Y. Kihara D. Large-scale binding ligand prediction by improved patch-based method Patch-Surfer2.0. Bioinformatics. 2015;31(5):707–713. doi: 10.1093/bioinformatics/btu724. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ashkenazy H. Abadi S. Martz E. Chay O. Mayrose I. Pupko T. Ben-Tal N. ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res. 2016;44(W1):W344–W350. doi: 10.1093/nar/gkw408. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Skolnick J. Brylinski M. FINDSITE: a combined evolution/structure-based approach to protein function prediction. Briefings Bioinf. 2009;10(4):378–391. doi: 10.1093/bib/bbp017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wass M. N. Sternberg M. J. Prediction of ligand binding sites using homologous structures and conservation at CASP8. Proteins. 2009;77(Suppl 9):147–151. doi: 10.1002/prot.22513. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yang J. Roy A. Zhang Y. Protein-ligand binding site recognition using complementary binding-specific substructure comparison and sequence profile alignment. Bioinformatics. 2013;29(20):2588–2595. doi: 10.1093/bioinformatics/btt447. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ngan C. H. Hall D. R. Zerbe B. Grove L. E. Kozakov D. Vajda S. FTSite: high accuracy detection of ligand binding sites on unbound protein structures. Bioinformatics. 2012;28(2):286–287. doi: 10.1093/bioinformatics/btr651. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hernandez M. Ghersi D. Sanchez R. SITEHOUND-web: a server for ligand binding site identification in protein structures. Nucleic Acids Res. 2009;37(Web Server issue):W413–W416. doi: 10.1093/nar/gkp281. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jimenez J. Doerr S. Martinez-Rosell G. Rose A. S. De Fabritiis G. DeepSite: protein-binding site predictor using 3D-convolutional neural networks. Bioinformatics. 2017;33(19):3036–3042. doi: 10.1093/bioinformatics/btx350. [DOI] [PubMed] [Google Scholar]
- Pu L. Govindaraj R. G. Lemoine J. M. Wu H. C. Brylinski M. DeepDrug3D: Classification of ligand-binding pockets in proteins with a convolutional neural network. PLoS Comput. Biol. 2019;15(2):e1006718. doi: 10.1371/journal.pcbi.1006718. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Richard A. M. Williams C. R. Distributed structure-searchable toxicity (DSSTox) public database network: a proposal. Mutat. Res. 2002;499(1):27–52. doi: 10.1016/S0027-5107(01)00289-5. [DOI] [PubMed] [Google Scholar]
- Guan L. Yang H. Cai Y. Sun L. Di P. Li W. Liu G. Tang Y. ADMET-score – a comprehensive scoring function for evaluation of chemical drug-likeness. MedChemComm. 2019;10(1):148–157. doi: 10.1039/C8MD00472B. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Xiong G. Wu Z. Yi J. Fu L. Yang Z. Hsieh C. Yin M. Zeng X. Wu C. Lu A. et al. ADMETlab 2.0: an integrated online platform for accurate and comprehensive predictions of ADMET properties. Nucleic Acids Res. 2021;49(W1):W5–W14. doi: 10.1093/nar/gkab255. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Irwin J. J. Shoichet B. K. ZINC–a free database of commercially available compounds for virtual screening. J. Chem. Inf. Model. 2005;45(1):177–182. doi: 10.1021/ci049714+. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pence H. E. W. A. ChemSpider: An Online Chemical Information Resource. J. Chem. Educ. 2010;11(87):1123–1124. doi: 10.1021/ed100697w. [DOI] [Google Scholar]
- Ruddigkeit L. van Deursen R. Blum L. C. Reymond J. L. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J. Chem. Inf. Model. 2012;52(11):2864–2875. doi: 10.1021/ci300415d. [DOI] [PubMed] [Google Scholar]
- Chen C. Y. TCM Database@Taiwan: the world's largest traditional Chinese medicine database for drug screening in silico. PLoS One. 2011;6(1):e15939. doi: 10.1371/journal.pone.0015939. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vazquez J. Lopez M. Gibert E. Herrero E. Luque F. J. Merging Ligand-Based and Structure-Based Methods in Drug Discovery: An Overview of Combined Virtual Screening Approaches. Molecules. 2020;25(20):4723. doi: 10.3390/molecules25204723. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li Q. Shah S. Structure-Based Virtual Screening. Methods Mol. Biol. 2017;1558:111–124. doi: 10.1007/978-1-4939-6783-4_5. [DOI] [PubMed] [Google Scholar]
- Cheng T. Li Q. Zhou Z. Wang Y. Bryant S. H. Structure-based virtual screening for drug discovery: a problem-centric review. AAPS J. 2012;14(1):133–141. doi: 10.1208/s12248-012-9322-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Singh J. Chuaqui C. E. Boriack-Sjodin P. A. Lee W. C. Pontz T. Corbley M. J. Cheung H. K. Arduini R. M. Mead J. N. Newman M. N. et al. Successful shape-based virtual screening: the discovery of a potent inhibitor of the type I TGFbeta receptor kinase (TbetaRI) Bioorg. Med. Chem. Lett. 2003;13(24):4355–4359. doi: 10.1016/j.bmcl.2003.09.028. [DOI] [PubMed] [Google Scholar]
- Sawyer J. S. Anderson B. D. Beight D. W. Campbell R. M. Jones M. L. Herron D. K. Lampe J. W. McCowan J. R. McMillen W. T. Mort N. et al. Synthesis and activity of new aryl- and heteroaryl-substituted pyrazole inhibitors of the transforming growth factor-beta type I receptor kinase domain. J. Med. Chem. 2003;46(19):3953–3956. doi: 10.1021/jm0205705. [DOI] [PubMed] [Google Scholar]
- Kataria A. Patel A. K. Kundu B. Distinct functional properties of secretory l-asparaginase Rv1538c involved in phagosomal survival of Mycobacterium tuberculosis. Biochimie. 2021;182:1–12. doi: 10.1016/j.biochi.2020.12.023. [DOI] [PubMed] [Google Scholar]
- Musoev A. Numonov S. You Z. Gao H. Discovery of Novel DPP-IV Inhibitors as Potential Candidates for the Treatment of Type 2 Diabetes mellitus Predicted by 3D QSAR Pharmacophore Models, Molecular Docking and de novo Evolution. Molecules. 2019;24(16):2870. doi: 10.3390/molecules24162870. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Guedes I. A. Pereira F. S. S. Dardenne L. E. Empirical Scoring Functions for Structure-Based Virtual Screening: Applications, Critical Aspects, and Challenges. Front. Pharmacol. 2018;9:1089. doi: 10.3389/fphar.2018.01089. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Feher M. Consensus scoring for protein-ligand interactions. Drug Discovery Today. 2006;11(9–10):421–428. doi: 10.1016/j.drudis.2006.03.009. [DOI] [PubMed] [Google Scholar]
- Dias R. de Azevedo W. F. Molecular docking algorithms. Curr. Drug Targets. 2008;9(12):1040–1047. doi: 10.2174/138945008786949432. [DOI] [PubMed] [Google Scholar]
- Jones G. Willett P. Glen R. C. Leach A. R. Taylor R. Development and validation of a genetic algorithm for flexible docking. J. Mol. Biol. 1997;267(3):727–748. doi: 10.1006/jmbi.1996.0897. [DOI] [PubMed] [Google Scholar]
- Friesner R. A. Banks J. L. Murphy R. B. Halgren T. A. Klicic J. J. Mainz D. T. Repasky M. P. Knoll E. H. Shelley M. Perry J. K. et al. Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 2004;47(7):1739–1749. doi: 10.1021/jm0306430. [DOI] [PubMed] [Google Scholar]
- Grosdidier A. Zoete V. Michielin O. Fast docking using the CHARMM force field with EADock DSS. J. Comput. Chem. 2011;32(10):2149–2159. doi: 10.1002/jcc.21797. [DOI] [PubMed] [Google Scholar]
- Grosdidier A. Zoete V. Michielin O. SwissDock, a protein-small molecule docking web service based on EADock DSS. Nucleic Acids Res. 2011;39(Web Server issue):W270–W277. doi: 10.1093/nar/gkr366. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schneidman-Duhovny D. Inbar Y. Nussinov R. Wolfson H. J. PatchDock and SymmDock: servers for rigid and symmetric docking. Nucleic Acids Res. 2005;33(Web Server issue):W363–W367. doi: 10.1093/nar/gki481. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhang W. Bell E. W. Yin M. Zhang Y. EDock: blind protein-ligand docking by replica-exchange monte carlo simulation. J. Cheminf. 2020;12(1):37. doi: 10.1186/s13321-020-00440-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gupta A. Gandhimathi A. Sharma P. Jayaram B. ParDOCK: an all atom energy based Monte Carlo docking protocol for protein-ligand complexes. Protein Pept. Lett. 2007;14(7):632–646. doi: 10.2174/092986607781483831. [DOI] [PubMed] [Google Scholar]
- Forli S. Huey R. Pique M. E. Sanner M. F. Goodsell D. S. Olson A. J. Computational protein-ligand docking and virtual drug screening with the AutoDock suite. Nat. Protoc. 2016;11(5):905–919. doi: 10.1038/nprot.2016.051. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Trott O. Olson A. J. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 2010;31(2):455–461. doi: 10.1002/jcc.21334. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bernardi R. C. Melo M. C. R. Schulten K. Enhanced sampling techniques in molecular dynamics simulations of biological systems. Biochim. Biophys. Acta. 2015;1850(5):872–877. doi: 10.1016/j.bbagen.2014.10.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lopes P. E. Guvench O. MacKerell Jr. A. D. Current status of protein force fields for molecular dynamics simulations. Methods Mol. Biol. 2015;1215:47–71. doi: 10.1007/978-1-4939-1465-4_3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vanommeslaeghe K. Hatcher E. Acharya C. Kundu S. Zhong S. Shim J. Darian E. Guvench O. Lopes P. Vorobyov I. et al. CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields. J. Comput. Chem. 2010;31(4):671–690. doi: 10.1002/jcc.21367. [DOI] [PMC free article] [PubMed] [Google Scholar]
- van Aalten D. M. Bywater R. Findlay J. B. Hendlich M. Hooft R. W. Vriend G. PRODRG, a program for generating molecular topologies and unique molecular descriptors from coordinates of small molecules. J. Comput.-Aided Mol. Des. 1996;10(3):255–262. doi: 10.1007/BF00355047. [DOI] [PubMed] [Google Scholar]
- Jorgensen W. L. C. J. Madura J. D. Impey R. W. Klein M. L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 1983;2(79):926–935. doi: 10.1063/1.445869. [DOI] [Google Scholar]
- Braga C. Travis K. P. A configurational temperature Nose-Hoover thermostat. J. Chem. Phys. 2005;123(13):134101. doi: 10.1063/1.2013227. [DOI] [PubMed] [Google Scholar]
- Meng Y. Dashti D. S. Roitberg A. E. Computing Alchemical Free Energy Differences with Hamiltonian Replica Exchange Molecular Dynamics (H-REMD) Simulations. J. Chem. Theory Comput. 2011;7(9):2721–2727. doi: 10.1021/ct200153u. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Aldeghi M. Heifetz A. Bodkin M. J. Knapp S. Biggin P. C. Accurate calculation of the absolute free energy of binding for drug molecules. Chem. Sci. 2016;7(1):207–218. doi: 10.1039/C5SC02678D. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lemkul J. A. Bevan D. R. Assessing the stability of Alzheimer's amyloid protofibrils using molecular dynamics. J. Phys. Chem. B. 2010;114(4):1652–1660. doi: 10.1021/jp9110794. [DOI] [PubMed] [Google Scholar]
- Genheden S. Ryde U. The MM/PBSA and MM/GBSA methods to estimate ligand-binding affinities. Expert Opin. Drug Discovery. 2015;10(5):449–461. doi: 10.1517/17460441.2015.1032936. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hirano Y. Okimoto N. Fujita S. Taiji M. Molecular Dynamics Study of Conformational Changes of Tankyrase 2 Binding Subsites upon Ligand Binding. ACS Omega. 2021;6(27):17609–17620. doi: 10.1021/acsomega.1c02159. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fu Y. Zhao J. Chen Z. Insights into the Molecular Mechanisms of Protein-Ligand Interactions by Molecular Docking and Molecular Dynamics Simulation: A Case of Oligopeptide Binding Protein. Comput. Math. Methods Med. 2018;2018:3502514. doi: 10.1155/2018/3502514. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sakkiah S. Kusko R. Pan B. Guo W. Ge W. Tong W. Hong H. Structural Changes Due to Antagonist Binding in Ligand Binding Pocket of Androgen Receptor Elucidated Through Molecular Dynamics Simulations. Front. Pharmacol. 2018;9:492. doi: 10.3389/fphar.2018.00492. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li H. Cao Z. Hu G. Zhao L. Wang C. Wang J. Ligand-induced structural changes analysis of ribose-binding protein as studied by molecular dynamics simulations. Technol. Health Care. 2021;29(S1):103–114. doi: 10.3233/THC-218011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ferreira de Freitas R. Schapira M: A systematic analysis of atomic protein-ligand interactions in the PDB. MedChemComm. 2017;8(10):1970–1981. doi: 10.1039/C7MD00381A. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Smith A. J. Zhang X. Leach A. G. Houk K. N. Beyond picomolar affinities: quantitative aspects of noncovalent and covalent binding of drugs to proteins. J. Med. Chem. 2009;52(2):225–233. doi: 10.1021/jm800498e. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Morris K. F. Geoghegan R. M. Palmer E. E. George Jr. M. Fang Y. Molecular dynamics simulation study of AG10 and tafamidis binding to the Val122Ile transthyretin variant. Biochem. Biophys. Rep. 2020;21:100721. doi: 10.1016/j.bbrep.2019.100721. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cholko T. Chen W. Tang Z. Chang C. A. A molecular dynamics investigation of CDK8/CycC and ligand binding: conformational flexibility and implication in drug discovery. J. Comput.-Aided Mol. Des. 2018;32(6):671–685. doi: 10.1007/s10822-018-0120-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jumper J. Evans R. Pritzel A. Green T. Figurnov M. Ronneberger O. Tunyasuvunakool K. Bates R. Zidek A. Potapenko A. et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596(7873):583–589. doi: 10.1038/s41586-021-03819-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Roider H. G. Pavlova N. Kirov I. et al. Drug2Gene: an exhaustive resource to explore effectively the drug-target relation network. BMC Bioinf. 2014;15:68. doi: 10.1186/1471-2105-15-68. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yoo M. Shin J. Kim J. Ryall K. A. Lee K. Lee S. Jeon M. Kang J. Tan A. C. DSigDB: drug signatures database for gene set analysis. Bioinformatics. 2015;31(18):3069–3071. doi: 10.1093/bioinformatics/btv313. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Krishna R. Wang J. Ahern W. Sturmfels P. Venkatesh P. Kalvet I. Lee G. R. Morey-Burrows F. S. Anishchenko I. Humphreys I. R. et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science. 2024:2528. doi: 10.1126/science.adl2528. [DOI] [PubMed] [Google Scholar]
- Lin Z. Akin H. Rao R. Hie B. Zhu Z. Lu W. Smetanin N. Verkuil R. Kabeli O. Shmueli Y. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science. 2023;379:1123–1130. doi: 10.1126/science.ade2574. [DOI] [PubMed] [Google Scholar]
- Saur M. Hartshorn M. J. Dong J. Reeks J. Bunkoczi G. Jhoti H. et al. Fragment-based drug discovery using cryo-EM. Drug Discovery Today. 2020;25(3):485–490. doi: 10.1016/j.drudis.2019.12.006. [DOI] [PubMed] [Google Scholar]