Computational and Structural Biotechnology Journal. 2020 Feb 17;18:417–426. doi: 10.1016/j.csbj.2020.02.008

Exploring the computational methods for protein-ligand binding site prediction

Jingtian Zhao a, Yang Cao b, Le Zhang a
PMCID: PMC7049599  PMID: 32140203

Abstract

Proteins participate in various essential processes in vivo via interactions with other molecules. Identifying the residues participating in these interactions not only provides biological insights for protein function studies but also has great significance for drug discoveries. Therefore, predicting protein–ligand binding sites has long been under intense research in the fields of bioinformatics and computer aided drug discovery. In this review, we first introduce the research background of predicting protein–ligand binding sites and then classify the methods into four categories, namely, 3D structure-based, template similarity-based, traditional machine learning-based and deep learning-based methods. We describe representative algorithms in each category and elaborate on machine learning and deep learning-based prediction methods in more detail. Finally, we discuss the trends and challenges of the current research such as molecular dynamics simulation based cryptic binding sites prediction, and highlight prospective directions for the near future.

Keywords: Protein, Ligand binding site, Machine learning, Deep learning, Protein–ligand binding

1. Background and significance of protein ligand binding site research

Proteins are some of the most important elements for life. They are not only critical cellular components, but they also participate in various critical activities and processes in the life cycle of organisms, which can achieve or help achieve important biological functions. Proteins do not work independently in living organisms. They need to bind to other biomolecules or ions (such as metal ions, nucleic acids, inorganic or organic small molecules) to create specific interactions to achieve corresponding functions [1]. These molecules and ions are called ligands (Fig. 1). Particularly, intermolecular interactions between proteins and ligands, such as small compounds, occur via amino acid residues at specific positions in the protein, usually located in pocket-like regions. These specific key amino acid residues are called ligand binding sites (LBSs). LBSs have attracted much attention in the fields of molecular docking, drug-target interactions, compound design, ligand affinity prediction, and even molecular dynamics [2], [3], [4], [5], [6]. Thus, identification of LBSs not only helps to explore the mechanism of intermolecular interactions but also effectively explains the pathogenesis of diseases, which provides insights for drug discovery and design [7].

Fig. 1.


3D schematic of a protein structure and its binding ligands generated from the PDB website. The protein shown above is the crystal structure of human deoxyhaemoglobin at 1.74 Å resolution, published in the PDB (accession code: 4HHB). The enlarged ligand is [HEM (PROTOPORPHYRIN IX CONTAINING FE)] 142: C, shown with its bonds (hydrogen, halogen, etc.).

Compared with highly accurate but time-consuming biological experiments [8], computational methods have the advantage that LBS predictions can be made from sequence and structure information alone, without relying on functional annotation of protein binding residues [9]. In addition, combining multiple computational methods, or combining experimental methods with computational methods, can improve both the accuracy and the efficiency of LBS prediction and provide valuable assistance for drug design and drug discovery research [10], [11], [12], [13]. The emergence of the Critical Assessment of Protein Structure Prediction (CASP) [14], the Continuous Automated Model EvaluatiOn (CAMEO) project [15], the Critical Assessment of Function Annotation (CAFA) [16], the PDB database [17], [18], and the BioLip database [19] has promoted the development of this field and provided standard evaluation indicators and relatively unified concepts and definitions. According to the definition given in BioLip, an amino acid residue is regarded as a ligand binding residue if the distance between at least one of its atoms and any atom of the ligand molecule does not exceed the sum of the radii of these two atoms plus 0.5 Å. Since the prediction of ligand binding residues is, from an algorithmic point of view, a typical binary classification problem, the evaluation indices used in this field are essentially those used to evaluate binary classifiers. The common LBS prediction indicators are sensitivity (Sen), accuracy (Acc), specificity (Spe), precision (Pre), and the Matthews correlation coefficient (MCC) [20], which are defined as below:

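$$ \mathrm{Sen} = \frac{TP}{TP + FN} \tag{1} $$
$$ \mathrm{Acc} = \frac{TP + TN}{TP + FN + TN + FP} \tag{2} $$
$$ \mathrm{Spe} = \frac{TN}{TN + FP} \tag{3} $$
$$ \mathrm{Pre} = \frac{TP}{TP + FP} \tag{4} $$
$$ \mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{5} $$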

where TP (True Positive) is the number of binding residues correctly predicted as binding, TN (True Negative) is the number of non-binding residues correctly predicted as non-binding, FP (False Positive) is the number of non-binding residues incorrectly predicted as binding, and FN (False Negative) is the number of binding residues incorrectly predicted as non-binding [21], [22], [23], [24], [25].
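As a minimal sketch of how the BioLip distance criterion and the indicators in Eqs. (1)–(5) fit together, the Python snippet below labels a residue as binding and then scores a set of predictions. The function names, the atom tuple layout `(x, y, z, radius)` and the choice to return 0 when a denominator is zero are illustrative assumptions, not part of BioLip or of any published tool.

```python
import math

def is_binding_residue(residue_atoms, ligand_atoms, margin=0.5):
    """BioLip-style test: True if any residue atom lies within
    r_residue + r_ligand + margin (in Å) of any ligand atom.
    Atoms are (x, y, z, radius) tuples (assumed layout)."""
    for x1, y1, z1, r1 in residue_atoms:
        for x2, y2, z2, r2 in ligand_atoms:
            if math.dist((x1, y1, z1), (x2, y2, z2)) <= r1 + r2 + margin:
                return True
    return False

def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for 0/1 labels (1 = binding residue)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, tn, fp, fn

def lbs_metrics(tp, tn, fp, fn):
    """Sen, Acc, Spe, Pre and MCC as in Eqs. (1)-(5).
    A zero denominator yields 0.0 (an assumption of this sketch)."""
    def ratio(num, den):
        return num / den if den else 0.0
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Sen": ratio(tp, tp + fn),
        "Acc": ratio(tp + tn, tp + fn + tn + fp),
        "Spe": ratio(tn, tn + fp),
        "Pre": ratio(tp, tp + fp),
        "MCC": ratio(tp * tn - fp * fn, mcc_den),
    }
```

Because binding residues are usually a small minority, Acc and Spe can be high even for a poor predictor, which is one reason MCC is commonly reported alongside them.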

In the last twenty years, under the promotion of CASP and other research goals, researchers have made great progress in the field of LBS predictions. A series of different prediction methods based on sequence information, structural templates, and three-dimensional structures have been developed. These methods employ various computational methods, including geometry or energy feature searching, sequence or structure similarity comparison, as well as machine learning related algorithms [26], [27], [28], [29], [30], [31]. Recently, deep learning-based methods have stood out from machine learning methods and have drawn much attention in computational biology [32], [33], [34]. Some state-of-the-art LBS prediction methods that employ machine learning and deep learning algorithms show significant advances over traditional methods [35], [36]. In this paper, we systematically introduce the background, principles, algorithms and performance of popular LBS prediction methods by clustering prediction methods into four groups according to their working principles. Particularly, this paper highlights the most recent progress in deep learning-based methods.

2. 3D structure-based LBS prediction methods

Most small ligand binding occurs in hollows or cavities on protein surfaces because high affinity can only be gained by sufficiently large interfaces [37]. This feature has been observed in spatial structures from many detailed studies of protein–ligand complexes in PDB [38]. Therefore, attempting to locate LBSs by searching for special geometry or energy features in protein structures has long been one of the most popular methods in this area. This method generally has two different implementations. One is to perform spatial geometric measurements on the protein structure to find hollows or cavities on the surface of the protein. The second is to place some probes on the surface of the protein and then to find the cavities by estimating the energy potentials between the probe and the cavities. Table 1 lists some published 3D structure-based LBS prediction methods.

Table 1.

Published 3D structure-based LBS prediction methods.

Method | Type | Feature | Year
A computational procedure (with no specific name) [39] | Probe Energy-based | Contour surfaces at appropriate energy levels are calculated for each probe and displayed with the protein structure | 1985
POCKET [27] | Spatial Geometry Measurement | Place spheres between atoms; surfaces of pockets are modeled using the marching cubes algorithm | 1992
SURFNET [40] | Spatial Geometry Measurement | Place spheres at the gap between any two protein atoms | 1995
LIGSITE [26] | Spatial Geometry Measurement | Set up some regular 3D meshes to cover the target protein | 1997
CAST [41] | Spatial Geometry Measurement | Calculate by using alpha shape and discrete flow theory | 1998
CASTp [42], [43] | Spatial Geometry Measurement | Use alpha shape and the pocket algorithm [44] developed in computational geometry | 2003
QSiteFinder [45] | Probe Energy-based | Use the interaction energy between the protein and a simple van der Waals probe | 2005
LIGSITECSC [46] | Spatial Geometry Measurement | An extension and implementation of the LIGSITE algorithm by using the Connolly surface | 2006
VISCANA [47] | Probe Energy-based | A total energy of the molecule is evaluated by summation of fragment energies and interfragment interaction energies | 2006
Fpocket [48] | Spatial Geometry Measurement | Voronoi tessellation and alpha spheres are used to detect pockets | 2009
SITEHOUND [28], [49] | Probe Energy-based | The carbon probe and phosphate probe are used to detect the interaction force between the probe and the protein | 2009
MSPocket [50] | Spatial Geometry Measurement | Identify surface pocket regions according to the normal vector directions at the vertices on the surface | 2010
FTSite [51] | Probe Energy-based | Use 16 different probes on these grids to detect free energy | 2011
SiteComp [52] | Probe Energy-based | Discovery of subsites with different interaction properties and fast calculation of residue contributions to binding sites | 2012
LISE [53] | Spatial Geometry Measurement | Compute a score by counting geometric motifs extracted from substructures of interaction networks connecting protein and ligand atoms | 2013
Patch-Surfer2.0 [54] | Spatial Geometry Measurement | Represent and compare pockets at the level of small local surface patches that characterize physicochemical properties of the local regions | 2014
CurPocket [55] | Spatial Geometry Measurement | Compute the curvature distribution of protein surface and identify the clusters of concave regions | 2019

The basic idea of LBS prediction methods based on spatial geometry measurements is to locate large hollows or cavities, or the largest one, on the protein structure by computing geometric measures from the protein structure information. Researchers have come up with many different and creative ways to accomplish this over the past few decades.

The pioneering works on spatial geometry measurement were published in the 1990s [27], [40]. The idea of these methods is to place a sphere in the gap between any two protein atoms, according to the three-dimensional coordinate information in the protein database, to detect ligand binding residues. In SURFNET [40], for example, the size of the sphere is adjusted so that it is tangent to the surfaces of the two atoms. If the sphere collides with other neighboring atoms, its volume is reduced until no clashes remain. The above process is repeated until all pairs of protein atoms have been considered. Finally, a set of spheres filling the gaps between protein atoms is obtained, thereby allowing the localization of large hollows or cavities in the protein molecule; the residues lining these cavities are regarded as possible ligand binding residues.
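The gap-sphere procedure described above can be sketched in a few lines of Python. This is a simplified illustration rather than the SURFNET implementation: the minimum and maximum sphere radii (`r_min`, `r_max`), the atom tuple layout and the function name are assumptions made for the example.

```python
import math
from itertools import combinations

def gap_spheres(atoms, r_min=1.0, r_max=4.0):
    """Place one sphere in the gap between every pair of atoms.
    atoms: list of (x, y, z, radius) tuples.
    Returns a list of (centre, radius) for spheres that survive."""
    spheres = []
    for i, j in combinations(range(len(atoms)), 2):
        a, b = atoms[i], atoms[j]
        d = math.dist(a[:3], b[:3])
        gap = d - a[3] - b[3]          # free space between the two surfaces
        if gap <= 0:
            continue                   # surfaces touch or overlap: no gap
        # Centre the sphere midway between the two atomic surfaces.
        t = (a[3] + gap / 2) / d
        centre = tuple(a[k] + t * (b[k] - a[k]) for k in range(3))
        radius = min(gap / 2, r_max)
        # Shrink the sphere until it no longer clashes with other atoms.
        for k, c in enumerate(atoms):
            if k in (i, j):
                continue
            radius = min(radius, math.dist(centre, c[:3]) - c[3])
            if radius < r_min:
                break
        if radius >= r_min:
            spheres.append((centre, radius))
    return spheres
```

Clusters of surviving spheres mark candidate cavities; the pairwise loop is quadratic in the number of atoms, so a practical implementation would restrict it to nearby pairs.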

Later, in 1997, Manfred Hendlich et al. published LIGSITE [26], which covers the target protein with a regular 3D grid. Starting from each grid point, it scans a total of 7 directions, namely the x, y, and z axes and the 4 cubic diagonals, and then scores the grid points. If the scan line in a given direction runs into the protein on both sides of the grid point, the point may lie in a pocket, and its score is increased by one. After all grid points have been scanned in all directions, the candidate ligand binding residues are determined based on the final score of each grid point. The main advantage of the LIGSITE method is its running speed: its typical search time is between 5 and 20 s for medium-sized proteins, so it is suitable for detecting LBSs in a large number of proteins.
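The scanning step can be illustrated with a small Python sketch. It assumes the protein has already been mapped onto a cubic grid and is given as a set of occupied integer cells; the function names and this representation are illustrative choices, not taken from the LIGSITE code.

```python
# Seven scan directions: the three axes and the four cubic diagonals.
DIRECTIONS = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
              (1, 1, 1), (1, 1, -1), (1, -1, 1), (1, -1, -1)]

def hits_protein(protein, start, step, size):
    """Walk from `start` along `step`; True if a protein cell is reached
    before leaving the size x size x size grid."""
    x, y, z = start
    while True:
        x, y, z = x + step[0], y + step[1], z + step[2]
        if not (0 <= x < size and 0 <= y < size and 0 <= z < size):
            return False
        if (x, y, z) in protein:
            return True

def ligsite_scores(protein, size):
    """Score every solvent cell by the number of directions (0 to 7)
    in which it is enclosed by protein on both sides."""
    scores = {}
    for x in range(size):
        for y in range(size):
            for z in range(size):
                cell = (x, y, z)
                if cell in protein:
                    continue
                score = 0
                for d in DIRECTIONS:
                    back = (-d[0], -d[1], -d[2])
                    if (hits_protein(protein, cell, d, size)
                            and hits_protein(protein, cell, back, size)):
                        score += 1
                scores[cell] = score
    return scores
```

Cells with high scores are deeply buried; grouping them and collecting the nearby residues gives the candidate binding site.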

The principle of the probe energy-based LBS prediction method is to first place a specific probe molecule on the protein to be tested and to measure the interaction energy signals between the probe molecule and the surrounding residues, and then to find pockets in the protein structure from the distribution of energy signal intensities. The probe energy-based prediction method usually employs different probe parameters or multiple probes at the same time to achieve better performance.

SITEHOUND is a classical probe energy-based LBS prediction method [28], [49]. The method uses a box with a grid that covers the entire target protein. A carbon probe and a phosphate probe are placed at the grid points, and the interaction energy between the probe at each grid point and the protein is calculated. The grid points with higher interaction energies are extracted and further clustered. After mapping the grid points onto the residues, the potential LBSs are determined according to the clustered residues. A dataset that contains 77 experimentally determined protein structures with known protein–ligand complexes was used to test SITEHOUND, and the results showed that in 95% of the cases, the correct binding site was located in the top three clusters.
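A rough sketch of the grid-and-probe idea is given below. It is not the SITEHOUND force field: a single Lennard-Jones 12-6 term with made-up `epsilon` and `sigma` values stands in for the probe potential, and "strong interaction" is interpreted as "energy at or below a negative cutoff", which is an assumption of this example.

```python
import math

def lj_energy(point, atoms, epsilon=0.2, sigma=3.5):
    """Lennard-Jones 12-6 energy of a probe at `point` summed over all
    protein atoms (atoms are (x, y, z) tuples; parameters are illustrative)."""
    energy = 0.0
    for atom in atoms:
        r = math.dist(point, atom)
        if r < 1e-6:
            return math.inf            # probe sits on top of an atom
        s6 = (sigma / r) ** 6
        energy += 4 * epsilon * (s6 * s6 - s6)
    return energy

def favourable_points(atoms, lo, hi, spacing=1.0, cutoff=-0.3):
    """Scan a cubic grid from `lo` to `hi` on each axis and keep the
    grid points whose probe energy is at or below `cutoff`."""
    n = int(round((hi - lo) / spacing)) + 1
    axis = [lo + i * spacing for i in range(n)]
    kept = []
    for x in axis:
        for y in axis:
            for z in axis:
                e = lj_energy((x, y, z), atoms)
                if e <= cutoff:
                    kept.append(((x, y, z), e))
    return kept
```

The retained points would then be clustered and mapped to nearby residues, as described in the text.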

In 2011, Chi-Ho Ngan et al. released another probe energy-based LBS prediction method, FTSite [51]. The basic idea of this method is to place a dense grid around the protein, spread 16 different small molecule probes on this grid, and use free energy functions to determine the appropriate positions. The probes are clustered and ranked according to the average free energy value. The overlapping sites clustered by different probes are ranked by the interactions between the probe and the protein. Amino acid residues that interact with the top cluster are regarded as possible ligand binding residues. FTSite was benchmarked on the LIGSITECSC set [46] and the QSiteFinder set [45], on which it achieved accuracy rates of 94% and 97%, respectively.

3D structure-based LBS prediction methods have been widely used for years. However, these methods strongly depend on the state of the given protein 3D structure, which means that LBSs may not be discovered if the binding pocket does not exist in the apo state but is induced by protein–ligand interaction in the holo state. In many scenarios which lack the protein structures in holo states, those methods may not be valid.

3. Template similarity-based LBS prediction methods

Protein 3D structures provide geometry and energy clues for LBSs that allow us to make predictions using a single structure of a protein. Because proteins are not independent molecules but have evolved from other proteins, structural or functional information can be transferred between homologous or structurally similar proteins. Hence, an LBS can be predicted using known proteins as templates to identify similar characteristics in the query protein. Template similarity-based LBS prediction methods mainly include two types: structure template-based methods and sequence template-based methods. Table 2 lists some template similarity-based LBS prediction methods that have been published in the last twenty years.

Table 2.

Published template similarity-based LBS prediction methods.

Method | Type | Feature | Year
ConSurf [56] | Sequence Template-based | Phylogenetic relationships among the sequences and the similarity between the amino acids are taken into account | 2003
A sequence template-based approach with no specific name [57] | Sequence Template-based | An information-theoretic approach for estimating sequence conservation based on Jensen–Shannon divergence | 2007
FINDSITE [58] | Structure Template-based | PROSPECTOR 3 threading algorithm and TMalign tool are used | 2008
A two-stage template-based LBS prediction method [59] | Structure Template-based | Construct protein's 3D model and use structural clustering of ligand-containing templates on the predicted 3D model | 2009
3DLigandSite [29] | Structure Template-based | MAMMOTH is used | 2010
FunFOLD [60] | Structure Template-based | Use an automatic approach for cluster identification and residue selection | 2011
COFACTOR [61] | Structure and Sequence Template-based | Use global-to-local sequence and structural comparison algorithm | 2012
webPDBinder [62] | Structure Template-based | Search a protein structure against a library of known binding sites and a collection of control nonbinding pockets | 2013
S-SITE [31] | Sequence Template-based | Needleman–Wunsch algorithms are used | 2013
TM-SITE [31] | Structure and Sequence Template-based | Mix Structure Template-based and Sequence Template-based method | 2013

The basic idea of the structure template-based LBS prediction method is to search for the most similar proteins in databases that have been labeled with LBSs using a structure alignment algorithm and then to transfer the known LBS from the most similar proteins onto the query protein. This method takes advantage of the increasingly accumulated protein structure databases. It could be highly reliable if proteins are of significant structural similarity.

In 2008, a popular template-based ligand binding site prediction method, FINDSITE, was published [58]. For a given target protein sequence, FINDSITE uses the PROSPECTOR 3 threading algorithm [63], [64] to identify ligand-bound structural templates from the PDB database and superimposes the templates onto the target protein using TMalign [65]. Then, the LBSs of the structural templates are clustered and ranked as predictions. FINDSITE achieved a 67.3% success rate with 75.5% ranking accuracy on protein models that have less than 35% sequence identity to the closest template structure. Although its prediction accuracy is only comparable to that of some 3D structure-based LBS prediction methods, it can discover LBSs that those methods do not find.

Later, in 2010, Mark N. Wass et al. developed the 3DLigandSite prediction method [29]. 3DLigandSite first used MAMMOTH [66] to score the similarity between a target protein and structural templates, and the 25 template proteins with the highest similarity to the target protein structure and their corresponding ligand information were selected as templates. Similar to FINDSITE, these templates are overlaid with the target protein, and these overlaid ligands are clustered using the Single linkage clustering algorithm. The cluster with the most template ligands was chosen as the basis for the prediction of the LBS. The performance of 3DLigandSite has been tested on CASP8 [67] targets with a set of 617 proteins from the FINDSITE test set and achieved an MCC of 0.64, a coverage of 71%, and an accuracy of 60%.
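The clustering step can be sketched as follows. Cutting a single-linkage hierarchy at a distance threshold is equivalent to finding connected components of the graph that links every pair of points closer than the threshold, which is what the union-find code below does. Representing each superimposed ligand by one point (for example its centroid) and the threshold value are assumptions of this sketch, not details taken from 3DLigandSite.

```python
import math

def single_linkage_clusters(points, threshold):
    """Group points so that any two points closer than `threshold`
    end up in the same cluster (single linkage cut at `threshold`).
    Returns clusters as lists of point indices, largest first."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=len, reverse=True)
```

The first cluster returned is the one with the most template ligands; query residues close to its members would then be reported as the predicted site.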

As of December 21, 2019, 158,787 protein structures had been published in the PDB [38]. However, for a large number of proteins, it is still impossible to detect their LBSs using the above methods. Meanwhile, with the continuous development of sequencing technology, a huge number of protein sequences are published every year. Therefore, sequence template-based LBS prediction methods have received extensive attention. The basic idea of sequence template-based LBS prediction methods is similar to that of the structure template-based methods: an alignment tool is used to align the sequence of the query protein with the sequences of known proteins, and templates are then selected according to their similarity. Finally, the ligand-binding residues of the query protein are inferred from the known ligand-binding residues in the aligned regions.

In 2013, Yang Zhang's team published a ligand binding site prediction method called S-SITE [31], which employs the Needleman–Wunsch algorithm [68] to align the query protein to each of the proteins in the BioLip [19] database and selects templates with similar sequences according to the alignment results. The residues of the query protein that are aligned with template residues annotated as binding residues are recorded. Consensus voting is used to score the alignment results of the templates. Residues that receive more than 25% of the votes are considered binding residues. S-SITE achieved both an MCC and a Pre of 0.45 on the test datasets.
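To make the align-then-vote idea concrete, the sketch below implements a plain Needleman–Wunsch global alignment and a consensus vote with the 25% threshold mentioned above. The identity-based scoring (`match`, `mismatch`, `gap`) and the function names are simplifying assumptions; the published method works with richer scoring and template selection than shown here.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of sequences a and b.
    Returns (score, pairs) where pairs lists aligned index pairs (i, j)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return score[n][m], pairs[::-1]

def vote_binding_residues(query, templates, min_fraction=0.25):
    """templates: list of (sequence, set_of_binding_indices).
    A query residue is predicted as binding if it is aligned to an
    annotated binding residue in more than min_fraction of templates."""
    votes = [0] * len(query)
    for seq, binding in templates:
        _, pairs = needleman_wunsch(query, seq)
        for qi, ti in pairs:
            if ti in binding:
                votes[qi] += 1
    return [i for i, v in enumerate(votes) if v > min_fraction * len(templates)]
```

With four templates, for instance, a residue needs at least two votes to pass the 25% threshold, since a single vote is exactly 25% and not more.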

Hybrid methods have been proposed to further improve LBS predictions. A representative algorithm, TM-SITE [31], mixes the structure template-based and the sequence information-based prediction methods. The TMalign algorithm is first used to align the protein to be tested with the known template proteins. The evolutionary information of the sequence and the spatial distance information of the structure are combined to form a comprehensive scoring function to score the similarity of each template protein, and the qualified template proteins are screened from the BioLip database according to the scoring results. Finally, the ligand-binding residues of the protein being tested are predicted based on these eligible templates. TM-SITE achieved an MCC of 0.51 and Pre of 0.59 on the test datasets.

4. Traditional machine learning-based LBS prediction methods

The continuous development of computer technology has promoted the application of artificial intelligence-related theories and algorithms to other fields. In the study of protein LBS prediction, 3D structure-based and template similarity-based prediction methods have shown complementary advantages. How to integrate this information and further improve the prediction accuracy is one of the pressing questions in this area. Many researchers have used machine learning algorithms not only for LBS prediction but also for binding affinity research, which has led to significant breakthroughs. Table 3 lists some traditional machine learning-based LBS prediction methods and a few related binding affinity research methods published in recent years. However, to keep the topic focused, we only detail a few representative LBS prediction methods from the table. Binding affinity related methods are elaborated on in the discussion.

Table 3.

Traditional machine learning-based LBS prediction and binding affinity research methods.

Method | Machine Learning Algorithm | Year
Knowledge-based QSAR approach [69] | Kernel-Partial Least Squares (K-PLS) [70] | 2004
Multi-RELIEF [71] | RELIEF algorithm [72] | 2007
SFCscore [73] | Multiple linear regression, partial least squares analysis | 2008
ATPint [74] | Support Vector Machine | 2009
ConCavity [75] | K-Means algorithm | 2009
MetaPocket [76] | Hierarchical clustering algorithm [77] | 2009
RF-Score [4] | The Random Forest algorithm | 2010
MetaDBSite [78] | Support Vector Machine | 2011
NsitePred [79] | Support Vector Machine | 2011
NNSCORE [80], [81] | Artificial Neural Network (shallow neural network [82]) | 2011
L1pred [30] | L1-Logreg Regression classifier | 2012
TargetS [83] | Support Vector Machine | 2013
eFindSite [84] | Support Vector Machine | 2013
VitaPred [85] | Support Vector Machine | 2013
COACH [31] | Support Vector Machine | 2013
LigandRFs [86] | The Random Forest algorithm | 2014
OSML [87] | Support Vector Machine | 2015
LigandDSES [88] | The Random Forest algorithm | 2015
PRANK [89] | The Random Forest algorithm | 2015
A method for protein-ligand binding affinity prediction [90] | Gradient Boosting Regressor [91] | 2018
SAnDReS [92] | Regression Analysis | 2016
P2Rank [93] | The Random Forest algorithm | 2018
COACH-D [94] | Support Vector Machine | 2018
Taba [95] | Regression Analysis | 2019

As mentioned earlier, predicting protein ligand binding sites is, from a mathematical point of view, a typical binary classification problem, and the samples are imbalanced. Among the many classic machine learning algorithms that can perform binary classification, the naive Bayesian algorithm needs to calculate the prior probability and is poorly suited to correlated data. Although logistic regression is simple to implement, its accuracy is poor because it tends to under-fit. Besides, although the KNN algorithm is fast and has low training costs, its classification performance is poor on imbalanced samples. Therefore, the support vector machine (SVM) stands out from many traditional machine learning algorithms by virtue of its high classification accuracy, strong generalization ability, and excellent classification ability for high-dimensional, small-sample data. It has become the most popular machine learning method in the field of LBS prediction. As demonstrated in Fig. 2, SVM is a supervised learning algorithm that classifies data by finding a hyperplane that separates the data in space into two classes. In the past few years, many SVM-based prediction methods have been published. Three representative methods are introduced below.

Fig. 2.


A simple schematic of SVM. A hyperplane divides the points into two categories.
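The separating-hyperplane idea can be illustrated with a tiny linear SVM trained by sub-gradient descent on the regularized hinge loss. This is a teaching sketch in plain Python with arbitrarily chosen hyperparameters (`lam`, `lr`, `epochs`); the published LBS predictors rely on established SVM libraries and kernels, which are not reproduced here.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit w, b so that sign(w.x + b) separates labels y in {-1, +1}.
    Minimises hinge loss plus an L2 penalty with per-sample updates."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Sample is inside the margin: push the hyperplane away.
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Only the regulariser acts: shrink the weights slightly.
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def svm_predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

In an LBS setting each row of `X` would be the feature vector of one residue and `y` would mark it as binding (+1) or non-binding (-1); class weighting is often added to cope with the imbalance noted above.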

In 2011, Jingna Si et al. developed the MetaDBSite server [78], relying on sequence information to predict protein-DNA binding residues. MetaDBSite uses SVM to integrate the results of the six predictive tools: DISIS [96], DNABindR [97], BindN [98], BindN-rf [99], DP-Bind [100] and DBS-PRED [101]. The final output is superior to any single prediction method. The prediction results returned by DISIS, DNABindR, BindN, and BindN-rf are the main input parameters of SVM, while DP-Bind and DBS-PRED provide smaller score effects as auxiliary parameters. MetaDBSite achieved ACC, Spe, Sen of 0.77 and MCC of 0.32 on a test set, which is better than any of the single methods it combined.

In 2011, Ke Chen et al. published the NsitePred algorithm [79], which predicts the binding residues for the five most common nucleotides in the PDB database. The main steps of the NsitePred algorithm are to first derive the secondary structure, relative solvent accessibility, dihedral angles, PSSM profile and other information from a given query protein sequence, and then to use a sliding window to process this information into a feature vector describing each residue. These feature vectors are used as inputs to the SVM to obtain a classification model. The model is used to predict the binding residues of the protein, and the SVM-based prediction results are combined with the BLAST [102] results as the final output. In the benchmarks, NsitePred showed better performance than ATPint [74] and GTPbinder [103].
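The sliding-window step can be sketched as below: the per-residue features of a residue and its sequence neighbours are concatenated into one vector, with zero padding at the chain ends. The window half-width and the zero-padding choice are assumptions for illustration; they are not taken from the NsitePred paper.

```python
def window_features(per_residue, half_width=2):
    """per_residue: list of equal-length feature lists, one per residue.
    Returns one concatenated vector per residue covering positions
    i - half_width ... i + half_width (zero-padded at the termini)."""
    n_feat = len(per_residue[0])
    pad = [0.0] * n_feat
    vectors = []
    for i in range(len(per_residue)):
        row = []
        for k in range(i - half_width, i + half_width + 1):
            inside = 0 <= k < len(per_residue)
            row.extend(per_residue[k] if inside else pad)
        vectors.append(row)
    return vectors
```

Each returned vector has `(2 * half_width + 1) * n_feat` entries and can be passed directly to a classifier such as an SVM.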

In 2013, Yang Zhang's team published the SVM-based prediction method COACH [31]. It takes the prediction results of the sequence template-based method S-SITE and the structure template-based method TM-SITE, together with those of three other methods, COFACTOR [104], FINDSITE [58], and ConCavity [75], as feature vectors to train an SVM classification model, and finally uses this model to output the prediction results. The benchmark results show that COACH outperforms other classical prediction algorithms (MCC=0.54 and Pre=0.59), making it the most popular protein LBS prediction method over the past few years.

5. Deep learning-based LBS prediction methods

In 2006, deep learning led the third wave of artificial intelligence [105], which far surpassed traditional machine learning in text classification, speech recognition, semantic modeling, image recognition, image segmentation and computer vision [106], [107], [108], [109]. In some areas, it has even surpassed the human brain [110] and has become the most popular research branch in the field of machine learning. Therefore, an increasing number of researchers have seen the possibility of using deep learning techniques to solve complex problems in the fields of bioinformatics and medical research, such as small-compound-drug discovery, activity prediction, chemical structure design, bioimaging, and medical imaging-based diagnosis [35], [90], [111], [112], [113], [114].

Deep learning is a machine learning technique that loosely imitates the learning mechanism of the human brain by building artificial neural networks and uses them to interpret data. Deep learning is mainly implemented in three ways: convolutional neural networks (CNNs), deep belief networks (DBNs) and autoencoder (self-encoding) neural networks. Among them, the CNN is the most popular approach in fields other than computer science since it is relatively simple to use and generalize. A CNN is a kind of feedforward neural network. Like traditional artificial neural networks (ANNs) [115], a CNN is composed of multiple neurons, each of which performs part of the calculation based on part of the input and gives part of the output, as below:

$$ f\left(\sum_{i} w_i x_i + b\right) \tag{6} $$

where x is the input, w is a set of weights, and b is the bias. f is the activation function, which allows the neural network to approximate nonlinear functions so that the network can be used as a nonlinear model. As described in Fig. 3, CNNs are mainly composed of three kinds of layers: the convolutional layer, the pooling layer and the fully connected layer. The convolutional layer is used to extract different local features of the input; it consists of several convolutional units, and the parameters of each convolutional unit are optimized by backpropagation [116]. The pooling layer divides the high dimensional local features obtained by the convolutional layers into several regions and takes the maximum or average value of each region so that new low dimensional features are generated. Finally, the fully connected layer combines all the local features into global features and calculates the score for each final class.
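A forward pass through the three layer types can be written out in plain Python to show how Eq. (6) is reused at every stage. The one-dimensional input, the ReLU activation and all weights below are arbitrary choices for illustration; training by backpropagation is not shown.

```python
def relu(v):
    """A common activation function f: negative values are set to zero."""
    return max(0.0, v)

def neuron(x, w, b, f=relu):
    """Eq. (6): weighted sum of the inputs plus bias, passed through f."""
    return f(sum(wi * xi for wi, xi in zip(w, x)) + b)

def conv1d(signal, kernel, bias=0.0):
    """Convolutional layer: the same neuron slides over local windows."""
    k = len(kernel)
    return [neuron(signal[i:i + k], kernel, bias)
            for i in range(len(signal) - k + 1)]

def max_pool(values, size=2):
    """Pooling layer: keep the maximum of each non-overlapping region."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

def dense(features, weights, biases):
    """Fully connected layer: one linear score per output class."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, biases)]
```

For example, `dense(max_pool(conv1d(signal, kernel)), weights, biases)` chains the three layers in the order convolution, pooling, fully connected, yielding one score per class.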

Fig. 3.


A simple model of a convolutional neural network. Hidden layers are used to generate the classification result (multiple convolutional layers and pooling layers can be set in a CNN).

A DBN is a highly scalable deep neural network that consists of multiple layers of Restricted Boltzmann Machines (RBMs) [117], each of which learns a probability distribution over its inputs. The DBN training process can be divided into two main steps. First, unsupervised training is performed for each RBM layer independently. Then, a supervised classifier is placed after the last RBM layer to receive the output features of the RBMs and generate classification results. The structure of DBNs is shown in Fig. 4.

Fig. 4.


A simple demonstration of a deep belief network. DBNs are constructed by combining multiple RBMs. Training of DBNs is performed layer by layer. The hidden layer is first inferred from the data vector, and this hidden layer is used as the input data vector of the next layer.

In the past two years, some protein LBS prediction methods using deep learning techniques have been reported. Developing new deep learning-based prediction methods has become a hotspot in LBS prediction. Table 4 lists some deep learning-based LBS prediction methods and related studies. Some representative LBS prediction methods and closely related methods are introduced below.

Table 4.

Deep learning-based LBS prediction and binding affinity research methods.

Method | Main Goal | Network Type | Year
A deep learning framework for modeling structural features of RNA-binding protein targets [118] | Binding preferences modeling of RNA-binding proteins | DBN | 2015
DeepBind [119] | Sequence specificities prediction of DNA- and RNA-binding proteins | CNN | 2015
DeepDTA [3] | Drug-target interaction identification | CNN | 2018
KDEEP [120] | Protein-ligand binding affinity prediction | CNN | 2018
DEEPSite [36] | LBS Prediction | CNN | 2017
DeepCSeqSite [121] | LBS Prediction | CNN | 2019
DeepConv-DTI [122] | Drug-target interaction identification | CNN | 2019
DeepDrug3D [35] | Binding pockets characterization and classification | CNN | 2019
Onionnet [123] | Protein-ligand binding affinity prediction | CNN | 2019

In 2017, J Jiménez et al. developed the DEEPSite algorithm [36] for predicting protein ligand binding sites. The basic idea of the algorithm is to treat the protein structure as a three-dimensional image and discretize it into a grid of voxels of a fixed size. A series of atomic attributes, such as hydrophobicity and hydrogen bond acceptors or donors, are used as features, and the occupancy of each attribute in each voxel is calculated. Finally, subgrids of a fixed size are sampled, and the features of each subgrid are used as inputs to the convolutional neural network, which outputs the probability that the subgrid is a binding site. DEEPSite was compared with Fpocket and ConCavity on the same test dataset, and the results indicated that DEEPSite outperforms the other methods.
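The voxelization step can be sketched as follows. Each atom contributes to the channel of its attribute, and a voxel stores the largest contribution it receives. The smooth occupancy function `1 - exp(-(r_vdw / r)^12)`, the channel names, the atom tuple layout and the grid parameters are all assumptions of this example; the text above does not specify them, and the sketch should not be read as the DEEPSite implementation.

```python
import math

def occupancy(r, r_vdw):
    """Smooth occupancy of a voxel centre at distance r from an atom:
    close to 1 inside the van der Waals radius, decaying to 0 outside."""
    if r < 1e-6:
        return 1.0
    return 1.0 - math.exp(-((r_vdw / r) ** 12))

def voxelize(atoms, channels, origin, n, spacing=1.0):
    """atoms: list of (x, y, z, r_vdw, channel_name) tuples.
    Returns {channel: n x n x n nested list of occupancies}."""
    grid = {c: [[[0.0] * n for _ in range(n)] for _ in range(n)]
            for c in channels}
    for x, y, z, r_vdw, channel in atoms:
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    centre = (origin[0] + (i + 0.5) * spacing,
                              origin[1] + (j + 0.5) * spacing,
                              origin[2] + (k + 0.5) * spacing)
                    occ = occupancy(math.dist(centre, (x, y, z)), r_vdw)
                    # Keep the strongest contribution per voxel and channel.
                    if occ > grid[channel][i][j][k]:
                        grid[channel][i][j][k] = occ
    return grid
```

The resulting multi-channel grid plays the role of the "three-dimensional image"; fixed-size subgrids cut from it would be the inputs to the 3D convolutional network.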

In 2019, Yifeng Cui et al. developed the DeepCSeqSite algorithm [121], which uses seven types of features (position-specific score matrix, relative solvent accessibility, secondary structure, dihedral angle, conservation scores, residue type and position embeddings) to construct the feature space. Each residue in the amino acid sequence is embedded in this feature space so that the amino acid sequence is converted to a feature map, which is then used as the input to the convolutional neural network. The output of the network is the predicted set of protein ligand binding residues. DeepCSeqSite predicts the binding sites of protein ligands directly, without using any template or three-dimensional structure. Its performance on test datasets is significantly better than that of COACH, the most accurate SVM-based prediction method mentioned above.
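A minimal sketch of the sequence-to-feature-map idea follows. It assumes only two of the feature types (a one-hot residue type and a normalized position), because the others would have to come from external tools. The kernel is random rather than trained, and the function names are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_features(sequence):
    """Build a feature map with one row per residue: one-hot type plus normalized position."""
    rows = []
    for pos, aa in enumerate(sequence):
        one_hot = [1.0 if aa == a else 0.0 for a in AMINO_ACIDS]
        rows.append(one_hot + [pos / max(len(sequence) - 1, 1)])
    return rows


def conv1d_same(feature_map, kernel, bias=0.0):
    """1D convolution along the sequence; zero padding keeps one output per residue.

    kernel[t][f] is the weight for feature f at window offset t.
    """
    width, half = len(kernel), len(kernel) // 2
    n_feat = len(feature_map[0])
    out = []
    for center in range(len(feature_map)):
        total = bias
        for t in range(width):
            idx = center + t - half
            if 0 <= idx < len(feature_map):
                total += sum(kernel[t][f] * feature_map[idx][f] for f in range(n_feat))
        out.append(total)
    return out


def predict_binding_probabilities(sequence, kernel):
    """Map each residue to a score in (0, 1) through a single convolution and a sigmoid."""
    scores = conv1d_same(residue_features(sequence), kernel)
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]


rng = random.Random(0)
# Untrained, random kernel of width 5 over 21 features (20 residue types + position).
kernel = [[rng.uniform(-0.5, 0.5) for _ in range(21)] for _ in range(5)]
probs = predict_binding_probabilities("MKTAYIAKQR", kernel)
```

The point of the sketch is the shape of the computation: the output has exactly one value per residue, so binding residues can be read off by thresholding, with no template or 3D structure involved.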

Recently, Ingoo Lee et al. reported the DeepConv-DTI prediction model [122] to identify interactions between drugs and targets. The model takes the entire protein sequence as input to a convolutional neural network and convolves over the various amino acid subsequences of the protein to capture local residue patterns that participate in the DTI. The resulting protein features are then combined with the drug features, and the likelihood of a DTI is predicted through higher fully connected layers of the network. Finally, the model is further optimized to achieve better predictive performance. According to the authors, the local features detected by DeepConv-DTI perform better than other protein descriptors, such as CTD and SW scores.
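The flow from subsequence convolution to a joint drug-protein prediction can be sketched as below. All specifics are assumptions made for illustration: the filter widths (3, 5, 7), the random untrained weights, the 8-bit drug fingerprint and the single dense layer are hypothetical, and the names `protein_features` and `predict_dti` are not from the original model.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def protein_features(sequence, filters):
    """Slide each filter over all subsequences, apply ReLU, keep the global maximum.

    kernel[t][a] is the weight for amino acid a at window offset t, which is
    equivalent to convolving over a one-hot encoded sequence.
    """
    features = []
    for kernel in filters:
        width = len(kernel)
        best = 0.0  # ReLU floor: negative responses are clipped to zero
        for start in range(len(sequence) - width + 1):
            score = sum(kernel[t][AA_INDEX[sequence[start + t]]] for t in range(width))
            best = max(best, score)
        features.append(best)
    return features


def predict_dti(sequence, drug_fingerprint, filters, dense_weights, dense_bias=0.0):
    """Concatenate protein and drug features and score them with one dense sigmoid unit."""
    joint = protein_features(sequence, filters) + list(drug_fingerprint)
    logit = dense_bias + sum(w * x for w, x in zip(dense_weights, joint))
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(1)
# Three untrained filters of (assumed) widths 3, 5 and 7.
filters = [[[rng.uniform(-1, 1) for _ in AMINO_ACIDS] for _ in range(width)]
           for width in (3, 5, 7)]
drug_fingerprint = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical binary drug descriptor
dense_weights = [rng.uniform(-1, 1) for _ in range(len(filters) + len(drug_fingerprint))]
score = predict_dti("MKTAYIAKQRQISFVKSHFSRQ", drug_fingerprint, filters, dense_weights)
```

Global max pooling is what lets proteins of different lengths produce a fixed-size feature vector, so the same dense layer can be applied to any sequence.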

In 2019, Limeng Pu et al. presented DeepDrug3D [35], a new deep learning-based binding pockets characterization and classification algorithm, which can classify nucleotide- and heme-binding sites by learning the patterns of specific molecular interactions between ligands and their protein targets. First, the ligand–protein complexes are converted into 3D pocket grids, and the physicochemical properties of binding pockets are considered and characterized. These 3D pocket grids are then voxelized into a 3D image with 14 channels. These voxels are used as inputs for a designed convolutional neural network to get the classification result. DeepDrug3D was tested on the PDB dataset of nucleotide- and heme-binding sites and achieved an accuracy of 95%, which is much better than volume- and shape-based approaches.

6. Discussion

From the long history of LBS prediction methods, we have seen that the research focus of LBS predictions has shifted from analyzing simple 3D structure features and sequence/structure similarities to the integration of multiple features. Machine learning algorithms [21], [22], [24], [124], [125], [126], [127], [128], [129], [130] have played a critical role in this process. Particularly, the application of deep learning algorithms has begun to show great value in LBS predictions. Furthermore, information about binding affinity and crystal structures can be used as inputs to machine learning or deep learning algorithms to help complete the LBS prediction, which makes LBS predictions more closely integrated with areas such as affinity prediction and molecular docking [23], [131].

As more high-quality machine learning and deep learning-based LBS prediction methods are published, other biological studies that use these methods, such as protein structure and function prediction, protein–protein interaction site prediction, and drug design, have also made new breakthroughs [132], [133], [134], [135], [136], [137]. For instance, in 2015, COACH was used in drug design studies targeting MARK4 regulatory enzymes related to cancer, type 2 diabetes and many other diseases [138]. In 2019, DeepDTA was used in protein kinase research to help develop a predictive model that can estimate kinase-ligand pKi values [139].

New solutions often bring new challenges of their own. Although deep learning-based LBS prediction methods have been applied over the past 2 years, this type of solution still has some problems and deficiencies. A key problem is that deep learning algorithms often require extremely high training costs (expensive computing resources, huge training sets, etc.) compared with traditional machine learning algorithms [140], [141].

Studies have also been inconclusive about whether deep learning approaches are superior to traditional machine learning algorithms in all cases. In fact, traditional machine learning algorithms and even some 3D structure-based binding affinity prediction methods are constantly being optimized. For instance, some methods that predict binding affinity based on the known crystal structure of a specific ligand or protein can accurately identify the key LBSs [131], [142], [143], [144]. Additionally, the performance of deep learning algorithms is similar to that of traditional machine learning algorithms in some cases with low-dimensional data or small amounts of data. Thus, how to take advantage of deep learning to obtain the best solution for LBS predictions in the near future is still an open question.

In addition, researchers also think that the LBS prediction methods mentioned in this article cannot completely solve the problem of LBS detection, since there exist some cryptic sites that are not evident in the unbound protein but form upon ligand binding [145]. Conformational change is critical to reveal these cryptic sites. Thus, detecting cryptic binding sites has received considerable attention in the past few years, and molecular dynamics simulations have become one of the most popular methods for conformational sampling in this field [2], [5], [146], [147], [148]. For instance, Bowman and Geissler built Markov state models from molecular dynamics (MD) simulations that can identify prospective cryptic sites [149], and a series of studies have been carried out by Gorfe’s team to find hidden binding sites in Ras proteins using probe‐based molecular dynamics simulations [150], [151], [152], [153]. We believe that, in the future, combining advanced machine learning or deep learning approaches with protein conformational sampling techniques is likely to become a new development direction in the field of LBS prediction.

CRediT authorship contribution statement

Jingtian Zhao: Investigation, Writing - original draft. Yang Cao: Conceptualization, Resources, Writing - review & editing. Le Zhang: Writing - review & editing, Supervision, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by National Natural Science Foundation of China [number 61372138 and 81973243], and the National Science and Technology Major Project [2018ZX10201002].

Footnotes

Appendix A

Supplementary data to this article can be found online at https://doi.org/10.1016/j.csbj.2020.02.008.

Contributor Information

Yang Cao, Email: cao@scu.edu.cn.

Le Zhang, Email: zhangle06@scu.edu.cn.

References

  • 1. Chen K., Mizianty M.J., Kurgan L. Proteome science. BioMed Central; 2011. ATPsite: sequence-based prediction of ATP-binding residues.
  • 2. Durrant J.D., McCammon J.A. Molecular dynamics simulations and drug discovery. BMC Biol. 2011;9:71. doi: 10.1186/1741-7007-9-71.
  • 3. Öztürk H., Özgür A., Ozkirimli E. DeepDTA: deep drug–target binding affinity prediction. Bioinformatics. 2018;34:i821–i829. doi: 10.1093/bioinformatics/bty593.
  • 4. Ballester P.J., Mitchell J.B. A machine learning approach to predicting protein–ligand binding affinity with applications to molecular docking. Bioinformatics. 2010;26:1169–1175. doi: 10.1093/bioinformatics/btq112.
  • 5. Seco J., Luque F.J., Barril X. Binding site detection and druggability index from first principles. J Med Chem. 2009;52:2363–2371. doi: 10.1021/jm801385d.
  • 6. Heo L., Shin W.-H., Lee M.S., Seok C. GalaxySite: ligand-binding-site prediction by using molecular docking. Nucleic Acids Res. 2014;42:W210–W214. doi: 10.1093/nar/gku321.
  • 7. Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. Basic local alignment search tool. J Mol Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
  • 8. Vajda S., Guarnieri F. Characterization of protein-ligand interaction sites using experimental and computational methods. Curr Opin Drug Discov Devel. 2006;9:354.
  • 9. Marrone T.J., Briggs A., James M., McCammon J.A. Structure-based drug design: computational advances. Annual Rev Pharmacol Toxicol. 1997;37:71–90. doi: 10.1146/annurev.pharmtox.37.1.71.
  • 10. Kubinyi H. Combinatorial and computational approaches in structure-based drug design. Curr Opin Drug Discov Devel. 1998;1:16–27.
  • 11. Zhang Z., Li Y., Lin B., Schroeder M., Huang B. Identification of cavities on protein surface using multiple computational approaches for drug binding site prediction. Bioinformatics. 2011;27:2083–2088. doi: 10.1093/bioinformatics/btr331.
  • 12. Tong A.H.Y. A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules. Science. 2002;295:321–324. doi: 10.1126/science.1064987.
  • 13. Henrich S., Salo-Ahen O.M., Huang B., Rippmann F.F., Cruciani G., Wade R.C. Computational approaches to identifying and characterizing protein binding sites for ligand design. J Mol Recognit. 2010;23:209–219. doi: 10.1002/jmr.984.
  • 14. Moult J., Pedersen J.T., Judson R., Fidelis K. A large-scale experiment to assess protein structure prediction methods. Proteins. 1995;23:ii–iv. doi: 10.1002/prot.340230303.
  • 15. Haas J. (2013) The Protein Model Portal—a comprehensive resource for protein structure and model information. Database. 2013 doi: 10.1093/database/bat031.
  • 16. Radivojac P. A large-scale evaluation of computational protein function prediction. Nat Methods. 2013;10:221. doi: 10.1038/nmeth.2340.
  • 17. Bernstein F.C. The Protein Data Bank: a computer-based archival file for macromolecular structures. J Mol Biol. 1977;112:535–542. doi: 10.1016/s0022-2836(77)80200-3.
  • 18. Berman H.M., Bourne P.E., Westbrook J., Zardecki C. Protein structure. CRC Press; 2003. The protein data bank; pp. 394–410.
  • 19. Yang J., Roy A., Zhang Y. BioLiP: a semi-manually curated database for biologically relevant ligand–protein interactions. Nucleic Acids Res. 2012;41:D1096–D1103. doi: 10.1093/nar/gks966.
  • 20. Matthews B.W. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta (BBA)-Protein Struct. 1975;405:442–451. doi: 10.1016/0005-2795(75)90109-9.
  • 21. Zhang L. Computed tomography angiography-based analysis of high-risk intracerebral haemorrhage patients by employing a mathematical model. BMC Bioinf. 2019;20:193. doi: 10.1186/s12859-019-2741-5.
  • 22. Zhang L., Dai Z., Yu J., Xiao M. CpG-Island-based annotation and analysis of human house-keeping genes. Briefings Bioinform. 2019 doi: 10.1093/bib/bbz134.
  • 23. Li J., Fu A., Zhang L. An overview of scoring functions used for protein-ligand interactions in molecular docking. Interdiscip Sci. 2019;11:320–328. doi: 10.1007/s12539-019-00327-w.
  • 24. Zhang L. Building up a robust risk mathematical platform to predict colorectal cancer. Complexity. 2017;2017:14.
  • 25. Xia Y. Exploring the key genes and signaling transduction pathways related to the survival time of glioblastoma multiforme patients by a novel survival analysis model. BMC Genomics. 2017;18:950. doi: 10.1186/s12864-016-3256-3.
  • 26. Hendlich M., Rippmann F., Barnickel G. LIGSITE: automatic and efficient detection of potential small molecule-binding sites in proteins. J Mol Graph Model. 1997;15:359–363. doi: 10.1016/s1093-3263(98)00002-3.
  • 27. Levitt D.G., Banaszak L.J. POCKET: a computer graphics method for identifying and displaying protein cavities and their surrounding amino acids. J Mol Graph. 1992;10:229–234. doi: 10.1016/0263-7855(92)80074-n.
  • 28. Hernandez M., Ghersi D., Sanchez R. SITEHOUND-web: a server for ligand binding site identification in protein structures. Nucleic Acids Res. 2009;37:W413–W416. doi: 10.1093/nar/gkp281.
  • 29. Wass M.N., Kelley L.A., Sternberg M.J. 3DLigandSite: predicting ligand-binding sites using similar structures. Nucleic Acids Res. 2010;38:W469–W473. doi: 10.1093/nar/gkq406.
  • 30. Dou Y., Wang J., Yang J., Zhang C. L1pred: a sequence-based prediction tool for catalytic residues in enzymes with the L1-logreg classifier. PLoS ONE. 2012;7 doi: 10.1371/journal.pone.0035666.
  • 31. Yang J., Roy A., Zhang Y. Protein–ligand binding site recognition using complementary binding-specific substructure comparison and sequence profile alignment. Bioinformatics. 2013;29:2588–2595. doi: 10.1093/bioinformatics/btt447.
  • 32. Min S., Lee B., Yoon S. Deep learning in bioinformatics. Briefings Bioinf. 2017;18:851–869. doi: 10.1093/bib/bbw068.
  • 33. Quang D., Chen Y., Xie X. DANN: a deep learning approach for annotating the pathogenicity of genetic variants. Bioinformatics. 2014;31:761–763. doi: 10.1093/bioinformatics/btu703.
  • 34. Almagro Armenteros J.J., Sønderby C.K., Sønderby S.K., Nielsen H., Winther O. DeepLoc: prediction of protein subcellular localization using deep learning. Bioinformatics. 2017;33:3387–3395. doi: 10.1093/bioinformatics/btx431.
  • 35. Pu L., Govindaraj R.G., Lemoine J.M., Wu H.-C., Brylinski M. DeepDrug3D: classification of ligand-binding pockets in proteins with a convolutional neural network. PLoS Comput Biol. 2019;15 doi: 10.1371/journal.pcbi.1006718.
  • 36. Jiménez J., Doerr S., Martínez-Rosell G., Rose A.S., De Fabritiis G. DeepSite: protein-binding site predictor using 3D-convolutional neural networks. Bioinformatics. 2017;33:3036–3042. doi: 10.1093/bioinformatics/btx350.
  • 37. Sotriffer C., Klebe G. Identification and mapping of small-molecule binding sites in proteins: computational tools for structure-based drug design. Il Farmaco. 2002;57:243–251. doi: 10.1016/s0014-827x(02)01211-9.
  • 38. Rose P.W. The RCSB Protein Data Bank: views of structural biology for basic and applied research and education. Nucleic Acids Res. 2014;43:D345–D356. doi: 10.1093/nar/gku1214.
  • 39. Goodford P.J. A computational procedure for determining energetically favorable binding sites on biologically important macromolecules. J Med Chem. 1985;28:849–857. doi: 10.1021/jm00145a002.
  • 40. Laskowski R.A. SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions. J Mol Graph. 1995;13:323–330. doi: 10.1016/0263-7855(95)00073-9.
  • 41. Liang J., Woodward C., Edelsbrunner H. Anatomy of protein pockets and cavities: measurement of binding site geometry and implications for ligand design. Protein Sci. 1998;7:1884–1897. doi: 10.1002/pro.5560070905.
  • 42. Dundas J., Ouyang Z., Tseng J., Binkowski A., Turpaz Y., Liang J. CASTp: computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues. Nucleic Acids Res. 2006;34:W116–W118. doi: 10.1093/nar/gkl282.
  • 43. Binkowski T.A., Naghibzadeh S., Liang J. CASTp: computed atlas of surface topography of proteins. Nucleic Acids Res. 2003;31:3352–3355. doi: 10.1093/nar/gkg512.
  • 44. Edelsbrunner H., Facello M., Liang J. On the definition and the construction of pockets in macromolecules. Discrete Appl Math. 1998;88:83–102.
  • 45. Laurie A.T., Jackson R.M. Q-SiteFinder: an energy-based method for the prediction of protein–ligand binding sites. Bioinformatics. 2005;21:1908–1916. doi: 10.1093/bioinformatics/bti315.
  • 46. Huang B., Schroeder M. LIGSITE csc: predicting ligand binding sites using the Connolly surface and degree of conservation. BMC Struct Biol. 2006;6:19. doi: 10.1186/1472-6807-6-19.
  • 47. Amari S. VISCANA: visualized cluster analysis of protein−ligand interaction based on the ab initio fragment molecular orbital method for virtual ligand screening. J Chem Inf Model. 2006;46:221–230. doi: 10.1021/ci050262q.
  • 48. Le Guilloux V., Schmidtke P., Tuffery P. Fpocket: an open source platform for ligand pocket detection. BMC Bioinf. 2009;10:168. doi: 10.1186/1471-2105-10-168.
  • 49. Ghersi D., Sanchez R. Improving accuracy and efficiency of blind protein-ligand docking by focusing on predicted binding sites. Proteins. 2009;74:417–424. doi: 10.1002/prot.22154.
  • 50. Zhu H., Pisabarro M.T. MSPocket: an orientation-independent algorithm for the detection of ligand binding pockets. Bioinformatics. 2010;27:351–358. doi: 10.1093/bioinformatics/btq672.
  • 51. Ngan C.-H., Hall D.R., Zerbe B., Grove L.E., Kozakov D., Vajda S. FTSite: high accuracy detection of ligand binding sites on unbound protein structures. Bioinformatics. 2011;28:286–287. doi: 10.1093/bioinformatics/btr651.
  • 52. Lin Y., Yoo S., Sanchez R. SiteComp: a server for ligand binding site analysis in protein structures. Bioinformatics. 2012;28:1172–1173. doi: 10.1093/bioinformatics/bts095.
  • 53. Xie Z.-R., Liu C.-K., Hsiao F.-C., Yao A., Hwang M.-J. LISE: a server using ligand-interacting and site-enriched protein triangles for prediction of ligand-binding sites. Nucleic Acids Res. 2013;41:W292–W296. doi: 10.1093/nar/gkt300.
  • 54. Zhu X., Xiong Y., Kihara D. Large-scale binding ligand prediction by improved patch-based method Patch-Surfer2.0. Bioinformatics. 2014;31:707–713. doi: 10.1093/bioinformatics/btu724.
  • 55. Liu Y., Grimm M., Dai W.-T., Hou M.-C., Xiao Z.-X., Cao Y. CB-Dock: a web server for cavity detection-guided protein–ligand blind docking. Acta Pharmacol Sin. 2019:1–7. doi: 10.1038/s41401-019-0228-6.
  • 56. Glaser F. ConSurf: identification of functional regions in proteins by surface-mapping of phylogenetic information. Bioinformatics. 2003;19:163–164. doi: 10.1093/bioinformatics/19.1.163.
  • 57. Capra J.A., Singh M. Predicting functionally important residues from sequence conservation. Bioinformatics. 2007;23:1875–1882. doi: 10.1093/bioinformatics/btm270.
  • 58. Brylinski M., Skolnick J. A threading-based method (FINDSITE) for ligand-binding site prediction and functional annotation. Proc Natl Acad Sci. 2008;105:129–134. doi: 10.1073/pnas.0707684105.
  • 59. Oh M., Joo K., Lee J. Protein-binding site prediction based on three-dimensional protein modeling. Proteins. 2009;77:152–156. doi: 10.1002/prot.22572.
  • 60. Roche D.B., Tetchner S.J., McGuffin L.J. FunFOLD: an improved automated method for the prediction of ligand binding residues using 3D models of proteins. BMC Bioinf. 2011;12:160. doi: 10.1186/1471-2105-12-160.
  • 61. Roy A., Zhang Y. Recognizing protein-ligand binding sites by global structural alignment and local geometry refinement. Structure. 2012;20:987–997. doi: 10.1016/j.str.2012.03.009.
  • 62. Bianchi V., Mangone I., Ferre F., Helmer-Citterich M., Ausiello G. webPDBinder: a server for the identification of ligand binding sites on protein structures. Nucleic Acids Res. 2013;41:W308–W313. doi: 10.1093/nar/gkt457.
  • 63. Skolnick J., Kihara D., Zhang Y. Development and large scale benchmark testing of the PROSPECTOR_3 threading algorithm. Proteins. 2004;56:502–518. doi: 10.1002/prot.20106.
  • 64. Skolnick J., Kihara D. Defrosting the frozen approximation: PROSPECTOR—a new approach to threading. Proteins. 2001;42:319–331.
  • 65. Zhang Y., Skolnick J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 2005;33:2302–2309. doi: 10.1093/nar/gki524.
  • 66. Ortiz A.R., Strauss C.E., Olmea O. MAMMOTH (matching molecular models obtained from theory): an automated method for model comparison. Protein Sci. 2002;11:2606–2621. doi: 10.1110/ps.0215902.
  • 67. Lopez G., Ezkurdia I., Tress M.L. Assessment of ligand binding residue predictions in CASP8. Proteins. 2009;77:138–146. doi: 10.1002/prot.22557.
  • 68. Needleman S.B., Wunsch C.D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970;48:443–453. doi: 10.1016/0022-2836(70)90057-4.
  • 69. Deng W., Breneman C., Embrechts M.J. Predicting protein−ligand binding affinities using novel geometrical descriptors and machine-learning methods. J Chem Inf Comput Sci. 2004;44:699–703. doi: 10.1021/ci034246+.
  • 70. Rosipal R., Trejo L.J. Kernel partial least squares regression in reproducing kernel hilbert space. J Mach Learn Res. 2001;2:97–123.
  • 71. Ye K., Anton Feenstra K., Heringa J., IJzerman A.P., Marchiori E. Multi-RELIEF: a method to recognize specificity determining residues from multiple sequence alignments using a Machine-Learning approach for feature weighting. Bioinformatics. 2007;24:18–25. doi: 10.1093/bioinformatics/btm537.
  • 72. Kononenko I. Springer; 1994. Estimating attributes: analysis and extensions of RELIEF, in European conference on machine learning.
  • 73. Sotriffer C.A., Sanschagrin P., Matter H., Klebe G. SFCscore: scoring functions for affinity prediction of protein–ligand complexes. Proteins. 2008;73:395–419. doi: 10.1002/prot.22058.
  • 74. Chauhan J.S., Mishra N.K., Raghava G.P. Identification of ATP binding residues of a protein from its primary sequence. BMC Bioinf. 2009;10:434. doi: 10.1186/1471-2105-10-434.
  • 75. Capra J.A., Laskowski R.A., Thornton J.M., Singh M., Funkhouser T.A. Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure. PLoS Comput Biol. 2009;5 doi: 10.1371/journal.pcbi.1000585.
  • 76. Huang B. MetaPocket: a meta approach to improve protein ligand binding site prediction. OMICS. 2009;13:325–330. doi: 10.1089/omi.2009.0045.
  • 77. Bandyopadhyay S., Coyle E.J. IEEE INFOCOM 2003. Twenty-second annual joint conference of the IEEE computer and communications societies (IEEE Cat. No. 03CH37428) Vol. 3. IEEE; 2003. An energy efficient hierarchical clustering algorithm for wireless sensor networks.
  • 78. Si J., Zhang Z., Lin B., Schroeder M., Huang B. MetaDBSite: a meta approach to improve protein DNA-binding sites prediction. BMC Syst Biol. 2011;5:S7. doi: 10.1186/1752-0509-5-S1-S7.
  • 79. Chen K., Mizianty M.J., Kurgan L. Prediction and analysis of nucleotide-binding residues using sequence and sequence-derived structural descriptors. Bioinformatics. 2011;28:331–341. doi: 10.1093/bioinformatics/btr657.
  • 80. Durrant J.D., McCammon J.A. NNScore 2.0: a neural-network receptor–ligand scoring function. J Chem Inf Model. 2011;51:2897–2903. doi: 10.1021/ci2003889.
  • 81. Durrant J.D., McCammon J.A. NNScore: a neural-network-based scoring function for the characterization of protein–ligand complexes. J Chem Inf Model. 2010;50:1865–1871. doi: 10.1021/ci100244v.
  • 82. Siu K.-Y., Bruck J. Neural computation of arithmetic functions. Proc IEEE. 1990;78:1669–1675.
  • 83. Yu D.-J., Hu J., Yang J., Shen H.-B., Tang J., Yang J.-Y. Designing template-free predictor for targeting protein-ligand binding sites with classifier ensemble and spatial clustering. IEEE/ACM Trans Comput Biol Bioinf. 2013;10:994–1008. doi: 10.1109/TCBB.2013.104.
  • 84. Brylinski M., Feinstein W.P. eFindSite: improved prediction of ligand binding sites in protein models using meta-threading, machine learning and auxiliary ligands. J Comput Aided Mol Des. 2013;27:551–567. doi: 10.1007/s10822-013-9663-5.
  • 85. Panwar B., Gupta S., Raghava G.P. Prediction of vitamin interacting residues in a vitamin binding protein using evolutionary information. BMC Bioinf. 2013;14:44. doi: 10.1186/1471-2105-14-44.
  • 86. Chen P., Huang J.Z., Gao X. BMC bioinformatics. BioMed Central; 2014. LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone.
  • 87. Yu D.-J., Hu J., Li Q.-M., Tang Z.-M., Yang J.-Y., Shen H.-B. Constructing query-driven dynamic machine learning model with application to protein-ligand binding sites prediction. IEEE Trans Nanobiosci. 2015;14:45–58. doi: 10.1109/TNB.2015.2394328.
  • 88. Chen P. A sequence-based dynamic ensemble learning system for protein ligand-binding site prediction. IEEE/ACM Trans Comput Biol Bioinf. 2015;13:901–912. doi: 10.1109/TCBB.2015.2505286.
  • 89. Krivák R., Hoksza D. Improving protein-ligand binding site prediction accuracy by classification of inner pocket points using local features. J Cheminf. 2015;7:12. doi: 10.1186/s13321-015-0059-5.
  • 90. Cang Z., Wei G.W. Integration of element specific persistent homology and machine learning for protein-ligand binding affinity prediction. Int J Numer Meth Biomed Eng. 2018;34 doi: 10.1002/cnm.2914.
  • 91.Pedregosa F. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–2830. [Google Scholar]
  • 92.Morrone Xavier M. SAnDReS a computational tool for statistical analysis of docking results and development of scoring functions. Comb Chem High Throughput Screening. 2016;19:801–812. doi: 10.2174/1386207319666160927111347. [DOI] [PubMed] [Google Scholar]
  • 93.Krivák R., Hoksza D. P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure. J Cheminf. 2018;10:39. doi: 10.1186/s13321-018-0285-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94.Wu Q., Peng Z., Zhang Y., Yang J. COACH-D: improved protein–ligand binding sites prediction with refined ligand-binding poses through molecular docking. Nucleic Acids Res. 2018;46:W438–W442. doi: 10.1093/nar/gky439. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95.da Silva A.D., Bitencourt-Ferreira G., de Azevedo W.F. Taba: A tool to analyze the binding affinity. J Comput Chem. 2019 doi: 10.1002/jcc.26048. [DOI] [PubMed] [Google Scholar]
  • 96.Ofran Y., Mysore V., Rost B. Prediction of DNA-binding residues from sequence. Bioinformatics. 2007;23:i347–i353. doi: 10.1093/bioinformatics/btm174. [DOI] [PubMed] [Google Scholar]
  • 97.Yan C., Terribilini M., Wu F., Jernigan R.L., Dobbs D., Honavar V. Predicting DNA-binding sites of proteins from amino acid sequence. BMC Bioinf. 2006;7:262. doi: 10.1186/1471-2105-7-262. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98.Wang L., Brown S.J. BindN: a web-based tool for efficient prediction of DNA and RNA binding sites in amino acid sequences. Nucleic Acids Res. 2006;34:W243–W248. doi: 10.1093/nar/gkl298. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99.Wang L., Yang M.Q., Yang J.Y. Prediction of DNA-binding residues from protein sequence information using random forests. BMC Genomics. 2009;10:S1. doi: 10.1186/1471-2164-10-S1-S1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 100.Hwang S., Gou Z., Kuznetsov I.B. DP-Bind: a web server for sequence-based prediction of DNA-binding residues in DNA-binding proteins. Bioinformatics. 2007;23:634–636. doi: 10.1093/bioinformatics/btl672. [DOI] [PubMed] [Google Scholar]
  • 101.Ahmad S., Gromiha M.M., Sarai A. Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information. Bioinformatics. 2004;20:477–486. doi: 10.1093/bioinformatics/btg432. [DOI] [PubMed] [Google Scholar]
  • 102.Kent W.J. BLAT—the BLAST-like alignment tool. Genome Res. 2002;12:656–664. doi: 10.1101/gr.229202. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103.Chauhan J.S., Mishra N.K., Raghava G.P. Prediction of GTP interacting residues, dipeptides and tripeptides in a protein from its evolutionary information. BMC Bioinf. 2010;11:301. doi: 10.1186/1471-2105-11-301. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 104.Roy A., Yang J., Zhang Y. COFACTOR: an accurate comparative algorithm for structure-based protein function annotation. Nucleic Acids Res. 2012;40:W471–W477. doi: 10.1093/nar/gks372. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105.Hinton G.E., Salakhutdinov R.R. Reducing the dimensionality of data with neural networks. Science. 2006;313:504–507. doi: 10.1126/science.1127647. [DOI] [PubMed] [Google Scholar]
  • 106.Amodei D. et al. (2016) Deep speech 2: End-to-end speech recognition in english and mandarin,in International conference on machine learning Vol.
  • 107.He K., Zhang X., Ren S., Sun J. Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. Deep residual learning for image recognition.
  • 108.Papandreou G., Chen L.-C., Murphy K.P., Yuille A.L. Proceedings of the IEEE international conference on computer vision. 2015. Weakly-and semi-supervised learning of a deep convolutional network for semantic image segmentation.
  • 109.Voulodimos A., Doulamis N., Doulamis A., Protopapadakis E. Deep learning for computer vision: a brief review. Comput Intel Neurosci. 2018 doi: 10.1155/2018/7068349.
  • 110.Wang F.-Y. Where does AlphaGo go: From church-turing thesis to AlphaGo thesis and beyond. IEEE/CAA J Autom Sin. 2016;3:113–120.
  • 111.Greenspan H., Van Ginneken B., Summers R.M. Guest editorial deep learning in medical imaging: overview and future promise of an exciting new technique. IEEE Trans Med Imaging. 2016;35:1153–1159.
  • 112.Sun W., Zheng B., Qian W. Medical imaging 2016: computer-aided diagnosis Vol. 9785. International Society for Optics and Photonics; 2016. Computer aided lung cancer diagnosis with deep learning algorithms.
  • 113.Chen H., Engkvist O., Wang Y., Olivecrona M., Blaschke T. The rise of deep learning in drug discovery. Drug Discovery Today. 2018;23:1241–1250. doi: 10.1016/j.drudis.2018.01.039.
  • 114.Cheng J.-Z. Computer-aided diagnosis with deep learning architecture: applications to breast lesions in US images and pulmonary nodules in CT scans. Sci Rep. 2016;6:24454. doi: 10.1038/srep24454.
  • 115.Kleene S.C. Representation of events in nerve nets and finite automata. RAND Project Air Force, Santa Monica, CA; 1951.
  • 116.Rumelhart D.E., Hinton G.E., Williams R.J. Learning representations by back-propagating errors. Cognitive Model. 1988;5:1.
  • 117.Smolensky P. Chapter 6: information processing in dynamical systems: foundations of harmony theory, Parallel distributed processing: explorations in the microstructure of cognition 1.
  • 118.Zhang S. A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res. 2015;44:e32. doi: 10.1093/nar/gkv1025.
  • 119.Alipanahi B., Delong A., Weirauch M.T., Frey B.J. Predicting the sequence specificities of DNA-and RNA-binding proteins by deep learning. Nat Biotechnol. 2015;33:831. doi: 10.1038/nbt.3300.
  • 120.Jimenez J., Skalic M., Martinez-Rosell G., De Fabritiis G. KDEEP: protein-ligand absolute binding affinity prediction via 3D-convolutional neural networks. J Chem Inf Model. 2018;58:287–296. doi: 10.1021/acs.jcim.7b00650.
  • 121.Cui Y., Dong Q., Hong D., Wang X. Predicting protein-ligand binding residues with deep convolutional neural networks. BMC Bioinf. 2019;20:93. doi: 10.1186/s12859-019-2672-1.
  • 122.Lee I., Keum J., Nam H. DeepConv-DTI: prediction of drug-target interactions via deep learning with convolution on protein sequences. PLoS Comput Biol. 2019;15 doi: 10.1371/journal.pcbi.1007129.
  • 123.Zheng L., Fan J., Mu Y. OnionNet: a multiple-layer inter-molecular contact based convolutional neural network for protein-ligand binding affinity prediction. arXiv preprint arXiv:1906.02418. 2019.
  • 124.Zhang L., Bai W., Yuan N., Du Z. Comprehensively benchmarking applications for detecting copy number variation. PLoS Comput Biol. 2019;15 doi: 10.1371/journal.pcbi.1007069.
  • 125.Zhang L. Discovery of a ruthenium complex for the theranosis of glioma through targeting the mitochondrial DNA with bioinformatic methods. Int J Mol Sci. 2019;20:4643. doi: 10.3390/ijms20184643.
  • 126.Zhang L. Revealing dynamic regulations and the related key proteins of myeloma-initiating cells by integrating experimental data into a systems biological model. Bioinformatics. 2019 doi: 10.1093/bioinformatics/btz542.
  • 127.Zhang L. EZH2-, CHD4-, and IDH-linked epigenetic perturbation and its association with survival in glioma patients. J Mol Cell Biol. 2017;9:477–488. doi: 10.1093/jmcb/mjx056.
  • 128.Zhang L. Investigation of mechanism of bone regeneration in a porous biodegradable calcium phosphate (CaP) scaffold by a combination of a multi-scale agent-based model and experimental optimization/validation. Nanoscale. 2016;8:14877–14887. doi: 10.1039/c6nr01637e.
  • 129.Zhang L., Xiao M., Zhou J., Yu J. Lineage-associated underrepresented permutations (LAUPs) of mammalian genomic sequences based on a Jellyfish-based LAUPs analysis application (JBLA). Bioinformatics. 2018;34:3624–3630. doi: 10.1093/bioinformatics/bty392.
  • 130.Zhang L., Zhang S. Using game theory to investigate the epigenetic control mechanisms of embryo development: Comment on: “Epigenetic game theory: How to compute the epigenetic control of maternal-to-zygotic transition” by Qian Wang et al. Phys Life Rev. 2017;20:140–142. doi: 10.1016/j.plrev.2017.01.007.
  • 131.Levin N.M.B., Pintro V.O., Bitencourt-Ferreira G., de Mattos B.B., de Castro Silvério A., de Azevedo Jr W.F. Development of CDK-targeted scoring functions for prediction of binding affinity. Biophys Chem. 2018;235:1–8. doi: 10.1016/j.bpc.2018.01.004.
  • 132.Yang J., Zhang Y. I-TASSER server: new development for protein structure and function predictions. Nucleic Acids Res. 2015;43:W174–W181. doi: 10.1093/nar/gkv342.
  • 133.Yang J., Yan R., Roy A., Xu D., Poisson J., Zhang Y. The I-TASSER Suite: protein structure and function prediction. Nat Methods. 2015;12:7. doi: 10.1038/nmeth.3213.
  • 134.Li G.-Q., Liu Z., Shen H.-B., Yu D.-J. TargetM6A: identifying N6-methyladenosine sites from RNA sequences via position-specific nucleotide propensities and a support vector machine. IEEE Trans Nanobiosci. 2016;15:674–682. doi: 10.1109/TNB.2016.2599115.
  • 135.Wei Z.-S., Yang J.-Y., Shen H.-B., Yu D.-J. A cascade random forests algorithm for predicting protein-protein interaction sites. IEEE Trans Nanobiosci. 2015;14:746–760. doi: 10.1109/TNB.2015.2475359.
  • 136.Wei Z.-S., Han K., Yang J.-Y., Shen H.-B., Yu D.-J. Protein–protein interaction sites prediction by ensembling SVM and sample-weighted random forests. Neurocomputing. 2016;193:201–212.
  • 137.Wass M.N., Barton G., Sternberg M.J. CombFunc: predicting protein function using heterogeneous data sources. Nucleic Acids Res. 2012;40:W466–W470. doi: 10.1093/nar/gks489.
  • 138.Naz F., Shahbaaz M., Bisetty K., Islam A., Ahmad F., Hassan M.I. Designing new kinase inhibitor derivatives as therapeutics against common complex diseases: structural basis of microtubule affinity-regulating kinase 4 (MARK4) inhibition. OMICS. 2015;19:700–711. doi: 10.1089/omi.2015.0111.
  • 139.Govinda K., Hassan M.M., Sirimulla S. KinasepKipred: a predictive model for estimating ligand-kinase inhibitor constant (pKi). bioRxiv. 2019.
  • 140.Goodfellow I., Bengio Y., Courville A. MIT press; 2016. Deep learning.
  • 141.LeCun Y., Bengio Y., Hinton G. Deep learning. Nature. 2015;521:436. doi: 10.1038/nature14539.
  • 142.de Ávila M.B., Bitencourt-Ferreira G., de Azevedo W.F. Structural basis for inhibition of enoyl-[acyl carrier protein] reductase (InhA) from Mycobacterium tuberculosis. Curr Med Chem. 2019 doi: 10.2174/0929867326666181203125229.
  • 143.Volkart P.A., Bitencourt-Ferreira G., Souto A.A., de Azevedo W.F. Cyclin-dependent kinase 2 in cellular senescence and cancer. A structural and functional review. Curr Drug Targets. 2019;20:716–726. doi: 10.2174/1389450120666181204165344.
  • 144.de Ávila M.B., Xavier M.M., Pintro V.O., de Azevedo Jr W.F. Supervised machine learning techniques to predict binding affinity. A study for cyclin-dependent kinase 2. Biochem Biophys Res Commun. 2017;494:305–310. doi: 10.1016/j.bbrc.2017.10.035.
  • 145.Cimermancic P. CryptoSite: expanding the druggable proteome by characterization and prediction of cryptic binding sites. J Mol Biol. 2016;428:709–719. doi: 10.1016/j.jmb.2016.01.029.
  • 146.Guterres H., Lee H.S., Im W. Ligand-binding-site structure refinement using molecular dynamics with restraints derived from predicted binding site templates. J Chem Theory Comput. 2019;15:6524–6535. doi: 10.1021/acs.jctc.9b00751.
  • 147.Bowman G.R., Bolin E.R., Hart K.M., Maguire B.C., Marqusee S. Discovery of multiple hidden allosteric sites by combining Markov state models and experiments. Proc Natl Acad Sci. 2015;112:2734–2739. doi: 10.1073/pnas.1417811112.
  • 148.Udi Y. Unraveling hidden regulatory sites in structurally homologous metalloproteases. J Mol Biol. 2013;425:2330–2346. doi: 10.1016/j.jmb.2013.04.009.
  • 149.Bowman G.R., Geissler P.L. Equilibrium fluctuations of a single folded protein reveal a multitude of potential cryptic allosteric sites. Proc Natl Acad Sci. 2012;109:11681–11686. doi: 10.1073/pnas.1209309109.
  • 150.McCarthy M., Prakash P., Gorfe A.A. Computational allosteric ligand binding site identification on Ras proteins. Acta Biochim Biophys Sin. 2015;48:3–10. doi: 10.1093/abbs/gmv100.
  • 151.Prakash P., Hancock J.F., Gorfe A.A. Binding hotspots on K-ras: Consensus ligand binding sites and other reactive regions from probe-based molecular dynamics analysis. Proteins. 2015;83:898–909. doi: 10.1002/prot.24786.
  • 152.Prakash P., Sayyed-Ahmad A., Gorfe A.A. pMD-membrane: a method for ligand binding site identification in membrane-bound proteins. PLoS Comput Biol. 2015;11 doi: 10.1371/journal.pcbi.1004469.
  • 153.Prakash P., Zhou Y., Liang H., Hancock J.F., Gorfe A.A. Oncogenic K-Ras binds to an anionic membrane in two distinct orientations: a molecular dynamics analysis. Biophys J. 2016;110:1125–1138. doi: 10.1016/j.bpj.2016.01.019.

Articles from Computational and Structural Biotechnology Journal are provided here courtesy of Research Network of Computational and Structural Biotechnology