Briefings in Bioinformatics. 2024 May 12;25(3):bbae162. doi: 10.1093/bib/bbae162

A comprehensive review of protein-centric predictors for biomolecular interactions: from proteins to nucleic acids and beyond

Pengzhen Jia 1,#, Fuhao Zhang 2,3,#, Chaojin Wu 4, Min Li 5,
PMCID: PMC11089422  PMID: 38739759

Abstract

Proteins interact with diverse ligands to perform a large number of biological functions, such as gene expression and signal transduction. Accurate identification of these protein–ligand interactions is crucial to the understanding of molecular mechanisms and the development of new drugs. However, traditional biological experiments are time-consuming and expensive. With the development of high-throughput technologies, an increasing amount of protein data is available. In the past decades, many computational methods have been developed to predict protein–ligand interactions. Here, we review a comprehensive set of over 160 protein–ligand interaction predictors, which cover protein–protein, protein−nucleic acid, protein−peptide and protein−other ligands (nucleotide, heme, ion) interactions. We have carried out a comprehensive analysis of the above four types of predictors from several significant perspectives, including their inputs, feature profiles, models, availability, etc. The current methods primarily rely on protein sequences, especially utilizing evolutionary information. The significant improvement in predictions is attributed to deep learning methods. Additionally, sequence-based pretrained models and structure-based approaches are emerging as new trends.

Keywords: protein–ligand interaction, protein–protein interaction, protein–peptide interaction, protein–nucleic acid interaction, protein–other ligands interaction

INTRODUCTION

Proteins are essential participants in most biological processes within cells. In order to carry out cellular functions, proteins interact with a variety of ligands, such as proteins, nucleic acids, peptides, nucleotides, hemes and ions, forming stable complexes. To be specific, protein–protein interactions (PPIs) underlie many cellular processes, such as signal transduction, transport and metabolism [1, 2]. Interactions of proteins and nucleic acids are involved in post-transcriptional modification, gene replication, gene expression and many other biological processes [3–5]. Furthermore, peptides mediate ~40% of protein interactions that contribute to abnormal cellular behaviors causing various diseases [6]. In addition, protein–other ligands (nucleotide, heme, ion) interactions are also indispensable for biological activities [7]. For example, protein–nucleotide interactions play a crucial role in energy provision [8], interactions of proteins and hemes are significant for circadian rhythm and cell-cycle regulation [9], while protein–ion interactions contribute to protein structural stability [10]. Knowledge of protein–ligand interactions facilitates the annotation of protein functions, the comprehension of cellular processes, understanding the pathogenesis of diseases and developing new therapeutic approaches [11–14]. There are some databases that have compiled information on protein–ligand interactions at protein-, residue- and atom-level. Protein-level databases provide information on whether proteins interact with ligands, such as STRING [15], mentha [16] and BioGRID [17]. Residue-level databases, such as BioLip [18] and DisProt [19], provide residue-level annotations that indicate whether amino acids in the protein sequence interact with ligands. Compared with protein-level databases, residue-level databases provide more detailed annotations about the interactions of proteins, nucleic acids, peptides and other ligands. 
PDB [20] is the atom-level database that provides a large amount of 3D structural information on protein–ligand interactions. However, identification of these protein–ligand interactions by biological experiments, such as affinity chromatography, nuclear magnetic resonance and site-directed mutagenesis, is time-consuming and relatively expensive. Given the continuous advancement of high-throughput sequencing technologies, biological experiments are unable to match the rapid increase in the number of proteins. For example, the UniProt [21] database contains over 250 million protein sequences (as of April 2023), while BioLiP contains 882 170 proteins (as of 29 September 2023) and PDB contains 1 278 757 protein structures, including 210 180 experimental structures and 1 068 577 computational structures. STRING (as of 26 July 2023) contains 59 309 604 proteins. Therefore, developing computational methods for predicting protein–ligand interactions is both urgent and challenging.

In the past decades, a large number of computational methods have been developed to predict protein–ligand interactions [22–34]. Generally, the pipeline of these methods begins with inputs, from which protein feature profiles are extracted. Models are employed to identify the interactions between proteins and ligands or to infer the specific interaction residues, as illustrated in Figure 1. First, the inputs for these methods can be categorized into three types: protein sequences [24, 35], protein structures [5, 36] and PPI networks [37, 38]. Protein sequences are typically characterized using amino acid binary encodings, such as one-hot encoding, and now protein language models are employed to generate sequence embeddings. The second common type of inputs is protein structures. Similar to sequence-based predictors, structure-based predictors also extract various protein feature profiles to improve predictions. However, given the difficulty in obtaining protein structure data and the direct influence of structure quality on prediction results, only a few methods focus on protein structures compared with sequences. Methods that utilize PPI networks primarily predict PPIs, but compared with methods employing sequence data, these display certain limitations and lower scalability.

Figure 1.


The pipeline of protein–ligand interaction predictors includes inputs, feature profiles, models and prediction. Inputs mainly include protein sequences, protein structures and PPI networks. Feature profiles panel shows various features including amino acid binary encoding, evolutionary information, embedding features based on the protein language model, etc. The model panel displays the commonly used prediction methods including homology-, machine learning- and DL-based methods. The prediction-level and ligand types are shown in the prediction panel.

Second, according to models, these methods can be grouped into two categories: (i) homology-based methods [39, 40], which typically identify annotated proteins similar to the query protein and transfer their annotations to it, and (ii) learning-based methods [24, 35], which remain applicable even to proteins with little similarity to previously characterized ones. Learning-based methods typically use prediction models based on machine learning or deep learning (DL) algorithms and optimize the model's prediction performance on a training dataset of annotated proteins. The optimized model then infers interactions for proteins outside the training dataset. This study primarily focuses on learning-based methods.

From the perspective of the prediction level, the computational methods can be classified into two classes: protein-level [41, 42] and residue-level [24, 35]. Protein-level methods predict whether a protein interacts with ligands, whereas residue-level methods determine which specific residues within the protein interact with ligands. Residue-level methods offer more detailed annotations compared with protein-level methods. Furthermore, based on the type of interaction ligands, these methods can be divided into four categories. The first category comprises predictors of PPIs [22, 42], encompassing both protein- and residue-level predictors. Protein-level methods infer interactions by utilizing PPI networks or pairs of proteins, while residue-level methods typically employ sliding windows as inputs to identify interaction residues. The second category involves predictions of protein–nucleic acid interactions [24, 35], including protein–DNA, protein–RNA and general protein–nucleic acid interactions. This category includes both protein- and residue-level methods and primarily takes protein sequences as inputs. The third category involves protein–peptide interactions [43, 44] at residue-level and the remaining interaction ligands are classified into the fourth category, such as nucleotides [8, 45, 46], hemes [9, 47, 48] and ions [49–51]. The classification and temporal distribution of all surveyed predictors are illustrated in Figure 2.

Figure 2.


The distribution of protein–ligand interaction predictors across different ligand types (proteins, nucleic acids, peptides and others) and time periods (every 5 years as a time period). The horizontal axis indicates the number of relevant prediction methods and the vertical axis indicates the time period.

In this study, we review a comprehensive set of over 160 predictors section by section, including protein–protein, protein−nucleic acid, protein−peptide and protein–other ligands (nucleotide, heme, ions) interactions. We discuss and try to offer insightful analysis of their inputs, feature profiles, models, availability, etc. Finally, we give a summary and directions for future research.

PROTEIN–PROTEIN INTERACTION

According to their inputs and outputs, we grouped 58 PPI predictors into three categories, as shown in Figure 3: (i) PAIR-pro predictors [38, 42], which identify interactions between a pair of proteins but cannot provide the specific positions of the interaction residues; (ii) PAIR-res predictors [52, 53], which predict whether proteins interact with each other and the positions of the interaction residues and (iii) SINGLE-res predictors [54, 55], which infer interaction residues within a single protein. Among the 58 predictors, there are 24 PAIR-pro predictors, 4 PAIR-res predictors and 30 SINGLE-res predictors.

Figure 3.


PPI predictors are divided into three categories based on inputs and outputs: PAIR-pro, PAIR-res and SINGLE-res. The PAIR-pro predictor can predict whether two proteins interact or not, but cannot provide the specific positions of the interaction residues. However, the PAIR-res predictor can predict whether proteins interact with each other and the positions of the interaction residues. In addition, the SINGLE-res predictor can predict the interaction residues on a single protein.

PAIR-pro predictors

PAIR-pro predictors [37, 38, 41, 42, 56–75], summarized in Table 1, focus on protein-level interactions and accept two proteins as inputs. The most frequently employed input types for PAIR-pro predictors are protein sequences and protein structures, involving a total of 19 predictors. In addition, a few predictors utilize PPI networks or gene ontology (GO) annotation sets as inputs, such as TransformerGO [42] and the predictor proposed by Kovács et al. [71]. Analysis of the datasets used by PAIR-pro predictors reveals that the majority of the data originates from public databases, such as the PDB [20], DIP [76], HPRD [77], HIPPIE [78], HINT [79], STRING [15], SKEMPI [80], BioGRID [81], PrePPI [82] and IntAct [83]. Among these, DIP and HPRD are the most commonly utilized PPI databases. Some manually curated datasets have also been extensively employed by researchers, such as the high-quality dataset containing 5594 interaction pairs compiled by Guo et al. [58].

Table 1.

Summary of PAIR-pro predictors in terms of inputs, feature profiles, models and availability

Predictors | Inputs | Feature profiles | Models | Year | Availability
--- | --- | --- | --- | --- | ---
Bock et al. [56] | Seqs | Charge, hydrophobicity, surface tension | SVM | 2001 | ×
Shen et al. [57] | Seqs | Amino acids classification, CT method | SVM | 2007 | ×
Guo et al. [58] | Seqs | AC (hydrophobicity, hydrophilicity, volumes of side chains of amino acids, polarity, polarizability, solvent-accessible surface area and net charge index of side chains of amino acids) | SVM | 2008 | ×
Yang et al. [59] | Seqs | Amino acids classification, LDs | KNN | 2010 | ×
LR_PPI [60] | Seqs | Amino acids classification, CT method | Latent Dirichlet allocation-RF | 2010 | √
PCA-EELM [61] | Seqs | CT scores, LD, AC (hydrophobicity, volumes of side chains of amino acids, polarity, polarizability, solvent-accessible surface area and net charge index of side chains), Moran autocorrelation | Principal component analysis, ELM | 2013 | ×
MCDPPI [62] | Seqs | Multi-scale continuous and discontinuous, LD | SVM | 2014 | ×
You et al. [63] | Seqs | LD | ELM | 2014 | ×
PR-LPQ Descriptor [64] | Seqs | Physicochemical property response matrix (hydrophobicity) | RF | 2015 | ×
DeepPPI [65] | Seqs | Amino acid composition, dipeptide composition, LD, Quasi-Sequence-Order descriptors, Amphiphilic Pseudoamino Acid Composition | DNN | 2017 | ×
Sun et al. [66] | Seqs | AC (hydrophobicity, hydrophilicity, net charge index of side chains, polarity, polarizability, solvent accessible surface area, volume of side chains), CT method | Stacked autoencoder | 2017 | ×
DANEOsf [37] | Nets | Evolutionary distance, geometric embedding | Density function, Bayes | 2017 | ×
DPPI [67] | Seqs | Evolutionary profiles | CNN | 2018 | √
DNN-PPI [68] | Seqs | Amino acids encoding | CNN, LSTM | 2018 | ×
PIPR [69] | Seqs | Amino acids classification, pre-training the Skip-Gram model | RCNN | 2019 | √
LightGBM-PPI [70] | Seqs | Pseudo amino acid composition, autocorrelation descriptor, CT method, LD | LightGBM | 2019 | √
Kovács et al. [71] | Nets | N/A | L3 | 2019 | √
Sim [72] | Nets | N/A | Random network, L3 | 2020 | √
D-SCRIPT [73] | Seqs | Amino acids classification | LSTM | 2021 | √
DeepTrio [41] | Seqs | One-hot encoding | CNN | 2022 | √
FoldDock [74] | Seqs/Strs | MSA | DNN | 2022 | √
TransformerGO [42] | GO | GO terms | Transformer | 2022 | √
protein2vec [75] | GO | Amino acids encoding, GO terms | LSTM, DNN | 2022 | √
PPISB [38] | Nets | N/A | Mixed membership stochastic blockmodel | 2022 | ×

Note: Seqs, Strs and Nets correspond to protein sequences, protein structures and PPI networks.

In the context of machine learning or DL models, proteins are typically encoded as embeddings to serve as inputs. The analysis of feature profiles employed across all PAIR-pro predictors reveals that each amino acid of a protein is generally characterized using an assortment of amino acid binary encodings (such as one-hot encoding, classification encoding), physicochemical profiles (including hydrophobicity, polarity and electrostatic potential), structural features (such as secondary structure (SS), solvent accessibility and molecular surface curvature) and evolutionary information (including conservation, position-weight matrices and coevolution). Each category captures unique and critical aspects of protein features that are essential for understanding protein interactions. Amino acid binary encodings provide a straightforward and efficient strategy for protein sequence representation, and these encodings are also highly interpretable. PPI can be defined as four interaction modes: electrostatic interaction, hydrophobic interaction, steric interaction and hydrogen bond, so physicochemical profiles are often used to reflect these modes [58]. Besides, the interaction between proteins and ligands is closely related to the spatial structure of the proteins, and structural features reveal the spatial structural information of proteins. For example, the use of solvent accessibility is motivated by the fact that interactions occur on the protein surface [84]. Furthermore, interaction residues are typically conserved across homologous protein sequences [84], thus evolutionary information is used in protein–ligand interaction predictions to quantify the conservation of residues. In addition, a variety of popular methods have been proposed to characterize protein sequences, such as local descriptor (LD) [85, 86] and GO terms [87].

Amino acid binary encodings primarily encompass one-hot encoding and amino acid classification encoding. One-hot encoding, alternatively referred to as one-bit valid encoding, utilizes N-bit status registers to record N states. It represents each amino acid as a 20-dimensional binary vector in which each bit corresponds to one of the 20 amino acid types, so a protein sequence of length L is encoded as an L × 20 binary matrix. For amino acid classification encoding, Shen et al. noted that the dipoles and volumes of amino acid side chains reflect the electrostatic and hydrophobic interactions, respectively, that dominate PPIs. They calculated these two quantities with the density-functional theory method and a molecular modeling approach, respectively [57], and used them to classify the 20 types of amino acids into 7 categories.
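As a minimal sketch of one-hot encoding, the following Python snippet maps a sequence of length L to an L × 20 binary matrix. The alphabetical residue ordering and the all-zero row for non-standard residues are assumptions made here for illustration, not conventions taken from any surveyed predictor.

```python
# Ordering of the 20 standard amino acids (an assumed convention).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return an L x 20 list of lists with at most a single 1 per row."""
    matrix = []
    for residue in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        # Non-standard residues (e.g. 'X') are kept as an all-zero row.
        if residue in AA_INDEX:
            row[AA_INDEX[residue]] = 1
        matrix.append(row)
    return matrix


encoded = one_hot_encode("MKV")
print(len(encoded), len(encoded[0]))  # 3 20
```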

Expanding on the amino acid classification encoding, Shen et al. introduced the conjoint triad (CT) [57] to capture adjacent amino acid information. CT bears similarity to the 3-mer approach, whereby three sequential amino acids are treated as a single entity, and the frequency of each triad's occurrence forms the profile outputs. Through CT, a protein sequence can be represented within a binary space (V, F) where V denotes the vector space of sequence features, with each bit corresponding to a specific triad type, while F signifies the respective frequency corresponding to V. It notably reduces the feature dimension relative to 3-mer coding, while still retaining local information. Among all PPI predictors, over 20% [57, 60, 61, 66, 70] employ CT for encoding protein sequences.
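The CT idea can be sketched in a few lines of Python. The seven-class grouping below is the one commonly attributed to Shen et al. and should be checked against [57]; the raw-count output (without the normalization used in the original work) and the handling of non-standard residues are simplifying assumptions. With 7 classes there are 7 × 7 × 7 = 343 triad types, compared with 20 × 20 × 20 = 8000 for plain 3-mers.

```python
from itertools import product

# Seven amino acid classes (assumed grouping, to be verified against [57]).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: idx for idx, group in enumerate(CLASSES) for aa in group}

# All 343 ordered triads of class labels, each mapped to a feature index.
TRIADS = list(product(range(len(CLASSES)), repeat=3))
TRIAD_INDEX = {triad: i for i, triad in enumerate(TRIADS)}


def conjoint_triad(sequence):
    """Count how often each class triad occurs along the sequence."""
    # Non-standard residues are dropped, which makes their neighbours adjacent.
    classes = [AA_TO_CLASS[aa] for aa in sequence.upper() if aa in AA_TO_CLASS]
    counts = [0] * len(TRIADS)
    for i in range(len(classes) - 2):
        counts[TRIAD_INDEX[tuple(classes[i:i + 3])]] += 1
    return counts


features = conjoint_triad("AGVAGV")
print(len(features))  # 343
```

Grouping residues before counting is what keeps the vector compact enough for classical models such as SVM while still encoding which residue types are adjacent.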

Concurrently, the physicochemical profiles can substantially improve the accuracy of predictions. Features such as electrostatic and hydrophobic profiles, which have been found to be crucial in PPIs, are utilized in over 37% of PAIR-pro predictors. Other features including hydrophilicity, polarity, polarizability, aromaticity, hydrogen bond acceptance or donation, positive or negative ionizability and metallicity, among others, also prove beneficial for PPI predictions [58, 61, 66]. Among predictors utilizing physicochemical profiles, three representations are predominantly employed. The first is direct property values, which necessitate normalization prior to implementation. The second is binary values, such as polarity and non-polarity. The third is auto covariance (AC) [58], which leverages multiple physicochemical properties to construct a comprehensive feature profile tensor. PCA-EELM [61] and predictors [58, 66] utilize AC with properties, such as hydrophobicity, volumes of amino acid side chains, polarity, polarizability, solvent-accessible surface area, and the net charge index of side chains. The demonstrated efficacy of these predictors lends support to the assertion that AC is beneficial for PPI predictions.
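To illustrate the AC representation, the sketch below computes the auto covariance of one physicochemical property along a sequence for lags 1 to max_lag, using the standard definition AC(lag) = 1/(n − lag) · Σ (x_i − mean)(x_{i+lag} − mean). The Kyte-Doolittle hydropathy scale is used only as a stand-in property, and max_lag=3 is an arbitrary choice; the scales, normalization and lag range used in [58] may differ. Repeating the computation for several properties and concatenating the results gives a fixed-length vector regardless of sequence length.

```python
# Kyte-Doolittle hydropathy values, used here only as an example property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def auto_covariance(values, max_lag):
    """Return [AC(1), ..., AC(max_lag)] for one property profile."""
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    features = []
    for lag in range(1, max_lag + 1):
        total = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        features.append(total / (n - lag))
    return features


sequence = "MKVLAAGIVG"  # hypothetical toy sequence
profile = [HYDROPATHY[aa] for aa in sequence]
ac_features = auto_covariance(profile, max_lag=3)
print(len(ac_features))  # 3
```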

Moreover, the structural features used by over 20% of predictors of PAIR-pro [37, 56, 58, 61, 66] can be categorized into three groups according to their sources: (i) primary structural features, such as sequence length and amino acid composition; (ii) secondary structural features, including the SS; (iii) tertiary structural features, encompassing atom composition, surface tension, accessibility and more. Typically, primary and tertiary structural features can be readily obtained from raw protein sequences and structures. However, secondary structural features are generally collected from predictions based on primary structures or calculations of tertiary structures. Therefore, we discuss two approaches to derive SSs from protein sequences and structures, respectively. PSIPRED [88] is a sequence-based tool for predicting SSs, employing a neural network to filter and convolve the protein evolutionary information generated by PSI-BLAST [89], ultimately outputting a protein SS scoring matrix. Conversely, DSSP [90, 91] calculates the most probable SS distribution by analyzing the protein's tertiary structures.

Proteins that derive from the same ancestral protein typically exhibit similar sequences and are likely to share spatial structures and biological functions [92]. Hence, the unknown structure and function of a protein can be inferred based on other annotated proteins with similar sequences. The alignment of sequences from multiple species facilitates the identification of highly conserved residues. Consequently, protein evolutionary information is beneficial for PPI predictions. Six methods [37, 62, 65, 67, 70, 74] utilize evolutionary information to predict PPIs, and there exists a wide variety of evolutionary information, such as multiple sequence alignment (MSA), sequence profiles, evolutionary distance and more. MSA arranges protein sequences to identify similar regions and is utilized in FoldDock [74], which draws inspiration from AlphaFold [93] and employs MSA to predict PPI complexes [69].

On the other hand, descriptors are employed by >20% of the PPI predictors [59, 61–63, 65, 70], and almost all of these predictors utilize LD. LD is an alignment-free method employed to characterize local protein information. It divides each protein into 10 local regions of varying lengths and compositions. In addition, proteins contain many functional domains, which can be represented by GO term identifiers; these identifiers can then be encoded as embeddings and serve as inputs.

In summary, amino acid binary encodings stand out as the most representative and easily accessible. Electrostatic and hydrophobic interactions, which are dominant in PPIs [57], are the most popular profiles among all physicochemical profiles. The application of structural features depends on the inputs. For instance, spatial structural features are more commonly used in structure-based predictors. Since evolutionary information is related to structures and functions of proteins, it is also often used to improve the predictions of PPIs.

Among all PAIR-pro predictors, we note that most rely on either machine learning or DL models. In the early stages, there was a tendency toward machine learning models. For instance, four predictors use support vector machines (SVM), two use random forests (RF), two use extreme learning machines (ELM) and several others use techniques such as K-nearest neighbors (KNN). However, DL models have become increasingly popular in recent years. Since 2018, only three methods have used machine learning, compared with eight predictors that use DL. Moreover, the complexity of prediction models is steadily increasing. For example, DPPI [67] uses convolutional neural networks (CNN), DNN-PPI [68] combines CNN and long short-term memory (LSTM) and TransformerGO [42] applies the Transformer model to predict PPIs. Compared with machine learning models, these DL models achieve more accurate predictions. Furthermore, we find that the area under the ROC curve (AUC) has become the predominant assessment criterion. For evaluating binary predictions, precision is one of the most frequently employed assessment criteria.
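For readers less familiar with these two criteria, the following sketch computes AUC from predicted scores through its rank interpretation (the probability that a randomly chosen interacting pair scores higher than a randomly chosen non-interacting pair) and precision from thresholded predictions. The function names, the 0.5 threshold and the toy labels and scores are illustrative assumptions, not values from any surveyed predictor.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def precision(labels, scores, threshold=0.5):
    """Fraction of predicted interactions that are true interactions."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if p and y == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if p and y == 0)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical toy data: 1 = interacting pair, 0 = non-interacting pair.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(toy_labels, toy_scores))    # 0.75
print(precision(toy_labels, toy_scores))  # 0.5
```

AUC is threshold-free, which is one reason it suits comparisons across predictors, whereas precision depends on the chosen score cutoff.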

Regarding availability, 11 of the 24 methods provide source code (10 methods) or web servers (3 methods). Most predictors provide source code rather than web servers. LR_PPI [60], D-SCRIPT [73] and DeepTrio [41] are the only three predictors with currently running web servers. Details of availability are shown in Supplementary Table S1. For several of the most recent predictors, we compared performance separately according to input type. For predictors with GO as inputs, TransformerGO improves performance by 5% on average across all subsets compared with protein2vec [75]. For predictors with PPI networks as inputs, Kovács et al. proposed the L3-based predictor [71], while Sim combined a new link prediction approach with L3 and showed that it is always superior to L3 [72]. For sequence-based predictors, we assessed the performance of DeepTrio [41] and D-SCRIPT [73] on the virus–human interaction dataset [41, 94]. DeepTrio attains an AUC of 79.9% [41], while D-SCRIPT attains 66.5% (Supplementary Table S2).

PAIR-res predictors

Compared with PAIR-pro predictors, PAIR-res predictors, summarized in Table 2, focus on the interaction residues between protein pairs. There are four PAIR-res predictors: BIPSPI [52], Plnet [95], BIPSPI+ [53] and the predictor proposed by Liu et al. [96]. Similar to PAIR-pro predictors, PAIR-res predictors accept protein pairs as inputs. The difference is that these predictors only accept protein sequences or structures as inputs, not the PPI networks and GO terms that some PAIR-pro predictors use. In particular, BIPSPI+ extends its inputs beyond pairs of protein sequences or pairs of structures. It also supports a combination of one protein sequence and one protein structure, with each input type selecting a different prediction mode within BIPSPI+. Our investigation of the datasets shows that PAIR-res predictors typically source interacting proteins and residues from PDB, and all four predictors utilize the protein–protein Docking Benchmark version 5 (DBv5) dataset [52, 97]. DBv5 comprises 230 non-redundant protein complexes, all of which have bound and unbound structures. Each complex has a resolution better than 3.25 Å, and each sequence exceeds 30 amino acids in length.

Table 2.

Summary of PAIR-res predictors in terms of inputs, feature profiles, models and availability

Predictor | Inputs | Feature profiles | Models | Year | Availability
--- | --- | --- | --- | --- | ---
BIPSPI [52] | Seqs/Strs | One-hot encoding, PSSM, PSFM, MSA conservation, sequence length, solvent accessibility, SS | XGBoost | 2018 | √
Liu et al. [96] | Seqs | Amino acids encoding, affinities | GNN | 2020 | ×
Plnet [95] | Strs | Geometry, electrostatics, hydrophobicity | Geometric DL | 2022 | √
BIPSPI+ [53] | Seqs/Strs/Seqs+Strs | One-hot encoding, PSSM, PSFM, MSA conservation, sequence length, solvent accessibility, SS | XGBoost | 2022 | √

Note: Seqs and Strs correspond to protein sequences and protein structures.

PAIR-res predictors extract features such as electrostatics, hydrophobicity, solvent accessibility, SS, position-specific scoring matrix (PSSM), etc. based on sequence or structural information. One-hot encoding and PSSM are the two most frequently utilized profiles. PSSM captures evolutionary information and contains crucial information about amino acid occurrence frequency and variation. It is generated through PSI-BLAST [89] that searches in corresponding databases such as Non-Redundant Protein Sequence Database (NR), NRDB90 [98], Swiss-Prot [99], etc. For a protein sequence of length L, the PSSM of the protein is a matrix with a shape of L × 20.

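\[
\mathrm{PSSM} =
\begin{bmatrix}
p_{1,1} & p_{1,2} & \cdots & p_{1,20} \\
p_{2,1} & p_{2,2} & \cdots & p_{2,20} \\
\vdots & \vdots & \ddots & \vdots \\
p_{L,1} & p_{L,2} & \cdots & p_{L,20}
\end{bmatrix}
\tag{1}
\]

where $p_{i,j}$ represents the likelihood of the presence of the j-th amino acid at the i-th position in the protein sequence.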
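Residue-level predictors commonly consume such an L × 20 matrix through sliding windows centered on each residue. The sketch below shows one way this could be done; the zero padding at the sequence ends, the window size and the toy matrix are assumptions for illustration, and individual predictors may pad or window differently.

```python
def sliding_windows(pssm, window_size):
    """Return one flattened window of PSSM rows per residue.

    Positions beyond either end of the sequence are filled with zero rows.
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd so the window has a center")
    half = window_size // 2
    width = len(pssm[0]) if pssm else 0
    padding = [[0] * width for _ in range(half)]
    padded = padding + [list(row) for row in pssm] + padding
    windows = []
    for i in range(len(pssm)):
        block = padded[i:i + window_size]
        windows.append([value for row in block for value in row])
    return windows


# Hypothetical 4 x 20 matrix in which row i is filled with the value i + 1.
toy_pssm = [[i + 1] * 20 for i in range(4)]
windows = sliding_windows(toy_pssm, window_size=3)
print(len(windows), len(windows[0]))  # 4 60
```

Each residue is thus described by its own profile row together with those of its sequence neighbors, giving a fixed-length input of window_size × 20 values.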

In addition, we note that both machine learning and DL models are prevalent in predicting interaction residues. For instance, BIPSPI utilizes XGBoost, while the predictor proposed by Liu et al. and Plnet leverage graph neural networks (GNN) and geometric DL, respectively.

Furthermore, we examine the availability of source code or web servers for these four methods (Supplementary Table S1). BIPSPI and BIPSPI+ offer both web servers and source code, all of which are still working. In addition, Plnet provides source code. We collected the results of the available predictors on the DBv5 dataset. As shown in Supplementary Table S2, BIPSPI+ achieves the best AUC of 84.8% [53], outperforming BIPSPI and Plnet, which achieve AUCs of 82.3% [52] and 75.3% [95], respectively.

SINGLE-res predictors

Unlike PAIR-pro and PAIR-res predictors, SINGLE-res predictors [22, 54, 55, 100–125] accept a single protein as input and infer whether residues in the protein sequence interact with other proteins. Table 3 summarizes the main information of SINGLE-res predictors and shows that 19 predictors utilize protein sequences as inputs, while 11 predictors accept protein structures. With the advancement of high-throughput sequencing technologies, protein sequence data are increasing rapidly. Consequently, a large number of sequence-based predictors have been proposed over the past 15 years. Structure-based predictors are now also proliferating, thanks to advancements in protein structure determination techniques and the emergence of high-quality protein structure predictors such as AlphaFold [93]. The datasets of SINGLE-res predictors are divided into two classes: structure-annotated proteins and disorder-annotated proteins. Structure-annotated proteins are usually derived from complexes within PDB, while disorder-annotated proteins originate from DisProt [19].

Table 3.

Summary of SINGLE-res predictors in terms of inputs, feature profiles, models and availability

Predictor | Inputs | Feature profiles | Models | Year | Availability
--- | --- | --- | --- | --- | ---
Fariselli et al. [100] | Strs | HSSP | ANN | 2002 | ×
Ofran et al. [101] | Strs | HSSP | ANN | 2003 | ×
ODA [102] | Strs | Solvent-accessible surface | Surface analysis | 2005 | ×
Burgoyne et al. [103] | Strs | Hydrophobicity, desolvation, electrostatics, conservative profiles | Surfaces clefts analysis | 2006 | ×
SPPIDER [104] | Strs | Hydrophobicity, number of contacts, PSSM, amino acid frequencies, entropies, charge, size of side chain, hydrophobicity, the level of surface exposure, the number and distances between surface, the difference between the predicted and observed in an unbound structure surface exposure of an amino acid residue | ML | 2007 | √
ISIS [105] | Seqs | Evolutionary profiles, solvent accessibility, SS | ANN | 2007 | ×
ANCHOR [106] | Seqs | Amino acids encoding, disorder regions | Pairwise energy estimation | 2009 | √
Raf–Ras [107] | Seqs | SS, average hydrophobicity, accessible surface area, average depth index, average protrusion index, minimal protrusion index, maximal protrusion index, maximal depth index | RF | 2009 | ×
PSIVER [108] | Seqs | PSSM, relative solvent accessibility | Naïve Bayes classifier | 2010 | √
SPRINGS [109] | Seqs | PSSM, hydropathy, relative solvent accessibility | ANN | 2014 | ×
LORIS [110] | Seqs | PSSM, hydropathy, relative solvent accessibility | L1-regularized LR | 2014 | ×
CRF_PPI [111] | Seqs | PSSM, averaged cumulative hydropathy, relative solvent accessibility | RF | 2015 | √
DC-RF-RUS-RF [112] | Seqs | PSSM, averaged cumulative hydropathy, relative solvent accessibility | RF | 2016 | √
SSWRF [113] | Seqs | PSSM, averaged cumulative hydropathy, averaged cumulative relative solvent accessibility | RF, SVM | 2016 | √
RF_PPI [114] | Seqs | Sequence entropy, sequence specificity score, sequence length and HSP length, backbone dynamics, accessibility, SS | RF | 2017 | √
DeepSite [115] | Strs | Hydrophobic, aromatic, hydrogen bond acceptor or donor, positive or negative ionizable and metallic | CNN | 2017 | √
EL-SMURF [116] | Seqs | PSSM-SPF, RER | RF | 2019 | √
SCRIBER [117] | Seqs | Putative protein-binding intrinsically disordered regions, SS, aliphaticity, aromaticity, acidity and size, relative solvent accessibility, evolutionary conservation, relative amino acid propensity | LR | 2019 | √
SASNet [118] | Strs | Atom encoding | CNN | 2019 | √
DLPred [119] | Seqs | PSSM, physical properties (a steric parameter, polarizability, volume, hydrophobicity, isoelectric point, helix probability, sheet probability), hydrophobicity scales, physicochemical characteristics (the number of atoms, electrostatic charges and potential hydrogen bonds), PKx, 3D-1D scores, conservation score, one-hot encoding | LSTM | 2019 | ×
Deng et al. [120] | Seqs | Residue space sequence, sequence information entropy, relative entropy, residue sequence weight, and residue conservative fraction | XGBoost | 2020 | ×
DeepPPISP [22] | Seqs | PSSM, SS, sequence | CNN | 2020 | √
ProNA2020 [54] | Seqs | SS, predicted relative solvent accessibility, bio-physical properties of amino acids | Homology, SVM, ProtVec, ANN | 2020 | √
MaSIF-site [121] | Strs | Solvent excluded surface, shape index, distance-dependent curvature, hydropathy index, continuum electrostatics, the location of free electrons and proton donors | CNN | 2020 | √
GraphPPIS [1, 122] | Strs | PSSM, HMM, DSSP | GNN | 2021 | √
DELPHI [55] | Seqs | High-scoring segment pair, 3-mer amino acid embedding, position information, PSSM, evolutionary conservation, putative relative solvent accessibility, relative amino acid propensity, putative protein-binding disorder, hydropathy index, physicochemical characteristics, physical properties, PKx | CNN, RNN | 2021 | √
EGRET [122] | Strs | Distance and relative orientation between the residues, protein language model | Transfer learning | 2022 | √
ScanNet [123] | Strs | Amino acid encoding, SS, accessible surface area, surface convexity and evolutionary conservation | DL | 2022 | √
ProB-Site [124] | Seqs | SS, HMM, PSSM | CNN | 2022 | √
hybridPBRpred [125] | Seqs | SCRIBER [117], DisoRDPbind [126] | Fusion | 2022 | √

Note: Seqs and Strs correspond to protein sequences and protein structures.

Similar to PAIR-pro and PAIR-res predictors, SINGLE-res predictors also utilize amino acid binary encodings, physicochemical profiles, structural features and evolutionary information. However, SINGLE-res predictors employ more extensive protein feature profiles to predict PPI residues. For example, DELPHI [55] uses 3-mer amino acid encoding to characterize proteins. In addition, some predictors [54, 55, 103, 104, 107, 111–113, 115, 117, 119] incorporate a broader range of physicochemical profiles, such as desolvation energy, aliphaticity, aromaticity and acidity [126]. Overall, >66% of predictors use physicochemical profiles to improve predictions. For evolutionary information, the predictors proposed by Fariselli et al. [100] and Ofran et al. [101] utilize features from the homology-derived structures of proteins (HSSP) database [127]. The HSSP database integrates information from one-dimensional sequences and three-dimensional protein structures by aligning each protein of known 3D structure in the PDB with all its probable sequence homologs. HSSP is valuable for analyzing residue conservation in structures and for studying protein evolution and folding. The hidden Markov model (HMM) profile is another type of evolutionary information widely used in predicting PPI residues, as seen in GraphPPIS [1] and ProB-Site [124]. For a protein sequence of length L, the HMM matrix has shape L × 30, so that each residue is represented by a 30-dimensional feature vector. The HMM matrix is typically normalized using the following formula:

[Equation (2): HMM matrix normalization formula; equation image not available]
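
To illustrate what normalizing an L × 30 HMM matrix can look like in practice, the sketch below rescales all values to the range [0, 1] with min-max scaling. Min-max scaling is an assumption made for illustration (it is a common choice for profile features) and is not claimed to be the formula of Equation (2). The function name and the toy values are hypothetical.

```python
def min_max_normalize(matrix):
    """Rescale every value of an L x D feature matrix into [0, 1].

    Assumption: one global minimum and maximum are taken from the matrix
    itself; a real pipeline might take them from the training set instead.
    """
    values = [v for row in matrix for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:  # constant matrix: avoid division by zero
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / (hi - lo) for v in row] for row in matrix]


# Toy "HMM profile" for a protein of length L = 2 (30 columns per residue).
toy_hmm = [[float(i + 30 * r) for i in range(30)] for r in range(2)]
normalized = min_max_normalize(toy_hmm)
```

After normalization the matrix keeps its L × 30 shape, so it can be concatenated with other per-residue feature profiles.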

In contrast to previous works using PSSM, EL-SMURF [116] adopts a different approach. It fuses the sequence profile feature in PSSM (PSSM-SPF) with the residue evolution rate (RER) and extracts features of neighboring residues with a sliding window. It is worth noting that EGRET [122] generates protein embeddings using the self-supervised protein language model ProtTrans [128]. Feature profiles generated by protein language models have two advantages over traditional ones. First, protein language models generate features faster than traditional evolutionary features such as PSSM and HMM, which require time-consuming sequence matching. Second, protein language models can generate features for all target proteins, whereas traditional feature generation tools like PSI-BLAST sometimes fail to find matches for rare proteins and therefore cannot generate feature profiles, which is especially noticeable when searching small databases like Swiss-Prot. As with PAIR-pro and PAIR-res predictors, most early SINGLE-res predictors utilize machine learning models. With the development of protein databases and DL technology, DL-based predictors are increasingly available. Of the predictors published in the past 5 years, >64% are based on DL, and their model architectures are becoming increasingly complex.
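
The sliding-window idea mentioned above can be sketched as follows: each residue is described by its own feature row concatenated with the rows of its sequence neighbors. This is a generic illustration, not EL-SMURF's implementation. The zero padding at the termini, the window size and the function name are all assumptions.

```python
def sliding_window_features(profile, window=3):
    """Concatenate each residue's row with its neighbours' rows.

    profile: list of L rows, each a list of D numbers (e.g. a PSSM).
    window:  odd window size; positions beyond the termini are
             zero-padded (an assumption made for this sketch).
    Returns a list of L rows, each of length window * D.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    zero = [0.0] * len(profile[0])
    out = []
    for i in range(len(profile)):
        row = []
        for j in range(i - half, i + half + 1):
            row.extend(profile[j] if 0 <= j < len(profile) else zero)
        out.append(row)
    return out


# Toy profile: 3 residues, 2 features per residue.
toy_profile = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
windowed = sliding_window_features(toy_profile, window=3)
```

With a window of 3 and 2 features per residue, every residue ends up with a 6-dimensional vector; a larger window captures more sequence context at the cost of a wider input.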

Interestingly, the cross-prediction rate (CPR) [117] is used to evaluate cross-predictions of residues that interact with other types of ligands as protein interaction residues, quantifying the extent to which a model confuses different types of interaction residues. Based on CPR, the area under the cross-prediction curve (AUCPC) is also introduced, where the cross-prediction curve plots CPR against recall. In addition, given the imbalance in the dataset, SCRIBER [117] also quantifies the area under the low false positive rate region of the ROC curve (AULC) and normalizes it by dividing the AULC value of the target predictor by the AULC value of a random predictor. Furthermore, we assessed the availability of SINGLE-res predictors (Supplementary Table S1). Out of 30 predictors, 20 are currently available to users: four offer user-friendly web platforms, nine provide source code and seven provide both web servers and source code. We also assessed the performance of available predictors published in 2020 and later on the Test_60 dataset [124], including DeepPPISP [22], ProNA2020 [54], MaSIF-site [121], GraphPPIS [122], DELPHI [55], ScanNet [123], ProB-Site [124] and hybridPBRpred [125]. As shown in Supplementary Table S2, ProB-Site achieves the best performance with an AUC of 84.4% [124].
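
As a rough illustration of the two measures described above, the sketch below computes a cross-prediction rate from binary predictions and the ratio used to normalize AULC. It follows the verbal description in the text. The exact definitions in [117] may differ in detail, and the function names and toy labels are hypothetical.

```python
def cross_prediction_rate(predicted, binds_other_ligand):
    """Fraction of residues binding other ligand types that are
    predicted as protein-binding.

    predicted:          0/1 predictions of protein-binding residues
    binds_other_ligand: 0/1 labels, 1 = residue binds another ligand
                        type (e.g. DNA or RNA) rather than a protein
    """
    other = [p for p, o in zip(predicted, binds_other_ligand) if o == 1]
    if not other:
        raise ValueError("no residues bind other ligand types")
    return sum(other) / len(other)


def normalized_aulc(aulc_predictor, aulc_random):
    """Ratio of the predictor's AULC to that of a random predictor."""
    return aulc_predictor / aulc_random


# Toy example: residues 3-5 bind another ligand; two of them are
# cross-predicted as protein-binding.
predicted = [1, 0, 1, 1, 0, 0]
binds_other_ligand = [0, 0, 1, 1, 1, 0]
cpr = cross_prediction_rate(predicted, binds_other_ligand)
```

A lower CPR means that fewer residues binding other ligands are mistaken for protein-binding residues, and a normalized AULC above 1 means the predictor beats random at low false positive rates.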

PROTEIN–NUCLEIC ACID INTERACTION

We survey 77 protein–nucleic acid interaction predictors and classify them into two categories: (i) protein-level [129, 130] and (ii) residue-level [24, 35], as illustrated in Figure 4. In contrast to PPI predictors, the majority of protein-level predictors for protein–nucleic acid interactions primarily focus on determining whether proteins interact with DNA or RNA, without providing annotations of the specific interaction partners. Therefore, our investigation into protein-level predictors mainly centers around the latest methods proposed since 2018.

Figure 4.

The distribution of protein–nucleic acid interaction predictors at different prediction levels (protein-level, residue-level, both) and for different ligand types (DNA, RNA, both).

A total of 14 protein-level predictors for protein–nucleic acid interactions are investigated, comprising 7 protein–DNA interaction predictors [129–135], 6 protein–RNA interaction predictors [136–141] and 1 predictor for protein–nucleic acid interactions [142], as illustrated in Figure 5. We conduct a comprehensive analysis of the protein feature profiles and availability.

Figure 5.

Predictors of protein–nucleic acid interactions at the protein-level. Protein–RNA interaction predictors are in the left circle, protein–DNA interaction predictors are in the right circle and protein–nucleic acid interaction predictors are in the middle cross section.

PSSM is the most commonly used feature profile. Notably, iDRBP_MMC [129] and DeepDRBP-2L [130] employ only PSSM as input features for predictions. IDRBP-PPCT [142] combines PSSM with the position-specific frequency matrix (PSFM). PSFM, which is also derived from PSI-BLAST, contains protein evolutionary information, indicating the occurrence frequency of residues at specific positions in the protein sequence. Unlike PSSM, PSFM does not account for the mutation probability of residues. Studies have indicated that interaction residues of proteins tend to be evolutionarily conserved [84, 143], which is why evolutionarily conserved features like PSSM are widely used in predicting protein–nucleic acid interactions. Additionally, various PSSM-based feature profiles are employed in predicting protein–nucleic acid interactions. For instance, DBP-DeepCNN [134] reduces PSSM complexity and generates global patterns, using discrete wavelet transform (DWT) for de-noising to create a new protein feature profile, R-PSSM-DWT. PlDBPred [135] considers 10 different PSSM-based feature profiles, including PSSMBLOCK, AADP-PSSM and PSSM-DWT. In contrast, RBP-TSTL [141] uses a protein language model to generate embeddings. For availability, four out of seven protein–DNA interaction predictors, five out of six protein–RNA interaction predictors and the predictor for protein–nucleic acid interactions are available to users (Supplementary Table S1).

Additionally, we summarize 63 methods for predicting protein–nucleic acid interactions at the residue level, which are shown in Table 4 [144]. This summary covers ligands, inputs, feature profiles, models and availability. For ligands, we observe that 24 methods are exclusively designed for predicting protein–DNA interaction residues [39, 132, 145–166], while 26 methods focus solely on predicting protein–RNA interaction residues [24, 36, 40, 167–189]. Furthermore, 13 methods are capable of identifying both protein–DNA and protein–RNA interaction residues [3–5, 35, 54, 126, 190–196]. It is worth noting that BindN [190], published in 2006, is the earliest method to provide simultaneous predictions for both types of residues. Interestingly, among the eight methods proposed since 2021, five offer predictions for both DNA-interaction and RNA-interaction residues. Furthermore, the predictor proposed by Wang et al. [157] predicts DNA-type-specific interaction residues, such as those of single-stranded DNA and double-stranded DNA. DNAgenie [166] takes into account A-DNA (a common double-stranded DNA subtype), B-DNA (the most abundant double-stranded DNA conformation) and single-stranded DNA. Predicting multiple types of interaction residues, as well as fine-grained type-specific interaction residues, is an emerging research trend.

Table 4.

Summary of residue-level protein–nucleic acid interaction predictors in terms of ligands, inputs, feature profiles, models and availability

Ligands Predictor Inputs Feature profiles (AAE, EVO, SS, RSA, PC) Models Year Availability
DNA DBS-pred [145] Seqs × × × × ML (NN) 2004 ×
DBS-PSSM [146] Seqs × × × × ML (NN) 2005 ×
DNABindR [147] Seqs × × × × ML (Naïve Bayes) 2006 ×
Ho et al. [148] Seqs × × × × ML (SVM) 2007 ×
DP-Bind [149] Seqs × × × ML (SVM, KLP, PLR) 2007
DISIS [150] Seqs × × ML (SVM, NN) 2007 ×
BindN-RF [151] Seqs × × × ML (RF) 2009 ×
DBindR [152] Seqs × ML (RF) 2009 ×
DBD-Threader [39] Seqs × × × × × Homology 2009 ×
DNABR [153] Seqs × × ML (RF) 2012 ×
Dey et al. [154] Seqs × × × ML (SVM) 2012 ×
DNABind [155] Seqs/Strs × ML (SVM), Homology 2013
SPOT-Seq (DNA) [156] Seqs × × × × × Homology 2014 ×
Wang et al. [157] Seqs × × ML (SVM) 2014 ×
PDNAsite [158] Seqs ML (LSA, SVM, ensemble learning) 2016 ×
Local-DPP [159] Seqs × × × × ML (RF) 2017 ×
TargetDNA [160] Seqs × × × ML (SVM) 2017
StackDPPred [132] Seqs × × × × ML (Stacking) 2019
DNAPred [161] Seqs × × ML (SVM, AdaBoost) 2019
iProDNA-CapsNet [162] Seqs × × × × DL (capsule neural network) 2019
EL_LSTM [163] Seqs DL (LSTM, bagging, ensemble learning) 2020 ×
TargetDBP [164] Seqs × × × ML (SVM) 2020
funDNApred [165] Seqs × × × ML (FCM) 2020 ×
DNAgenie [166] Seqs × ML 2021
RNA Jeong et al. [167] Seqs × × × ML (NN) 2004 ×
Jeong et al. [168] Seqs × × × × ML (NN) 2006 ×
RNABindR [169] Seqs × × × × ML (Naïve Bayes) 2007 ×
PRINTR [170] Seqs × × ML (SVM) 2008 ×
RISP [171] Seqs × × × × ML (SVM) 2008 ×
Pprint [172] Seqs × × × × ML (SVM) 2008 ×
RNAProB [173] Seqs × × × × ML (SVM) 2008 ×
PiRaNhA [174] Seqs × × ML (SVM) 2010 ×
ProteRNA [175] Seqs × × × ML (SVM) 2010 ×
RBRpred [176] Seqs × ML (SVM) 2010 ×
PRNA [177] Seqs × ML (RF) 2010 ×
OPRA [178] Strs × × × × Computational 2010 ×
Wang et al. [179] Seqs × × ML (SVM) 2011 ×
PRBR [180] Seqs × × ML (RF) 2011 ×
SPOT-Seq [40] Seqs × × × × × Homology 2011 ×
Choi et al. [181] Seqs × × × ML (SVM) 2011 ×
RNABindRPlus [182] Seqs × × × × ML (SVM, LR), Homology 2014 ×
DR_bind1 [183] Seqs × × Computational 2014
aaRNA [184] Seqs/Strs ML (NN) 2014 ×
Ren et al. [185] Strs × × × × ML (ensemble learning) 2015 ×
PRIdictor [186] Seqs × × × ML (SVM) 2016 ×
RNAProSite [187] Strs × ML (RF) 2016 ×
PredRBR [188] Strs × ML (GTB) 2017 ×
RPI-Bind [189] Strs × × ML (RF, SVM, NN) 2017 ×
PST-PRNA [36] Strs × DL 2022
HybridRNAbind [24] Seqs ML (RF) 2023
DNA and RNA BindN [190] Seqs × × × × ML (SVM) 2006 ×
BindN+ [191] Seqs × × × ML (SVM) 2010 ×
NAPS [192] Seqs × × ML (C4.5) 2010 ×
SNBRFinder [193] Seqs × ML (SVM), Homology 2015 ×
DisoRDPbind [126] Seqs × × ML (LR) 2015
DRNApred [4] Seqs ML (LR) 2017
NucBind [194] Seqs × × × ML (SVM), Homology 2019
ProNA2020 [54] Seqs × × ML (SVM, ANN, ProtVec), Homology 2020
NCBRPred [195] Seqs × × DL 2021
MTDsite [196] Seqs × × × DL 2021
GraphBind [5] Strs × × DL (GNN) 2021
DeepDISOBind [35] Seqs × × × DL 2022
iDRNA-ITF [3] Seqs × × DL 2022

Note: Seqs and Strs correspond to protein sequences and protein structures.

AAE, PC, RSA, SS and EVO correspond to amino acid binary encodings, physicochemical profiles, relative solvent accessibility, SS and evolutionary information.

Based on the inputs, we divide the predictors into sequence-based [24] and structure-based [5] predictors. The inputs of 54 predictors are protein sequences, while only 7 predictors infer interaction residues based on protein structures. Furthermore, two predictors [155, 184] simultaneously support both protein sequences and protein structures. The scarcity of structure-based predictors can be attributed to the limited quantity of high-quality protein structure data in prior years. In recent years, the development of protein structure determination techniques and predictors of protein structures such as AlphaFold has provided structure information for a wider range of proteins and led to the development of structure-based predictors.

We find that the datasets of protein–nucleic acid interaction residues are mainly derived from PDB and BioLip and are generally unbalanced, with fewer positive samples than negative samples. A recent representative dataset is the one organized by GraphBind [5], which is collected from the BioLip database. Because interaction residues are scarce, GraphBind applies data augmentation to the dataset: it uses bl2seq [197] and TM-align [198] to evaluate the sequence identity and structural similarity between proteins and then clusters them. Within each cluster, the annotations of the other proteins are transferred to the protein with the largest number of residues, which increases the number of interaction residues.

Similar to the previous section, we divide all protein feature profiles into the following five categories: (i) amino acid binary encodings; (ii) physicochemical profiles; (iii) relative solvent accessibility; (iv) SS; (v) evolutionary information, as illustrated in Figure 6. Almost all predictors utilize at least one of these feature types. Evolutionary information is the most popular feature profile and mainly includes PSSM, HMM and other PSSM-based features. Among these evolutionary features, PSSM is the most representative. Physicochemical profiles are the second most commonly used and are applied by half of the predictors. In addition, tools such as ASAquick [199] and ACCpro [200] can generate the relative solvent accessibility, which is typically presented as an L × 1 matrix for a protein of length L and is used by 30 predictors. SS, which is mainly generated by PSIPRED and DSSP, is used by 44.4% of predictors. Finally, 15 predictors utilize amino acid binary encodings, which are the most direct and easiest to obtain. The most commonly used amino acid binary encoding, described in detail in the PPI section, represents each residue as a 20-dimensional vector using one-hot encoding.
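
The one-hot encoding described above can be sketched in a few lines. The column order (alphabetical by one-letter code) and the all-zero row for non-standard residues are assumptions made for illustration, and the function name is hypothetical.

```python
# 20 standard amino acids; the column order is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_encode(sequence):
    """Encode a protein sequence of length L as an L x 20 binary matrix."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for aa in sequence:
        row = [0] * len(AMINO_ACIDS)
        if aa in index:  # unknown residues (e.g. X) stay all-zero
            row[index[aa]] = 1
        encoded.append(row)
    return encoded


encoded = one_hot_encode("ACX")
```

Each standard residue yields a row with exactly one nonzero entry, so the encoding carries residue identity but no evolutionary or physicochemical information.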

Figure 6.

The distribution of feature profiles used by predictors of residue-level protein–nucleic acid interactions.

Furthermore, we summarize the models used in the 63 predictors, as shown in Table 4. The majority of these predictors (50/63) employ machine learning models, whereas six of the eight predictors proposed since 2021 utilize DL models, indicating that DL has become a research trend for predicting protein–nucleic acid interaction residues. It is worth noting that some protein–nucleic acid interaction residue predictors are based on homology, such as DBD-Threader [39], SPOT-Seq [40], SNBRFinder [193] and NucBind [194]. Homology-based methods match the target protein with annotated proteins in a library and transfer the annotations of similar proteins to the target protein. In addition, GraphBind [5] represents protein structure data and feature profiles as graphs and mines them using a GNN, which provides new insights for subsequent related studies. We also find that MCC is the most popular comprehensive assessment criterion and AUC is the most commonly used criterion to evaluate the prediction propensities. Besides, DNAgenie [166] and HybridRNAbind [24] utilize the over-prediction rate (OPR) and the area under the over-prediction curve (AUOPC) to evaluate whether a predictor tends to predict residues that do not interact with ligands as nucleic acid interaction residues.
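
Because MCC is the most frequently reported summary criterion here, a short sketch of how it is computed from a binary confusion matrix may be helpful. The formula is the standard Matthews correlation coefficient. The toy counts are hypothetical, and returning 0 when the denominator vanishes is a common convention rather than part of the definition.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal count is zero.
    return numerator / denominator if denominator else 0.0


# Toy imbalanced example: 10 binding residues among 100.
score = mcc(tp=6, fp=4, tn=86, fn=4)
```

In this toy case the accuracy is 92%, yet MCC is only about 0.56, which illustrates why MCC is preferred over accuracy on unbalanced interaction-residue datasets.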

Finally, in terms of availability, 50 of the 63 predictors provided web servers or source code at the time of publication, but only 20 are currently available (Supplementary Table S1). Most early-published predictors are no longer accessible to users. Among the 31 predictors proposed before 2013, only DP-Bind [149] is still working. Fortunately, all eight predictors proposed in 2021 and later can be obtained. For DNA interaction residue predictions, we recommend DNAgenie [166] for fine-grained predictions and DeepDISOBind [35] (AUC of 73.6% [35]) for predictions involving intrinsically disordered proteins, as it outperforms DisoRDPbind [126] (AUC of 67.1% [35]). In addition, we assessed the performance of ProNA2020 [54], NCBRPred [195], GraphBind [5] and iDRNA-ITF [3] on DNA-129_Test [3, 5]. iDRNA-ITF achieves the highest AUC of 88.3%, followed by GraphBind (85.5%) and NCBRPred (82.3%) [3]. For RNA interaction residue predictions, we evaluated eight predictors on the test dataset of HybridRNAbind [24], which includes both structure-annotated and disorder-annotated proteins. HybridRNAbind is the best-performing predictor with an AUC of 73.8% [24], followed by iDRNA-ITF (70.0%). Details are shown in Supplementary Table S2.

PROTEIN–PEPTIDE INTERACTION

We investigate nine predictors of protein–peptide interactions, as shown in Table 5. Six protein–peptide interaction predictors accept only proteins as inputs and identify interaction residues in proteins without focusing on specific peptide partners. Specifically, three predictors (SPRINT-Seq [23], PepBind [201] and PepBCL [6]) accept protein sequences as inputs and three predictors (SPRINT-Str [202], PEPSITE [203] and PeptiMap [204]) are based on protein structures. In contrast, InterPep [44] and PepNN [43] accept both proteins and peptides as inputs to make residue-level predictions. It is also worth noting that CAMP [205] takes both protein sequences and peptide sequences as inputs, not only to infer interactions between peptides and proteins, but also to identify their interaction residues in peptides. Most datasets are derived from those of PEPSITE, SPRINT-Seq and SPRINT-Str. PDB, BioLiP and DrugBank [206] are the three most commonly used databases.

Table 5.

Summary of protein–peptide interaction predictors in terms of inputs, feature profiles, models and availability

Predictor Inputs Feature profiles Models Year Availability
PEPSITE [203] Strs S-PSSMs Computational algorithms 2009 ×
PeptiMap [204] Strs Functional unit classification, receptor classification (CATH, MSA) Fragment mapping 2013 ×
SPRINT-Seq [23] Seqs One-hot encoding, PSSM, accessible surface area, SS, steric parameter, hydrophobicity, volume, polarizability, isoelectric point, helix probability, sheet probability SVM 2016 ×
SPRINT-Str [202] Strs PSSM, SS, amino acid encoding, half sphere exposure, flexibility RF 2018 ×
PepBind [201] Seqs Intrinsic disorder-based features, SS, PSSM, HMM, TM-SITE [207], S-SITE [207] SVM 2018
InterPep [44] Protein Strs + Peptide Seqs TM-align (length, quality), amino acid composition distance, SS, surface information, template peptide information, model information Homology, cluster, RF 2019
CAMP [205] Protein Seqs + Peptide Seqs Residue-level structural and physicochemical properties, PSSM, intrinsic disorder tendencies CNN, attention 2021
PepNN [43] Protein Seqs/Strs + Peptide Seqs Graph representation (amino acid one-hot encoding, residue distance, rotation, residue relative position, torsional backbone angle), protein language model DNN, transfer learning 2022
PepBCL [6] Seqs Protein language model Contrastive learning 2022

Note: Seqs and Strs correspond to protein sequences and protein structures.

Because peptides and proteins have many similarities in composition and properties, some feature profiles are also widely used by protein–peptide interaction predictors, such as amino acid binary encodings, evolutionary information (PSSM, HMM, MSA), physicochemical profiles (hydrophobicity) and structural features (SS, solvent accessibility). In addition, some predictors use spatial position-specific scoring matrices (S-PSSMs) [203] and template peptide information [44]. In particular, PepNN [43] and PepBCL [6] obtain sequence embedding features from a protein language model instead of relying on traditional protein features.

These methods differ in their respective models. Similar to the predictors in previous sections, most methods employ machine learning to predict protein–peptide interaction residues. However, PEPSITE [203] and PeptiMap [204] introduce novel algorithms for protein–peptide interaction predictions. PEPSITE uses S-PSSMs to infer interactions from known protein–peptide complexes, while PeptiMap computes interaction residues based on fragment mapping. Additionally, PepBind [201] presents a consensus-based method that combines SVMpep [201] with two homology-based predictors, S-SITE and TM-SITE [207]. We also assess the availability of their web servers and source code, as shown in Supplementary Table S1. Regrettably, only two web servers (PepBind and PepBCL) are currently available, and only four predictors (InterPep, CAMP, PepNN and PepBCL) provide source code for local deployment. Finally, we assessed the performance of available predictors (Supplementary Table S2). According to their inputs and purpose, these predictors are classified into three groups: (i) PepBind and PepBCL, which take proteins as inputs and infer interaction residues in proteins, (ii) InterPep and PepNN, which use proteins and peptides as inputs and predict interaction residues in proteins, and (iii) CAMP, which makes protein-level predictions or infers interaction residues in peptides. As reported for PepBCL, it achieves an AUC of 84.1% on the TE125 dataset compared with 79.3% for PepBind [6]. The results reported for PepNN show an AUC of 83.3%, which is 4% higher than that of InterPep [43].

PROTEIN–OTHER LIGANDS INTERACTION

Proteins also interact with a variety of other ligands, such as nucleotides, heme and ions. We collect 33 protein–other ligands interaction predictors. Of these, 32 are residue-level predictors [5, 7–10, 45–51, 208–227], which aim to predict the ligand-interaction residues in proteins. In contrast, mebipred [25] is the only protein-level predictor and identifies whether a protein can interact with Ca, Co, Cu, Fe, K, Mg, Mn, Na, Ni and Zn. This multi-layer perceptron-based predictor uses three sequence-derived features: amino acid composition, physicochemical properties and a count of the metal-binding amino acid 5mers [25]. In this section, we focus on residue-level predictors, whose main information is summarized in Table 6. According to ligands, residue-level protein–other ligands interaction predictors can be mainly divided into three categories: protein–nucleotide (including ATP, ADP, AMP, GTP, GDP, etc.) interaction residue predictors [8, 45, 46], protein–heme interaction residue predictors [9, 47, 48] and protein–ion (Ca, Mg, Mn, etc.) interaction residue predictors [49–51]. In total, there are 16 predictors for protein–nucleotide interactions, 7 predictors for protein–heme interactions and 17 predictors for protein–ion interactions, as illustrated in Figure 7. It is worth noting that TargetS [7], TargetCom [219], DELIA [225] and GraphBind [5] are capable of providing simultaneous predictions of protein–nucleotide, protein–heme and protein–ion interaction residues. In the following, protein–other ligands interaction residue predictors are summarized by category.
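
Two of the sequence-derived feature types mentioned for protein-level prediction, amino acid composition and 5mer counts, can be sketched as below. This is a generic illustration and not the mebipred implementation: the function names are hypothetical, and the sketch counts all 5mers, whereas the choice of which 5mers are treated as metal-binding is specific to [25] and is not reproduced here.

```python
from collections import Counter

# 20 standard amino acids; the column order is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Fraction of each standard amino acid in the sequence (20 values)."""
    counts = Counter(sequence)
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def kmer_counts(sequence, k=5):
    """Count every overlapping k-mer in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


composition = amino_acid_composition("CCHH")
kmers = kmer_counts("ACDEFACDEF", k=5)
```

Both features are fixed-length or countable summaries of the whole sequence, which is what makes them suitable inputs for a protein-level classifier.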

Table 6.

Summary of protein–others interaction predictors in terms of ligands, inputs, feature profiles, models and availability

Predictor Ligands Inputs Feature profiles Models Year Availability
CHED [208] Co, Cu, Fe, Mn, Ni, Zn Strs Geometric search ML (SVM), decision tree 2008 ×
ATPint [209] ATP Seqs PSSM, Hydrophobicity, Beta-Sheet, Polarity, Solvation potential, Residue interface propensities, Net charge, Average accessible surface area ML(SVM) 2009
GTPBinder [210] GTP Seqs One-hot, PSSM ML(SVM) 2010
FINDSITE-metal [211] Ca, Co, Cu, Fe, Mg, Mn, Ni, Zn Strs Protein structure modeling ML(SVM), Homology 2011 ×
Firoz et al. [212] AMP, ADP, ATP, GMP, GDP, GTP Seqs PSSM ML(SVM) 2011 ×
HemeBIND [47] HEME Strs PSSM, relative accessible surface area (RASA), Depth index (DPX), Protrusion index (CX) ML(SVM) 2011 ×
hemeNet [48] HEME Seqs/Strs Structural analysis of heme proteins: implications for design and prediction. PSSM, RASA, depth index (DPX), protrusion index (CX). ML(SVM) 2011
pfinder [213] Phosphate Strs Position with respect to the solvent accessible surface, clefts on the surface of the protein. Homology 2011 ×
MetalDetector [214] Ions Seqs HMM ML (SVM, NN) 2011 ×
ATPsite [215] ATP Seqs PSSM, SS, RSA, conservation scores, amino acid (AA) groups, dihedral angle, Terminal indicator, SS segment indicator for helix/ strand/ coil, Collocation of AA pairs ML(SVM) 2011 ×
NsitePred [45] ATP, ADP, AMP, GTP, GDP Seqs Sequence, SS, RSA, dihedral angles, PSSM, Terminus indicator, SS segment indicators for helix/strand/coil, Residue conservation scores, collocation of significant AA pairs ML(SVM) 2012
TargetS [7] ATP, ADP, AMP, GDP, GTP, Ca, Mn, Mg, Fe, Zn, HEME Seqs PSSM, SS, Ligand-Specific Binding Propensity Feature Ensemble method (SVM, AdaBoost), cluster 2013 ×
TargetATP [216] ATP Seqs PSSM, SS Ensemble method (SVM, AdaBoost) 2013
TargetATPsite [217] ATP Seqs PSSM, Sparse representation of evolution image. Ensemble method (SVM, AdaBoost) 2013 ×
TargetSOS [8] ATP, ADP, AMP, GTP, GDP Seqs PSSM, SS ML(SVM, Supervised Over-sampling) 2014
mFASD [218] Ca, Cu, Fe, Mg, Mn, Zn Strs Functional atoms, local chemical environment, distance between two functional atoms, distance between two functional atom sets Novel computational algorithms 2015 ×
TargetCom [219] Cu, Fe, Zn, (SO4)2−, (PO4)3−, ATP, FMN, HEME Seqs PSSM, SS, RSA, torsion angles, Conservation scores, COFACTOR [232], TM-SITE [207], S-SITE [207] and COACH [207] Ensemble method (SVM, AdaBoost) 2016 ×
IonCom [220] Zn, Cu, Fe, Ca, Mg, Mn, Na, K, (NO2), (CO3)2−, (SO4)2−, (PO4)3− Seqs/Strs PSSM, SS, RSA, backbone torsion angles, position and segment specific conservation scores, ligand-specific binding propensity. COFACTOR [232], TM-SITE [207], S-SITE [207] and COACH [207] Ensemble method (SVM, AdaBoost) 2016
TargetNUCs [221] ATP, ADP, AMP, GTP, GDP Seqs PSSM, SS Ensemble method 2016
ATPbind [222] ATP Strs PSSM, SS, solvent accessibility, TM-SITE [207], S-SITE [207] Ensemble method (SVM) 2018
Wang et al. [223] Zn, Cu, Fe, Ca, Mg, Mn, Na, K, Co, (NO2), (CO3)2−, (SO4)2−, (PO4)3− Seqs Component information, position conservation information, hydropathy, polarization charge, SS, relative solvent accessibility SMO 2019 ×
Liu et al. [224] (NO2), (CO3)2−, (SO4)2−, (PO4)3− Seqs The composition information of amino acid, polarization charge, hydrophilic-hydrophobic, SS and relative solvent accessibility KNN 2019 ×
SeqD-HBM [9] HEME Seqs Net charge, solvent accessibility Stepwise validation 2019 ×
DELIA [225] Ca, Mn, Mg, ATP, HEME Strs PSSM, SS, HMM, RSA, S-SITE-based feature, structure-based distance matrix DL 2020
PBSP [226] Phosphate Strs AutoDockFR [233], AutoSite [234] Energy-based, reverse focused docking 2021
ATPensemble [227] ATP Seqs PSSM, SS, one-hot Ensemble method (CNN, LightGBM), Homology 2021
DeepATPseq [46] ATP Seqs PSFM Ensemble method (CNN, SVM) 2021
GraphBind [5] Ca, Mn, Mg, ATP, HEME Strs Pseudo-positions, atom mass, B-factor, whether it is a residue side-chain atom, electronic charge, the number of hydrogen atoms bonded to it, whether it is in a ring, and the van der Waals radius of the atom, SS, PSSM, HMM DL(GNN) 2021
MetalSiteHunter [50] Ca, Fe, Mg, Mn, Zn, Na Strs 3D voxels, positive_ionizable, hbond_acceptor, occupancies, negative_ionizable and hbond_donor DL (3D CNN) 2022
GASS-Metal [51] Zn, Ca, Mg, Mn, Cu, Fe, Co, Na, K, Cd, Ni Strs Residue position, substitution matrix to handle conservative mutations Homology, genetic algorithms 2022
MIB2 [49] Ca, Cu, Mg, Mn, Zn, Cd, Fe, Ni, Hg, Co, Au, Ba, Pb, Pt, Sm, Sr Seqs/Strs BLOSUM62 substitution matrix, weighted contact number of each metal ion (PS)2 2022
LMetalSite [10] Zn, Ca, Mg, Mn Seqs Protein language model DL 2022

Note: Seqs and Strs correspond to protein sequences and protein structures.

Figure 7.

Protein–other ligands interaction predictors are divided into four categories according to ligand types, with the middle predictors providing protein–nucleotide, heme, ion interaction predictions simultaneously.

Protein–nucleotide interaction

Our investigation reveals that 15 out of 16 protein–nucleotide interaction predictors involve protein–adenosine triphosphate (ATP) interactions. This prevalence could be attributed to ATP's biological significance, as it serves as the primary energy source in living organisms. A total of 13 predictors utilize protein sequences as inputs, whereas only GraphBind [5], DELIA [225] and ATPbind [222] are based on protein structures. PDB and BioLiP serve as the primary data sources for these predictors. Several high-quality datasets are commonly used, including those from ATPint [209], ATPsite [215] and ATPbind [222]. An analysis of the protein feature profiles shows that PSSM is used most frequently in protein–nucleotide interaction residue predictors. Further investigation into the models reveals ensemble models (8 out of 15) and SVM (10 out of 15) as the dominant choices. The component models integrated in these ensembles differ. For instance, ATPensemble [227] combines a deep CNN with the LightGBM algorithm, while DeepATPseq [46] pairs a deep CNN with SVM. In terms of availability, most predictors provide web servers or source code, and 11 links currently remain valid, allowing the prediction of 5 kinds of nucleotides: ATP, ADP, AMP, GTP and GDP. Unfortunately, predictions for other nucleotides such as CMP and PCG are currently inaccessible to users. Finally, we collected the protein–ATP interaction prediction performance of GraphBind, DeepATPseq, ATPensemble and DELIA on the PATP-TEST dataset [46]. DeepATPseq achieves the best performance, followed by ATPensemble.

Protein–heme interaction

We investigate seven protein–heme interaction residue predictors: HemeBIND [47], hemeNet [48], TargetS [7], TargetCom [219], SeqD-HBM [9], DELIA [225] and GraphBind [5]. It is noteworthy that HemeBIND, hemeNet and SeqD-HBM focus exclusively on the predictions of heme interaction residues, while the remaining methods predict other ligands as well. Three predictors are based on protein sequences [7, 9, 219] and another three on protein structures [5, 47, 225], with hemeNet [48] providing predictions based on both. Notably, despite the limited protein structure data available in 2011, HemeBIND and hemeNet were developed based on protein structures. Protein–heme interaction residue data usually come from the BioLiP database. We also summarize the protein feature profiles used by these predictors and find that, in addition to common profiles such as PSSM and SS, some special features are used. For example, HemeBIND and hemeNet use the Depth index (DPX) [228] and Protrusion index (CX) [229] features. DPX is the distance between the target atom and its nearest solvent-accessible atom and is generated by the PSAIA tool [230]. The CX feature represents the degree to which the atom protrudes from the surface of the protein, and the composition of CX features is similar to that of DPX. In terms of models, the above seven predictors mainly use SVM, ensemble models and DL models. In particular, SeqD-HBM is based on stepwise validation using the knowledge gained from in-depth spectroscopic studies on heme–peptide complexes. Unfortunately, only three of the above seven predictors are still available, and hemeNet, proposed in 2011, is the only currently available method that specifically focuses on predicting heme interaction residues. Among recently published and available predictors, GraphBind achieves an AUC of 96.2%, higher than DELIA (95.1%), as reported for GraphBind [5].

Protein–ion interaction

An examination of 17 predictors associated with protein–ion interaction residues reveals that most predictors identify interaction residues with Mn, followed by iron ions, Ca, Mg and Zn. It is important to highlight that different predictors offer different levels of detail about ion ligands. For instance, GASS-Metal [51] and MIB2 [49] differentiate between Fe2+ and Fe3+ ions, whereas mFASD [218] and MetalSiteHunter [50] do not. MetalDetector [214] does not make fine-grained predictions of protein–metal ion interactions and treats all metal ions as a single class. Interestingly, two predictors, pfinder [213] (proposed in 2011) and PBSP [226] (proposed in 2021), exclusively predict phosphates. Moreover, unlike the previously mentioned protein–ligand interaction residue predictors and the other protein–ion interaction residue predictors, which consider all residue types, CHED [208] predicts interaction residues of only four types: Cys, His, Glu and Asp. Similarly, MetalDetector identifies only Cys and His residues involved in protein–metal interactions. With regard to inputs, the majority of these methods (11 out of 17) are based on protein structures, with six predictors using protein sequences. Additionally, an investigation into the datasets reveals that protein–ion interaction residue datasets are generally unbalanced and mainly derive from the PDB, BioLiP and MetalPDB [231] databases. Notably, datasets corresponding to different ions vary substantially in size. For instance, IonCom [220] uses datasets comprising 379 interaction proteins and 1778 interaction residues for Mn, in contrast to 53 interaction proteins and 536 interaction residues for K. Furthermore, given the prevalence of predictors based on protein structures, the protein feature profiles vary significantly [207, 232–234]. For instance, mFASD incorporates four protein feature profiles: functional atoms, local chemical environment, the distance between two functional atoms and the distance between two functional atom sets, while MetalSiteHunter employs 3D volume cube (voxel) features. Notably, as in previous sections, protein language models are also deployed in this field. LMetalSite [10] uses ProtTrans to produce protein embeddings and generate further predictions. Different predictors adopt different language models: RBP-TSTL [141] employs the ProtT5-XL model, EGRET and PepBCL [6] use the ProtBert model and LMetalSite leverages the ProtT5-XL-U50 model, with each protein language model varying in terms of the number of parameters and training strategies. An overview of the algorithms and models reveals a variety of innovative approaches, such as MIB2 using the (PS)2 algorithm [235], GASS-Metal combining the homology method and the genetic algorithm, and mFASD implementing a novel structure-based computational method. Regarding availability, eight predictors remain accessible to users, covering all ion types. Finally, we examined the performance of protein–Mn interaction predictors, including LMetalSite, GraphBind and DELIA. The results reported in the LMetalSite study show that LMetalSite (AUC = 96.6%) outperforms GraphBind (AUC = 93.0%) and DELIA (AUC = 90.2%) [10].
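Because the comparisons above are reported as AUC values on generally unbalanced residue-level datasets, a small sketch of how AUC can be computed may be helpful. The function below uses the rank interpretation of AUC (the probability that a randomly chosen interaction residue receives a higher score than a randomly chosen non-interaction residue). The function name roc_auc and the toy labels and scores are assumptions for illustration and are not data from any reviewed predictor.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for interaction residues, 0 for non-interaction residues.
    scores: predicted propensities, higher meaning more likely to interact.
    Tied positive/negative pairs count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy unbalanced example: 2 interaction residues among 10 residues.
toy_labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
toy_scores = [0.1, 0.2, 0.3, 0.15, 0.05, 0.4, 0.6, 0.25, 0.7, 0.5]
print(roc_auc(toy_labels, toy_scores))  # 0.9375
```

The quadratic pairwise loop is chosen for readability; for full benchmark datasets a sort-based implementation would be more efficient. Since AUC depends only on the ranking of scores, it is less sensitive to class imbalance than accuracy, which is one reason it is widely reported for residue-level predictors.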

SUMMARY

Prompted by the necessity to decipher protein–ligand interactions on a large scale, we review a comprehensive set of over 160 predictors. These encompass interactions between proteins and a range of ligands including proteins, nucleic acids, peptides, nucleotides, hemes and ions. We have scrutinized these predictors through several pertinent lenses, including inputs, feature profiles, models and availability, among others.

According to our investigation, most predictors identify interactions using protein sequences. In particular, evolutionary information, which is derived from protein sequences, has been the most widely employed feature profile (used by 67% of all predictors) in the past decades, and PSSM stands out as the most representative form of evolutionary information. In addition, pretrained large models based on sequences are becoming a new trend. Embeddings from pretrained large models are on par with traditional features in quality and speed, greatly promoting the development of feature profiles. On the other hand, with the advancements in protein structure determination techniques and protein structure predictors, an increasing number of structure-based predictors have also emerged, leading to a new trend: the development of large-scale pretrained models based on multimodal protein data.

Compared with other technologies, DL can yield more accurate predictions and presents the opportunity to unearth deeper information embedded within proteins. In addition, more predictors focus on protein–protein and protein–nucleic acid interactions, indicating that macromolecular ligands are more of a concern. Furthermore, the majority of methods prioritize residue-level predictions over protein-level ones, showing a clear interest in more granular predictions from the perspective of proteins. Besides, some predictors further identify subtypes of ligands, indicating a trend toward fine-grained predictions from the perspective of ligand subtypes.

The current methods for predicting protein interactions focus mainly on structured proteins and individual proteins. However, these methods show significant shortcomings in the prediction of interactions involving intrinsically disordered proteins and protein complexes, both of which hold critical importance in biological processes. Intrinsically disordered proteins play a crucial role in cell signaling and regulation, while protein complexes are instrumental in various biological processes such as metabolism, DNA repair and signal transduction. Moreover, these predictors often overlook dynamics, that is, the fact that proteins exhibit different interaction mechanisms in the dynamic cellular environment. Therefore, research into intrinsically disordered proteins, protein complexes and protein dynamics is poised to become the frontier of future studies. Further exploration in these areas is expected to reveal more profound mechanisms of cellular biology, thereby advancing new strategies in drug discovery and disease treatment.

Although current protein interaction predictors consider a variety of ligands, the development of predictors for different ligands varies considerably, with some ligands still lacking sufficient predictors. It is necessary to develop new models to predict protein–ligand interactions for less studied ligands, such as GMP and K ions, and to explore interactions with a broader range of ligands. Moreover, we find that the availability of protein–ligand interaction prediction methods is a concern, as many predictors remain inaccessible to researchers. The accessibility of these predictors is paramount for researchers and biologists, and we suggest that future studies should prioritize providing access to predictors and maintaining this accessibility.

Key Points

  • We review over 160 methods for predicting protein–ligand interactions, which focus on protein–protein, protein–nucleic acid, protein–peptide and protein–other ligands (nucleotide, heme, ion) interactions.

  • Our survey covers various types of methods, including protein- and residue-level predictors, as well as structure- and sequence-based predictors.

  • A comprehensive analysis is conducted from several significant perspectives including inputs, feature profiles, models, availability and so on. Evolutionary information, which is derived from protein sequences, is the most widely employed feature profile and PSSM stands out as the most representative among evolutionary information.

  • Finally, the challenges and future development directions are presented.

UNCOMMON ABBREVIATIONS

ELM: extreme learning machine

L3: network paths of length three

KLR: kernel logistic regression

PLR: penalized logistic regression

LSA: latent semantic analysis

FCM: fuzzy cognitive map

GTB: gradient tree boosting

C4.5: a decision tree algorithm

SMO: sequential minimal optimization

(PS)2: an automatic protein structure prediction server

Supplementary Material

supplementary_data_bbae162

Author Biographies

Pengzhen Jia received the BS degree in computer science from Central South University, Changsha, China, in 2022. Currently, he is working toward the PhD degree in computer science and technology at Central South University, Changsha, China. His current research interests include bioinformatics and protein–ligand interactions.

Fuhao Zhang received the BS degree from the Chongqing University of Posts and Telecommunications, China, in 2014 and the PhD degree from Central South University, China, in 2023. He is currently an Associate Professor at the College of Information Engineering, Northwest A&F University. His main research interests include bioinformatics and deep learning.

Chaojin Wu received the BS degree in computer science from Central South University, Changsha, China, in 2022. Currently, he is working toward the MS degree in computer science and technology at Central South University, Changsha, China. His current research interests include bioinformatics and protein–ligand interactions.

Min Li received the BS degree in communication engineering and the MS and PhD degrees in computer science from Central South University, Changsha, China, in 2001, 2004 and 2008, respectively. She is currently a Professor at the School of Computer Science and Engineering, Central South University. Her main research interests include bioinformatics and systems biology.

Contributor Information

Pengzhen Jia, School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China.

Fuhao Zhang, School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China; College of Information Engineering, Northwest A&F University, No. 3 Taicheng Road, Yangling, Shaanxi 712100, China.

Chaojin Wu, School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China.

Min Li, School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China.

FUNDING

This work is supported by the National Natural Science Foundation of China under Grant No. 62225209 and the Science and Technology Innovation Program of Hunan Province (2021RC0048).

DATA AVAILABILITY

The protein–ligand interaction datasets used in this manuscript are publicly available, and their specific sources are summarized as follows: The dataset used for evaluating sequence-based PAIR-pro predictors is the virus–human interaction dataset [41, 94]. The dataset used for evaluating SINGLE-res predictors is the Test_60 dataset [124]. The DNA-129_Test dataset is from publications [3, 5]. The dataset used to evaluate protein–RNA interaction residue predictors is from the publication [24].

References

  • 1. Yuan Q, Chen J, Zhao H, et al. Structure-aware protein–protein interaction site prediction using deep graph convolutional network. Bioinformatics 2021;38(1):125–32.
  • 2. Zhang J, Kurgan L. Review and comparative assessment of sequence-based predictors of protein-binding residues. Brief Bioinform 2018;19(5):821–37.
  • 3. Wang N, Yan K, Zhang J, Liu B. iDRNA-ITF: identifying DNA- and RNA-binding residues in proteins based on induction and transfer framework. Brief Bioinform 2022;23(4):bbac236.
  • 4. Yan J, Kurgan L. DRNApred, fast sequence-based method that accurately predicts and discriminates DNA- and RNA-binding residues. Nucleic Acids Res 2017;45(10):e84.
  • 5. Xia Y, Xia CQ, Pan X, Shen HB. GraphBind: protein structural context embedded rules learned by hierarchical graph neural networks for recognizing nucleic-acid-binding residues. Nucleic Acids Res 2021;49(9):e51.
  • 6. Wang R, Jin J, Zou Q, et al. Predicting protein-peptide binding residues via interpretable deep learning. Bioinformatics 2022;38(13):3351–60.
  • 7. Yu DJ, Hu J, Yang J, et al. Designing template-free predictor for targeting protein-ligand binding sites with classifier ensemble and spatial clustering. IEEE/ACM Trans Comput Biol Bioinform 2013;10(4):994–1008.
  • 8. Hu J, He X, Yu DJ, et al. A new supervised over-sampling algorithm with application to protein-nucleotide binding residue prediction. PloS One 2014;9(9):e107676.
  • 9. Wißbrock A, Paul George AA, Brewitz HH, et al. The molecular basis of transient heme-protein interactions: analysis, concept and implementation. Biosci Rep 2019;39(1):BSR20181940.
  • 10. Yuan Q, Chen S, Wang Y, et al. Alignment-free metal ion-binding site prediction from protein sequence through pretrained language model and multi-task learning. Brief Bioinform 2022;23(6):bbac444.
  • 11. Wells JA, Mcclendon CL. Reaching for high-hanging fruit in drug discovery at protein-protein interfaces. Nature 2007;450(7172):1001–9.
  • 12. De Las Rivas J, Fontanillo C. Protein–protein interaction networks: unraveling the wiring of molecular machines within the cell. Brief Funct Genomics 2012;11(6):489–96.
  • 13. Orii N, Ganapathiraju MK. Wiki-pi: a web-server of annotated human protein-protein interactions to aid in discovery of protein function. PloS One 2012;7(11):e49029.
  • 14. Kuzmanov U, Emili A. Protein-protein interaction networks: probing disease mechanisms using model systems. Genome Med 2013;5(4):37–12.
  • 15. Szklarczyk D, Gable AL, Lyon D, et al. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res 2019;47(D1):D607–13.
  • 16. Calderone A, Castagnoli L, Cesareni G. Mentha: a resource for browsing integrated protein-interaction networks. Nat Methods 2013;10(8):690–1.
  • 17. Oughtred R, Stark C, Breitkreutz B-J, et al. The BioGRID interaction database: 2019 update. Nucleic Acids Res 2019;47(D1):D529–41.
  • 18. Yang J, Roy A, Zhang Y. BioLiP: a semi-manually curated database for biologically relevant ligand-protein interactions. Nucleic Acids Res 2013;41(Database issue):D1096–103.
  • 19. Quaglia F, Mészáros B, Salladini E, et al. DisProt in 2022: improved quality and accessibility of protein intrinsic disorder annotation. Nucleic Acids Res 2022;50(D1):D480–7.
  • 20. wwPDB consortium. Protein data bank: the single global archive for 3D macromolecular structure data. Nucleic Acids Res 2019;47(D1):D520–8.
  • 21. Uniprot: the universal protein knowledgebase in 2023. Nucleic Acids Res 2023;51(D1):D523–31.
  • 22. Zeng M, Zhang F, Wu FX, et al. Protein-protein interaction site prediction through combining local and global features with deep neural networks. Bioinformatics 2020;36(4):1114–20.
  • 23. Taherzadeh G, Yang Y, Zhang T, et al. Sequence-based prediction of protein-peptide binding sites using support vector machine. J Comput Chem 2016;37(13):1223–9.
  • 24. Zhang F, Li M, Zhang J, Kurgan L. HybridRNAbind: prediction of RNA interacting residues across structure-annotated and disorder-annotated proteins. Nucleic Acids Res 2023;51(5):e25.
  • 25. Aptekmann AA, Buongiorno J, Giovannelli D, et al. Mebipred: identifying metal-binding potential in protein sequence. Bioinformatics 2022;38(14):3532–40.
  • 26. Yuvaraj N, Srihari K, Chandragandhi S, et al. Analysis of protein-ligand interactions of SARS-Cov-2 against selective drug using deep neural networks. Big Data Min Anal 2021;4(2):76–83.
  • 27. Wu Y, Gao M, Zeng M, et al. BridgeDPI: a novel graph neural network for predicting drug-protein interactions. Bioinformatics 2022;38(9):2571–8.
  • 28. Li M, Lu Z, Wu Y, Li YH. BACPI: a bi-directional attention neural network for compound-protein interaction and binding affinity prediction. Bioinformatics 2022;38(7):1995–2002.
  • 29. Wang K, Zhou R, Tang J, Li M. GraphscoreDTA: optimized graph neural network for protein-ligand binding affinity prediction. Bioinformatics 2023;39(6):btad340.
  • 30. Wang K, Li M. Fusion-based deep learning architecture for detecting drug-target binding affinity using target and drug sequence and structure. IEEE J Biomed Health Inform 2023;27(12):6112–20.
  • 31. Wang K, Zhou R, Li Y, Li M. DeepDTAF: a deep learning method to predict protein-ligand binding affinity. Brief Bioinform 2021;22(5):bbab072.
  • 32. Lei C, Lu Z, Wang M, Li M. StackCPA: a stacking model for compound-protein binding affinity prediction based on pocket multi-scale features. Comput Biol Med 2023;164:107131.
  • 33. Wang M, Kurgan L, Li M. A comprehensive assessment and comparison of tools for HLA class I peptide-binding prediction. Brief Bioinform 2023;24(3):bbad150.
  • 34. Zhang F, Li M, Zhang J, et al. DeepPRObind: modular deep learner that accurately predicts structure and disorder-annotated protein binding residues. J Mol Biol 2023;435(14):167945.
  • 35. Zhang F, Zhao B, Shi W, et al. DeepDISOBind: accurate prediction of RNA-, DNA- and protein-binding intrinsically disordered residues with deep multi-task learning. Brief Bioinform 2022;23(1):bbab521.
  • 36. Li P, Liu ZP. PST-PRNA: prediction of RNA-binding sites using protein surface topography and deep learning. Bioinformatics 2022;38(8):2162–8.
  • 37. Huang L, Liao L, Wu CH. Evolutionary analysis and interaction prediction for protein-protein interaction network in geometric space. PloS One 2017;12(9):e0183495.
  • 38. Wang X, Yang W, Yang Y, et al. PPISB: a novel network-based algorithm of predicting protein-protein interactions with mixed membership stochastic blockmodel. IEEE/ACM Trans Comput Biol Bioinform 2023;20(2):1606–12.
  • 39. Gao M, Skolnick J. A threading-based method for the prediction of DNA-binding proteins with application to the human genome. PLoS Comput Biol 2009;5(11):e1000567.
  • 40. Zhao H, Yang Y, Zhou Y. Highly accurate and high-resolution function prediction of RNA binding proteins by fold recognition and binding affinity prediction. RNA Biol 2011;8(6):988–96.
  • 41. Hu X, Feng C, Zhou Y, et al. DeepTrio: a ternary prediction system for protein–protein interaction using mask multiple parallel convolutional neural networks. Bioinformatics 2022;38(3):694–702.
  • 42. Ieremie I, Ewing RM, Niranjan M. TransformerGO: predicting protein–protein interactions by modelling the attention between sets of gene ontology terms. Bioinformatics 2022;38(8):2269–77.
  • 43. Abdin O, Nim S, Wen H, Kim PM. PepNN: a deep attention model for the identification of peptide binding sites. Commun Biol 2022;5(1):503.
  • 44. Johansson-Åkhe I, Mirabello C, Wallner B. Predicting protein-peptide interaction sites using distant protein complexes as structural templates. Sci Rep 2019;9(1):4267.
  • 45. Chen K, Mizianty MJ, Kurgan L. Prediction and analysis of nucleotide-binding residues using sequence and sequence-derived structural descriptors. Bioinformatics 2012;28(3):331–41.
  • 46. Hu J, Zheng LL, Bai YS, et al. Accurate prediction of protein-ATP binding residues using position-specific frequency matrix. Anal Biochem 2021;626:114241.
  • 47. Liu R, Hu J. HemeBIND: a novel method for heme binding residue prediction by combining structural and sequence information. BMC Bioinformatics 2011;12:207.
  • 48. Liu R, Hu J. Computational prediction of heme-binding residues by exploiting residue interaction network. PloS One 2011;6(10):e25560.
  • 49. Lu CH, Chen CC, Yu CS, et al. MIB2: metal ion-binding site prediction and modeling server. Bioinformatics 2022;38(18):4428–9.
  • 50. Mohamadi A, Cheng T, Jin L, et al. An ensemble 3D deep-learning model to predict protein metal-binding site. Cell Rep Phys Sci 2022;3(9):101046.
  • 51. Paiva VA, Mendonça MV, Silveira SA, et al. GASS-metal: identifying metal-binding sites on protein structures using genetic algorithms. Brief Bioinform 2022;23(5):bbac178.
  • 52. Sanchez-Garcia R, Sorzano COS, Carazo JM, Segura J. BIPSPI: a method for the prediction of partner-specific protein–protein interfaces. Bioinformatics 2019;35(3):470–7.
  • 53. Sanchez-Garcia R, Macias J, Sorzano C, et al. BIPSPI+: mining type-specific datasets of protein complexes to improve protein binding site prediction. J Mol Biol 2022;434(11):167556.
  • 54. Qiu J, Bernhofer M, Heinzinger M, et al. ProNA2020 predicts protein–DNA, protein–RNA, and protein–protein binding proteins and residues from sequence. J Mol Biol 2020;432(7):2428–43.
  • 55. Li Y, Golding GB, Ilie L. DELPHI: accurate deep ensemble model for protein interaction sites prediction. Bioinformatics 2021;37(7):896–904.
  • 56. Bock JR, Gough DA. Predicting protein–protein interactions from primary structure. Bioinformatics 2001;17(5):455–60.
  • 57. Shen J, Zhang J, Luo X, et al. Predicting protein–protein interactions based only on sequences information. Proc Natl Acad Sci 2007;104(11):4337–41.
  • 58. Guo Y, Yu L, Wen Z, Li M. Using support vector machine combined with auto covariance to predict protein–protein interactions from protein sequences. Nucleic Acids Res 2008;36(9):3025–30.
  • 59. Yang L, Xia JF, Gui J. Prediction of protein-protein interactions from protein sequence using local descriptors. Protein Pept Lett 2010;17(9):1085–90.
  • 60. Pan X-Y, Zhang Y-N, Shen H-B. Large-scale prediction of human protein−protein interactions from amino acid sequence based on latent topic features. J Proteome Res 2010;9(10):4992–5001.
  • 61. You Z-H, Lei Y-K, Zhu L, et al. Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis. BMC Bioinformatics 2013;14 Suppl 8(Suppl 8):S10.
  • 62. You Z-H, Zhu L, Zheng C-H, et al. Prediction of protein-protein interactions from amino acid sequences using a novel multi-scale continuous and discontinuous feature set. BMC Bioinformatics 2014;15 Suppl 15(Suppl 15):S9.
  • 63. You ZH, Li S, Gao X, et al. Large-scale protein-protein interactions detection by integrating big biosensing data with computational model. Biomed Res Int 2014;2014:598129.
  • 64. Wong L, You Z-H, Li S, et al. Detection of protein-protein interactions from amino acid sequences using a rotation forest model with a novel PR-LPQ descriptor. In: International Conference on Intelligent Computing. Fuzhou, China: Springer, 2015, 713–20.
  • 65. Du X, Sun S, Hu C, et al. DeepPPI: boosting prediction of protein–protein interactions with deep neural networks. J Chem Inf Model 2017;57(6):1499–510.
  • 66. Sun T, Zhou B, Lai L, Pei J. Sequence-based prediction of protein protein interaction using a deep-learning algorithm. BMC Bioinformatics 2017;18:1–8.
  • 67. Hashemifar S, Neyshabur B, Khan AA, Xu J. Predicting protein–protein interactions through sequence-based deep learning. Bioinformatics 2018;34(17):i802–10.
  • 68. Li H, Gong XJ, Yu H, Zhou C. Deep neural network based predictions of protein interactions using primary sequences. Molecules 2018;23(8):1923.
  • 69. Chen M, Ju CJ-T, Zhou G, et al. Multifaceted protein–protein interaction prediction based on Siamese residual RCNN. Bioinformatics 2019;35(14):i305–14.
  • 70. Chen C, Zhang Q, Ma Q, Yu B. LightGBM-PPI: predicting protein-protein interactions through LightGBM with multi-information fusion. Chemom Intel Lab Syst 2019;191:54–64.
  • 71. Kovács IA, Luck K, Spirohn K, et al. Network-based prediction of protein interactions. Nat Commun 2019;10(1):1240.
  • 72. Chen Y, Wang W, Liu J, et al. Protein interface complementarity and gene duplication improve link prediction of protein-protein interaction network. Front Genet 2020;11:291.
  • 73. Sledzieski S, Singh R, Cowen L, Berger B. D-SCRIPT translates genome to phenome with sequence-based, structure-aware, genome-scale predictions of protein-protein interactions. Cell Syst 2021;12(10):969–82.e6.
  • 74. Bryant P, Pozzati G, Elofsson A. Improved prediction of protein-protein interactions using AlphaFold2. Nat Commun 2022;13(1):1265.
  • 75. Zhang J, Zhu M, Qian Y. protein2vec: predicting protein-protein interactions based on LSTM. IEEE/ACM Trans Comput Biol Bioinform 2022;19(3):1257–66.
  • 76. Xenarios I, Rice DW, Salwinski L, et al. DIP: the database of interacting proteins. Nucleic Acids Res 2000;28(1):289–91.
  • 77. Keshava Prasad TS, Goel R, Kandasamy K, et al. Human protein reference database--2009 update. Nucleic Acids Res 2009;37(Database issue):D767–72.
  • 78. Schaefer MH, Fontaine JF, Vinayagam A, et al. HIPPIE: integrating protein interaction networks with experiment based quality scores. PloS One 2012;7(2):e31826.
  • 79. Das J, Yu H. HINT: high-quality protein interactomes and their applications in understanding human disease. BMC Syst Biol 2012;6:92.
  • 80. Moal IH, Fernández-Recio J. SKEMPI: a structural kinetic and energetic database of mutant protein interactions and its use in empirical models. Bioinformatics 2012;28(20):2600–7.
  • 81. Oughtred R, Rust J, Chang C, et al. The BioGRID database: a comprehensive biomedical resource of curated protein, genetic, and chemical interactions. Protein Sci 2021;30(1):187–200.
  • 82. Petrey  D, Zhao  H, Trudeau  S, et al.  PrePPI: a structure informed proteome-wide database of protein-protein interactions. J Mol Biol 2023;435(14):168052. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83. Del Toro  N, Shrivastava  A, Ragueneau  E, et al.  The IntAct database: efficient access to fine-grained molecular interaction data. Nucleic Acids Res  2022;50(D1):D648–53. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84. Zhang  J, Ma  Z, Kurgan  L. Comprehensive review and empirical analysis of hallmarks of DNA-, RNA- and protein-binding residues in protein chains. Brief Bioinform  2019;20(4):1250–68. [DOI] [PubMed] [Google Scholar]
  • 85. Cui  J, Han  LY, Li  H, et al.  Computer prediction of allergen proteins from sequence-derived protein structural and physicochemical properties. Mol Immunol  2007;44(4):514–20. [DOI] [PubMed] [Google Scholar]
  • 86. Zhang  ZH, Koh  JL, Zhang  GL, et al.  AllerTool: a web server for predicting allergenicity and allergic cross-reactivity in proteins. Bioinformatics  2007;23(4):504–6. [DOI] [PubMed] [Google Scholar]
  • 87. Gene ontology consortium: going forward. Nucleic Acids Res  2015;43(Database issue):D1049–56. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88. Mcguffin  LJ, Bryson  K, Jones  DT. The PSIPRED protein structure prediction server. Bioinformatics  2000;16(4):404–5. [DOI] [PubMed] [Google Scholar]
  • 89. Altschul  SF, Madden  TL, Schäffer  AA, et al.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res  1997;25(17):3389–402. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90. Kabsch  W, Sander  C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers  1983;22(12):2577–637. [DOI] [PubMed] [Google Scholar]
  • 91. Touw  WG, Baakman  C, Black  J, et al.  A series of PDB-related databanks for everyday needs. Nucleic Acids Res  2015;43(D1):D364–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92. Jani  J, Pappachan  A. Protein analysis: from sequence to structure. In: Singh  V, Kumar  A (eds). Advances in Bioinformatics. Singapore: Springer Singapore, 2021, 59–82. [Google Scholar]
  • 93. Jumper  J, Evans  R, Pritzel  A, et al.  Highly accurate protein structure prediction with AlphaFold. Nature  2021;596(7873):583–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94. Liu-Wei  W, Kafkas  Ş, Chen  J, et al.  DeepViral: prediction of novel virus-host interactions from protein sequences and infectious disease phenotypes. Bioinformatics  2021;37(17):2722–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95. Dai  B, Bailey-Kellogg  C. Protein interaction interface region prediction by geometric deep learning. Bioinformatics  2021;37(17):2580–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96. Liu  Y, Yuan  H, Cai  L, et al.  Deep learning of high-order interactions for protein interface prediction. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. Virtual Event, CA, USA: ACM, 2020, 679–87.
  • 97. Vreven  T, Moal  IH, Vangone  A, et al.  Updates to the integrated protein-protein interaction benchmarks: docking benchmark version 5 and affinity benchmark version 2. J Mol Biol  2015;427(19):3031–41. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98. Holm  L, Sander  C. Removing near-neighbour redundancy from large protein sequence collections. Bioinformatics  1998;14(5):423–9. [DOI] [PubMed] [Google Scholar]
  • 99. Gattiker  A, Michoud  K, Rivoire  C, et al.  Automated annotation of microbial proteomes in SWISS-PROT. Comput Biol Chem  2003;27(1):49–58. [DOI] [PubMed] [Google Scholar]
  • 100. Fariselli  P, Pazos  F, Valencia  A, Casadio  R. Prediction of protein–protein interaction sites in heterocomplexes with neural networks. Eur J Biochem  2002;269(5):1356–61. [DOI] [PubMed] [Google Scholar]
  • 101. Ofran  Y, Rost  B. Predicted protein–protein interaction sites from local sequence information. FEBS Lett  2003;544(1-3):236–9. [DOI] [PubMed] [Google Scholar]
  • 102. Fernandez-Recio  J, Totrov  M, Skorodumov  C, Abagyan  R. Optimal docking area: a new method for predicting protein–protein interaction sites. Proteins  2005;58(1):134–43. [DOI] [PubMed] [Google Scholar]
  • 103. Burgoyne  NJ, Jackson  RM. Predicting protein interaction sites: binding hot-spots in protein–protein and protein–ligand interfaces. Bioinformatics  2006;22(11):1335–42. [DOI] [PubMed] [Google Scholar]
  • 104. Porollo  A, Meller  J. Prediction-based fingerprints of protein–protein interactions. Proteins  2007;66(3):630–45. [DOI] [PubMed] [Google Scholar]
  • 105. Ofran  Y, Rost  B. ISIS: interaction sites identified from sequence. Bioinformatics  2007;23(2):e13–6. [DOI] [PubMed] [Google Scholar]
  • 106. Meszaros  B, Simon  I, Dosztanyi  Z. Prediction of protein binding regions in disordered proteins. PLoS Comput Biol  2009;5(5):e1000376. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 107. Šikić  M, Tomić  S, Vlahoviček  K. Prediction of protein-protein interaction sites in sequences and 3D structures by random forests. PLoS Comput Biol  2009;5(1):e1000278. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108. Murakami  Y, Mizuguchi  K. Applying the Naïve Bayes classifier with kernel density estimation to the prediction of protein-protein interaction sites. Bioinformatics  2010;26(15):1841–8. [DOI] [PubMed] [Google Scholar]
  • 109. Singh  G, Dhole  K, Pai  PP, et al.  SPRINGS: prediction of protein-protein interaction sites using artificial neural networks. PeerJ PrePrints 2014;2:e266v2. [Google Scholar]
  • 110. Dhole  K, Singh  G, Pai  PP, Mondal  S. Sequence-based prediction of protein-protein interaction sites with L1-logreg classifier. J Theor Biol  2014;348:47–54. [DOI] [PubMed] [Google Scholar]
  • 111. Wei  ZS, Yang  JY, Shen  HB, Yu  DJ. A cascade random forests algorithm for predicting protein-protein interaction sites. IEEE Trans Nanobioscience  2015;14(7):746–60. [DOI] [PubMed] [Google Scholar]
  • 112. Liu  GH, Shen  HB, Yu  DJ. Prediction of protein-protein interaction sites with machine-learning-based data-cleaning and post-filtering procedures. J Membr Biol  2016;249(1-2):141–53. [DOI] [PubMed] [Google Scholar]
  • 113. Wei  Z-S, Han  K, Yang  J-Y, et al.  Protein–protein interaction sites prediction by ensembling SVM and sample-weighted random forests. Neurocomputing  2016;193:201–12. [Google Scholar]
  • 114. Hou  Q, De Geest  PFG, Vranken  WF, et al.  Seeing the trees through the forest: sequence-based homo- and heteromeric protein-protein interaction sites prediction using random forest. Bioinformatics  2017;33(10):1479–87. [DOI] [PubMed] [Google Scholar]
  • 115. Jiménez  J, Doerr  S, Martínez-Rosell  G, et al.  DeepSite: protein-binding site predictor using 3D-convolutional neural networks. Bioinformatics  2017;33(19):3036–42. [DOI] [PubMed] [Google Scholar]
  • 116. Wang  X, Yu  B, Ma  A, et al.  Protein-protein interaction sites prediction by ensemble random forests with synthetic minority oversampling technique. Bioinformatics  2019;35(14):2395–402. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 117. Zhang  J, Kurgan  L. SCRIBER: accurate and partner type-specific prediction of protein-binding residues from proteins sequences. Bioinformatics  2019;35(14):i343–53. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 118. Townshend  R, Bedi  R, Suriana  P, et al.  End-to-end learning on 3D protein structure for interface prediction. Adv Neural Inf Process Syst  2019;15642–51. [Google Scholar]
  • 119. Zhang  B, Li  J, Quan  L, et al.  Sequence-based prediction of protein-protein interaction sites by simplified long short-term memory network. Neurocomputing  2019;357:86–100. [Google Scholar]
  • 120. Deng  A, Zhang  H, Wang  W, et al.  Developing computational model to predict protein-protein interaction sites based on the XGBoost algorithm. Int J Mol Sci  2020;21(7):2274. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 121. Gainza  P, Sverrisson  F, Monti  F, et al.  Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat Methods  2020;17(2):184–92. [DOI] [PubMed] [Google Scholar]
  • 122. Mahbub  S, Bayzid  MS. EGRET: edge aggregated graph attention networks and transfer learning improve protein-protein interaction site prediction. Brief Bioinform  2022;23(2):bbab578. [DOI] [PubMed] [Google Scholar]
  • 123. Tubiana  J, Schneidman-Duhovny  D, Wolfson  HJ. ScanNet: an interpretable geometric deep learning model for structure-based protein binding site prediction. Nat Methods  2022;19(6):730–9. [DOI] [PubMed] [Google Scholar]
  • 124. Khan  SH, Tayara  H, Chong  KT. ProB-site: protein binding site prediction using local features. Cells  2022;11(13):2117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 125. Zhang  J, Ghadermarzi  S, Kurgan  L. Prediction of protein-binding residues: dichotomy of sequence-based methods developed using structured complexes versus disordered proteins. Bioinformatics  2020;36(18):4729–38. [DOI] [PubMed] [Google Scholar]
  • 126. Peng  Z, Kurgan  L. High-throughput prediction of RNA, DNA and protein binding regions mediated by intrinsic disorder. Nucleic Acids Res  2015;43(18):e121. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 127. Dodge  C, Schneider  R, Sander  C. The HSSP database of protein structure—sequence alignments and family profiles. Nucleic Acids Res  1998;26(1):313–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 128. Elnaggar  A, Heinzinger  M, Dallago  C, et al.  ProtTrans: toward understanding the language of life through self-supervised learning. IEEE Trans Pattern Anal Mach Intell  2022;44(10):7112–27. [DOI] [PubMed] [Google Scholar]
  • 129. Zhang  J, Chen  Q, Liu  B. iDRBP_MMC: identifying DNA-binding proteins and RNA-binding proteins based on multi-label learning model and motif-based convolutional neural network. J Mol Biol  2020;432(22):5860–75. [DOI] [PubMed] [Google Scholar]
  • 130. Zhang  J, Chen  Q, Liu  B. DeepDRBP-2L: a new genome annotation predictor for identifying DNA-binding proteins and RNA-binding proteins using convolutional neural network and long short-term memory. IEEE/ACM Trans Comput Biol Bioinform  2021;18(4):1451–63. [DOI] [PubMed] [Google Scholar]
  • 131. Rahman  MS, Shatabda  S, Saha  S, et al.  DPP-PseAAC: a DNA-binding protein prediction model using Chou's general PseAAC. J Theor Biol  2018;452:22–34. [DOI] [PubMed] [Google Scholar]
  • 132. Mishra  A, Pokhrel  P, Hoque  MT. StackDPPred: a stacking based prediction of DNA-binding protein from sequence. Bioinformatics  2019;35(3):433–41. [DOI] [PubMed] [Google Scholar]
  • 133. Li  G, Du  X, Li  X, et al.  Prediction of DNA binding proteins using local features and long-term dependencies with primary sequences based on deep learning. PeerJ  2021;9:e11262. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 134. Ali  F, Kumar  H, Patil  S, et al.  DBP-DeepCNN: prediction of DNA-binding proteins using wavelet-based denoising and deep learning. Chemom Intel Lab Syst  2022;229:104639. [Google Scholar]
  • 135. Pradhan  UK, Meher  PK, Naha  S, et al.  PlDBPred: a novel computational model for discovery of DNA binding proteins in plants. Brief Bioinform  2023;24(1):bbac483. [DOI] [PubMed] [Google Scholar]
  • 136. Zheng  J, Zhang  X, Zhao  X, et al.  Deep-RBPPred: predicting RNA binding proteins in the proteome scale based on deep learning. Sci Rep  2018;8(1):15264. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 137. Bressin  A, Schulte-Sasse  R, Figini  D, et al.  TriPepSVM: de novo prediction of RNA-binding proteins based on short amino acid motifs. Nucleic Acids Res  2019;47(9):4406–17. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 138. Sun  X, Jin  T, Chen  C, et al.  RBPro-RF: use Chou’s 5-steps rule to predict RNA-binding proteins via random forest with elastic net. Chemom Intel Lab Syst  2020;197:103919. [Google Scholar]
  • 139. Mishra  A, Khanal  R, Kabir  WU, Hoque  T. AIRBP: accurate identification of RNA-binding proteins using machine learning techniques. Artif Intell Med  2021;113:102034. [DOI] [PubMed] [Google Scholar]
  • 140. Zhang  J, Yan  K, Chen  Q, Liu  B. PreRBP-TL: prediction of species-specific RNA-binding proteins based on transfer learning. Bioinformatics  2022;38(8):2135–43. [DOI] [PubMed] [Google Scholar]
  • 141. Peng  X, Wang  X, Guo  Y, et al.  RBP-TSTL is a two-stage transfer learning framework for genome-scale prediction of RNA-binding proteins. Brief Bioinform  2022;23(4):bbac215. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 142. Wang  N, Zhang  J, Liu  B. IDRBP-PPCT: identifying nucleic acid-binding proteins based on position-specific score matrix and position-specific frequency matrix cross transformation. IEEE/ACM Trans Comput Biol Bioinform  2022;19(4):2284–93. [DOI] [PubMed] [Google Scholar]
  • 143. Yan  J, Friedrich  S, Kurgan  L. A comprehensive comparative review of sequence-based predictors of DNA- and RNA-binding residues. Brief Bioinform  2016;17(1):88–105. [DOI] [PubMed] [Google Scholar]
  • 144. Li  M, Zhang  F, Kurgan  L. Machine learning methods for predicting protein-nucleic acids interactions. In: Kurgan L (ed.), Machine Learning in Bioinformatics of Protein Sequences: Algorithms, Databases and Resources for Modern Protein Bioinformatics. Singapore: World Scientific, 2023, 265–87. [Google Scholar]
  • 145. Ahmad  S, Gromiha  MM, Sarai  A. Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information. Bioinformatics  2004;20(4):477–86. [Google Scholar]
  • 146. Ahmad  S, Sarai  A. PSSM-based prediction of DNA binding sites in proteins. BMC Bioinformatics  2005;6:33. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 147. Yan  C, Terribilini  M, Wu  F, et al.  Predicting DNA-binding sites of proteins from amino acid sequence. BMC Bioinformatics  2006;7:262. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 148. Ho  SY, Yu  FC, Chang  CY, Huang  HL. Design of accurate predictors for DNA-binding sites in proteins using hybrid SVM-PSSM method. Biosystems  2007;90(1):234–41. [DOI] [PubMed] [Google Scholar]
  • 149. Hwang  S, Gou  Z, Kuznetsov  IB. DP-Bind: a web server for sequence-based prediction of DNA-binding residues in DNA-binding proteins. Bioinformatics  2007;23(5):634–6. [DOI] [PubMed] [Google Scholar]
  • 150. Ofran  Y, Mysore  V, Rost  B. Prediction of DNA-binding residues from sequence. Bioinformatics  2007;23(13):i347–53. [DOI] [PubMed] [Google Scholar]
  • 151. Wang  L, Yang  MQ, Yang  JY. Prediction of DNA-binding residues from protein sequence information using random forests. BMC Genomics  2009;10 Suppl 1(Suppl 1):S1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 152. Wu  J, Liu  H, Duan  X, et al.  Prediction of DNA-binding residues in proteins from amino acid sequences using a random forest model with a hybrid feature. Bioinformatics  2009;25(1):30–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 153. Ma  X, Guo  J, Liu  HD, et al.  Sequence-based prediction of DNA-binding residues in proteins with conservation and correlation information. IEEE/ACM Trans Comput Biol Bioinform  2012;9(6):1766–75. [DOI] [PubMed] [Google Scholar]
  • 154. Dey  S, Pal  A, Guharoy  M, et al.  Characterization and prediction of the binding site in DNA-binding proteins: improvement of accuracy by combining residue composition, evolutionary conservation and structural parameters. Nucleic Acids Res  2012;40(15):7150–61. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 155. Liu  R, Hu  J. DNABind: a hybrid algorithm for structure-based prediction of DNA-binding residues by combining machine learning- and template-based approaches. Proteins  2013;81(11):1885–99. [DOI] [PubMed] [Google Scholar]
  • 156. Zhao  H, Wang  J, Zhou  Y, Yang  Y. Predicting DNA-binding proteins and binding residues by complex structure prediction and application to human proteome. PloS One  2014;9(5):e96694. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 157. Wang  W, Liu  J, Xiong  Y, et al.  Analysis and classification of DNA-binding sites in single-stranded and double-stranded DNA-binding proteins using protein information. IET Syst Biol  2014;8(4):176–83. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 158. Zhou  J, Xu  R, He  Y, et al.  PDNAsite: identification of DNA-binding site from protein sequence by incorporating spatial and sequence context. Sci Rep  2016;6:27653. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 159. Wei  L, Tang  J, Zou  Q. Local-DPP: an improved DNA-binding protein prediction method by exploring local evolutionary information. Inform Sci  2017;384:135–44. [Google Scholar]
  • 160. Hu  J, Li  Y, Zhang  M, et al.  Predicting protein-DNA binding residues by weightedly combining sequence-based features and boosting multiple SVMs. IEEE/ACM Trans Comput Biol Bioinform  2017;14(6):1389–98. [DOI] [PubMed] [Google Scholar]
  • 161. Zhu  YH, Hu  J, Song  XN, Yu  DJ. DNAPred: accurate identification of DNA-binding sites from protein sequence by ensembled hyperplane-distance-based support vector machines. J Chem Inf Model  2019;59(6):3057–71. [DOI] [PubMed] [Google Scholar]
  • 162. Nguyen  BP, Nguyen  QH, Doan-Ngoc  GN, et al.  iProDNA-CapsNet: identifying protein-DNA binding residues using capsule neural networks. BMC Bioinformatics  2019;20(Suppl 23):634. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 163. Zhou  J, Lu  Q, Xu  R, et al.  EL_LSTM: prediction of DNA-binding residue from protein sequence by combining long short-term memory and ensemble learning. IEEE/ACM Trans Comput Biol Bioinform  2020;17(1):124–35. [DOI] [PubMed] [Google Scholar]
  • 164. Hu  J, Zhou  XG, Zhu  YH, et al.  TargetDBP: accurate DNA-binding protein prediction via sequence-based multi-view feature learning. IEEE/ACM Trans Comput Biol Bioinform  2020;17(4):1419–29. [DOI] [PubMed] [Google Scholar]
  • 165. Amirkhani  A, Kolahdoozi  M, Wang  C, Kurgan  LA. Prediction of DNA-binding residues in local segments of protein sequences with fuzzy cognitive maps. IEEE/ACM Trans Comput Biol Bioinform  2020;17(4):1372–82. [DOI] [PubMed] [Google Scholar]
  • 166. Zhang  J, Ghadermarzi  S, Katuwawala  A, Kurgan  L. DNAgenie: accurate prediction of DNA-type-specific binding residues in protein sequences. Brief Bioinform  2021;22(6):bbab336. [DOI] [PubMed] [Google Scholar]
  • 167. Jeong  E, Chung  IF, Miyano  S. A neural network method for identification of RNA-interacting residues in protein. Genome Inform  2004;15(1):105–16. [PubMed] [Google Scholar]
  • 168. Jeong  E, Miyano  S. A weighted profile based method for protein-RNA interacting residue prediction. In: Priami C, Cardelli L, Emmott S (eds.), Transactions on Computational Systems Biology IV. Berlin, Heidelberg: Springer, 2006, 123–39. [Google Scholar]
  • 169. Terribilini  M, Sander  JD, Lee  JH, et al.  RNABindR: a server for analyzing and predicting RNA-binding sites in proteins. Nucleic Acids Res  2007;35(Web Server issue):W578–84. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 170. Wang  Y, Xue  Z, Shen  G, Xu  J. PRINTR: prediction of RNA binding sites in proteins using SVM and profiles. Amino Acids  2008;35(2):295–302. [DOI] [PubMed] [Google Scholar]
  • 171. Tong  J, Jiang  P, Lu  ZH. RISP: a web-based server for prediction of RNA-binding sites in proteins. Comput Methods Programs Biomed  2008;90(2):148–53. [DOI] [PubMed] [Google Scholar]
  • 172. Kumar  M, Gromiha  MM, Raghava  GP. Prediction of RNA binding sites in a protein using SVM and PSSM profile. Proteins  2008;71(1):189–94. [DOI] [PubMed] [Google Scholar]
  • 173. Cheng  CW, Su  EC, Hwang  JK, et al.  Predicting RNA-binding sites of proteins using support vector machines and evolutionary information. BMC Bioinformatics  2008;9 Suppl 12(Suppl 12):S6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 174. Murakami  Y, Spriggs  RV, Nakamura  H, Jones  S. PiRaNhA: a server for the computational prediction of RNA-binding residues in protein sequences. Nucleic Acids Res  2010;38(Web Server issue):W412–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 175. Huang  YF, Chiu  LY, Huang  CC, Huang  CK. Predicting RNA-binding residues from evolutionary information and sequence conservation. BMC Genomics  2010;11 Suppl 4(Suppl 4):S2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 176. Zhang  T, Zhang  H, Chen  K, et al.  Analysis and prediction of RNA-binding residues using sequence, evolutionary conservation, and predicted secondary structure and solvent accessibility. Curr Protein Pept Sci  2010;11(7):609–28. [DOI] [PubMed] [Google Scholar]
  • 177. Liu  ZP, Wu  LY, Wang  Y, et al.  Prediction of protein-RNA binding sites by a random forest method with combined features. Bioinformatics  2010;26(13):1616–22. [DOI] [PubMed] [Google Scholar]
  • 178. Pérez-Cano  L, Fernández-Recio  J. Optimal protein-RNA area, OPRA: a propensity-based method to identify RNA-binding sites on proteins. Proteins  2010;78(1):25–35. [DOI] [PubMed] [Google Scholar]
  • 179. Wang  CC, Fang  Y, Xiao  J, Li  M. Identification of RNA-binding sites in proteins by integrating various sequence information. Amino Acids  2011;40(1):239–48. [DOI] [PubMed] [Google Scholar]
  • 180. Ma  X, Guo  J, Wu  J, et al.  Prediction of RNA-binding residues in proteins from primary sequence using an enriched random forest model with a novel hybrid feature. Proteins  2011;79(4):1230–9. [DOI] [PubMed] [Google Scholar]
  • 181. Choi  S, Han  K. Prediction of RNA-binding amino acids from protein and RNA sequences. BMC Bioinformatics  2011;12 Suppl 13(Suppl 13):S7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 182. Walia  RR, Xue  LC, Wilkins  K, et al.  RNABindRPlus: a predictor that combines machine learning and sequence homology-based methods to improve the reliability of predicted RNA-binding residues in proteins. PloS One  2014;9(5):e97725. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 183. Chen  YC, Sargsyan  K, Wright  JD, et al.  Identifying RNA-binding residues based on evolutionary conserved structural and energetic features. Nucleic Acids Res  2014;42(3):e15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 184. Li  S, Yamashita  K, Amada  KM, Standley  DM. Quantifying sequence and structural features of protein-RNA interactions. Nucleic Acids Res  2014;42(15):10086–98. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 185. Ren  H, Shen  Y. RNA-binding residues prediction using structural features. BMC Bioinformatics  2015;16:249. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 186. Tuvshinjargal  N, Lee  W, Park  B, Han  K. PRIdictor: protein-RNA interaction predictor. Biosystems  2016;139:17–22. [DOI] [PubMed] [Google Scholar]
  • 187. Sun  M, Wang  X, Zou  C, et al.  Accurate prediction of RNA-binding protein residues with two discriminative structural descriptors. BMC Bioinformatics  2016;17(1):231. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 188. Tang  Y, Liu  D, Wang  Z, et al.  A boosting approach for prediction of protein-RNA binding residues. BMC Bioinformatics  2017;18(Suppl 13):465. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 189. Luo  J, Liu  L, Venkateswaran  S, et al.  RPI-Bind: a structure-based method for accurate identification of RNA-protein binding sites. Sci Rep  2017;7(1):614. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 190. Wang  L, Brown  SJ. BindN: a web-based tool for efficient prediction of DNA and RNA binding sites in amino acid sequences. Nucleic Acids Res  2006;34(Web Server issue):W243–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 191. Wang  L, Huang  C, Yang  MQ, Yang  JY. BindN+ for accurate prediction of DNA and RNA-binding residues from protein sequence features. BMC Syst Biol  2010;4 Suppl 1(Suppl 1):S3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 192. Carson  MB, Langlois  R, Lu  H. NAPS: a residue-level nucleic acid-binding prediction server. Nucleic Acids Res  2010;38(Web Server issue):W431–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 193. Yang  X, Wang  J, Sun  J, Liu  R. SNBRFinder: a sequence-based hybrid algorithm for enhanced prediction of nucleic acid-binding residues. PloS One  2015;10(7):e0133260. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 194. Su  H, Liu  M, Sun  S, et al.  Improving the prediction of protein-nucleic acids binding residues via multiple sequence profiles and the consensus of complementary methods. Bioinformatics  2019;35(6):930–6. [DOI] [PubMed] [Google Scholar]
  • 195. Zhang  J, Chen  Q, Liu  B. NCBRPred: predicting nucleic acid binding residues in proteins based on multilabel learning. Brief Bioinform  2021;22(5):bbaa397. [DOI] [PubMed] [Google Scholar]
  • 196. Sun  Z, Zheng  S, Zhao  H, et al.  To improve prediction of binding residues with DNA, RNA, carbohydrate, and peptide via multi-task deep neural networks. IEEE/ACM Trans Comput Biol Bioinform  2022;19(6):3735–43. [DOI] [PubMed] [Google Scholar]
  • 197. McGinnis  S, Madden  TL. BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res  2004;32(Web Server issue):W20–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 198. Zhang  Y, Skolnick  J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res  2005;33(7):2302–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 199. Faraggi  E, Zhou  Y, Kloczkowski  A. Accurate single-sequence prediction of solvent accessible surface area using local and global features. Proteins  2014;82(11):3170–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 200. Magnan  CN, Baldi  P. SSpro/ACCpro 5: almost perfect prediction of protein secondary structure and relative solvent accessibility using profiles, machine learning and structural similarity. Bioinformatics  2014;30(18):2592–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 201. Zhao  Z, Peng  Z, Yang  J. Improving sequence-based prediction of protein-peptide binding residues by introducing intrinsic disorder and a consensus method. J Chem Inf Model  2018;58(7):1459–68. [DOI] [PubMed] [Google Scholar]
  • 202. Taherzadeh  G, Zhou  Y, Liew  AW, Yang  Y. Structure-based prediction of protein-peptide binding regions using random forest. Bioinformatics  2018;34(3):477–84. [DOI] [PubMed] [Google Scholar]
  • 203. Petsalaki  E, Stark  A, García-Urdiales  E, Russell  RB. Accurate prediction of peptide binding sites on protein surfaces. PLoS Comput Biol  2009;5(3):e1000335. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 204. Lavi  A, Ngan  CH, Movshovitz-Attias  D, et al.  Detection of peptide-binding sites on protein surfaces: the first step toward the modeling and targeting of peptide-mediated interactions. Proteins  2013;81(12):2096–105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 205. Lei  Y, Li  S, Liu  Z, et al.  A deep-learning framework for multi-level peptide-protein interaction prediction. Nat Commun  2021;12(1):5465. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 206. Wishart  DS, Feunang  YD, Guo  AC, et al.  DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res  2018;46(D1):D1074–82. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 207. Yang  J, Roy  A, Zhang  Y. Protein-ligand binding site recognition using complementary binding-specific substructure comparison and sequence profile alignment. Bioinformatics  2013;29(20):2588–95. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 208. Babor  M, Gerzon  S, Raveh  B, et al.  Prediction of transition metal-binding sites from apo protein structures. Proteins  2008;70(1):208–17. [DOI] [PubMed] [Google Scholar]
  • 209. Chauhan  JS, Mishra  NK, Raghava  GP. Identification of ATP binding residues of a protein from its primary sequence. BMC Bioinformatics  2009;10:434. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 210. Chauhan  JS, Mishra  NK, Raghava  GP. Prediction of GTP interacting residues, dipeptides and tripeptides in a protein from its evolutionary information. BMC Bioinformatics  2010;11:301. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 211. Brylinski  M, Skolnick  J. FINDSITE-metal: integrating evolutionary information and machine learning for structure-based metal-binding site prediction at the proteome level. Proteins  2011;79(3):735–51. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 212. Firoz  A, Malik  A, Joplin  KH, et al.  Residue propensities, discrimination and binding site prediction of adenine and guanine phosphates. BMC Biochem  2011;12:20. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 213. Parca  L, Gherardini  PF, Helmer-Citterich  M, Ausiello  G. Phosphate binding sites identification in protein structures. Nucleic Acids Res  2011;39(4):1231–42. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 214. Passerini  A, Lippi  M, Frasconi  P. MetalDetector v2.0: predicting the geometry of metal binding sites from protein sequence. Nucleic Acids Res  2011;39(Web Server issue):W288–92. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 215. Chen  K, Mizianty  MJ, Kurgan  L. ATPsite: sequence-based prediction of ATP-binding residues. Proteome Sci  2011;9 Suppl 1(Suppl 1):S4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 216. Yu  D-J, Hu  J, Tang  Z-M, et al.  Improving protein-ATP binding residues prediction by boosting SVMs with random under-sampling. Neurocomputing  2013;104:180–90. [Google Scholar]
  • 217. Yu  DJ, Hu  J, Huang  Y, et al.  TargetATPsite: a template-free method for ATP-binding sites prediction with residue evolution image sparse representation and classifier ensemble. J Comput Chem  2013;34(11):974–85. [DOI] [PubMed] [Google Scholar]
  • 218. He  W, Liang  Z, Teng  M, Niu  L. mFASD: a structure-based algorithm for discriminating different types of metal-binding sites. Bioinformatics  2015;31(12):1938–44. [DOI] [PubMed] [Google Scholar]
  • 219. Hu  X, Wang  K, Dong  Q. Protein ligand-specific binding residue predictions by an ensemble classifier. BMC Bioinformatics  2016;17(1):470. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 220. Hu  X, Dong  Q, Yang  J, Zhang  Y. Recognizing metal and acid radical ion-binding sites by integrating ab initio modeling with template-based transferals. Bioinformatics  2016;32(21):3260–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 221. Hu  J, Li  Y, Yan  W-X, et al.  KNN-based dynamic query-driven sample rescaling strategy for class imbalance learning. Neurocomputing  2016;191:363–73. [Google Scholar]
  • 222. Hu  J, Li  Y, Zhang  Y, Yu  DJ. ATPbind: accurate protein-ATP binding site prediction by combining sequence-profiling and structure-based comparisons. J Chem Inf Model  2018;58(2):501–10. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 223. Wang  S, Hu  X, Feng  Z, et al.  Recognizing ion ligand binding sites by SMO algorithm. BMC Mol Cell Biol  2019;20(Suppl 3):53. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 224. Liu  L, Hu  X, Feng  Z, et al.  Prediction of acid radical ion binding residues by K-nearest neighbors classifier. BMC Mol Cell Biol  2019;20(Suppl 3):52. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 225. Xia  CQ, Pan  X, Shen  HB. Protein-ligand binding residue prediction enhancement through hybrid deep heterogeneous learning of sequence and structure data. Bioinformatics  2020;36(10):3018–27. [DOI] [PubMed] [Google Scholar]
  • 226. Lu  ZC, Jiang  F, Wu  YD. Phosphate binding sites prediction in phosphorylation-dependent protein-protein interactions. Bioinformatics  2021;37(24):4712–8. [DOI] [PubMed] [Google Scholar]
  • 227. Song  J, Liu  G, Jiang  J, et al.  Prediction of protein-ATP binding residues based on ensemble of deep convolutional neural networks and LightGBM algorithm. Int J Mol Sci  2021;22(2):939. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 228. Pintar  A, Carugo  O, Pongor  S. DPX: for the analysis of the protein core. Bioinformatics  2003;19(2):313–4. [DOI] [PubMed] [Google Scholar]
  • 229. Jones  S, Thornton  JM. Analysis of protein-protein interaction sites using surface patches. J Mol Biol  1997;272(1):121–32. [DOI] [PubMed] [Google Scholar]
  • 230. Mihel  J, Šikić  M, Tomić  S, et al.  PSAIA - protein structure and interaction analyzer. BMC Struct Biol  2008;8:21. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 231. Putignano  V, Rosato  A, Banci  L, Andreini  C. MetalPDB in 2018: a database of metal sites in biological macromolecular structures. Nucleic Acids Res  2018;46(D1):D459–64. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 232. Roy  A, Yang  J, Zhang  Y. COFACTOR: an accurate comparative algorithm for structure-based protein function annotation. Nucleic Acids Res  2012;40(Web Server issue):W471–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 233. Ravindranath  PA, Forli  S, Goodsell  DS, et al.  AutoDockFR: advances in protein-ligand docking with explicitly specified binding site flexibility. PLoS Comput Biol  2015;11(12):e1004586. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 234. Ravindranath  PA, Sanner  MF. AutoSite: an automated approach for pseudo-ligands prediction-from ligand-binding sites identification to predicting key ligand atoms. Bioinformatics  2016;32(20):3142–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 235. Chen  CC, Hwang  JK, Yang  JM. (PS)2-v2: template-based protein structure prediction server. BMC Bioinformatics  2009;10:366. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

supplementary_data_bbae162

Data Availability Statement

The protein-ligand interaction datasets used in this manuscript are publicly available, and their sources are as follows. The dataset used to evaluate sequence-based PAIR-pro predictors is the virus-human interaction dataset [41, 94]. The dataset used to evaluate SINGLE-res predictors is the Test_60 dataset [124]. The DNA-129_Test dataset is from publications [3, 5]. The dataset used to evaluate protein-RNA interaction residue predictors is from publication [24].


Articles from Briefings in Bioinformatics are provided here courtesy of Oxford University Press
